Hitchhiking mapping – functional genomics from the population genetics perspective





Review

TRENDS in Genetics Vol.19 No.1 January 2003

Christian Schlötterer
Institut für Tierzucht und Genetik, Josef-Baumann Gasse 1, 1210 Vienna, Austria

Several statistical tests based on population genetic theory are used to identify genes that have recently acquired a beneficial mutation. Here, I describe the extension of these tests to a multilocus approach for a genome-wide survey of genes that have been under recent positive selection. As this strategy could potentially identify genes with weak phenotypic effects, it will be very useful in population genetic approaches aimed at understanding adaptation processes in natural populations. Furthermore, this 'hitchhiking mapping' could also help in the functional characterization of genomes.

Once the genomic sequence of an organism is complete, the next major task is the functional annotation of its genes and noncoding sequences. In addition to the experimental characterization of genes, computational approaches based on conservation of sequence and structure between species are becoming increasingly important. Although these two powerful approaches will provide the majority of functional genome annotation, they could miss an important component of functional information. Rapidly evolving genes [1] or genes that have acquired new functions might not be recognized by sequence comparisons between moderately or highly diverged species. Similarly, experimental characterization of gene function requires a recognizable phenotypic effect, so standard functional analyses will not be applicable to a substantial number of genes whose inactivation does not result in a measurable phenotype under laboratory conditions [2]. Recent results suggest an alternative approach based on population genetic principles. I will first explain some underlying population genetic principles, then review recent multilocus statistics, and finally explain their application to functional genome annotation.
Polymorphism in natural populations

Populations of constant size

Since the discovery of allozyme polymorphism, it has become well known that natural populations can be highly polymorphic. The amount of polymorphism (see Glossary) observed in natural populations ($\theta$) is governed mainly by two parameters: the number of individuals participating in reproduction, that is, the effective population size ($N_e$), and the mutation rate $\mu$, whereby $\theta = 4N_e\mu$ (Fig. 1a). Typically, Drosophila melanogaster microsatellites have a $\theta$ value of about 4, which translates into an expected heterozygosity of about 0.66. Interestingly, even for neutrally evolving loci with the same mutation rate, the observed variability differs substantially among loci: computer simulations (1000 microsatellite loci, $\theta = 2$) indicate that for a sample size of 50 individuals, heterozygosities could range from zero to 0.8. Although this observation is not intuitive, it can readily be explained by coalescent theory, in which all alleles in a given sample can be traced back to one most recent common ancestor (MRCA). Because of stochastic variation in the sampling of gametes from one generation to the next (genetic drift), for some loci the MRCA lies only a few generations back in the past, whereas for other loci the time to the MRCA is substantially longer. As the expected number of mutations is proportional to time, loci with an MRCA in the distant past are expected to be more polymorphic than those with more recent MRCAs.

Glossary
Amount of polymorphism ($\theta$): The variability in a population. In a population of constant size, $\theta = 4N_e\mu$, where $\mu$ is the mutation rate. $\theta$ can be estimated from the observed number of alleles, gene diversity, variance in repeat number and pairwise differences between sequences.
Coalescent theory: A modern approach in population genetics to describe variability in a population. The term coalescence is derived from the idea that, going backward in time, alleles from a population sample share the same ancestor; that is, they coalesce. The ancestor to which all extant gene copies at a given locus can first be traced back is called the most recent common ancestor (MRCA).
Effective population size ($N_e$): The size of an idealized population in which all individuals have the same reproductive success. Census population sizes often deviate from $N_e$ because of an unequal sex ratio or other sources of variance in reproductive success among individuals (e.g. presence of dominant males).
Expected heterozygosity (gene diversity, $H$): $H = 1 - \sum_i x_i^2$, where $x_i$ is the frequency of the $i$th allele at a given locus.
Hitchhiking: Changes in allele frequencies at neutral sites because of linkage to a selected site.
Microsatellites: Short sequence motifs (e.g. GT) that are tandemly repeated. A special mutation process, DNA replication slippage, makes them highly polymorphic markers.
Outcrossing species: Species in which mating occurs between genetically unrelated individuals.

Corresponding author: Christian Schlötterer ([email protected]).

http://tigs.trends.com 0168-9525/02/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0168-9525(02)00012-4

Populations with a complex demographic history

The above considerations apply to neutrally evolving populations of constant size, but natural populations often have a complex demographic history. Common demographic scenarios involve bottlenecks (temporary reductions in effective population size), population expansion and migration. The demographic history of a population has a profound effect on its genetic composition. Roughly speaking, bottlenecks reduce the time to the MRCA and thus reduce the average level of variability (Fig. 1b). By contrast, population expansion reduces the loss of alleles due to genetic drift. The effect of migration is complex, as it depends on many parameters, such as migration rates, migration models and the number of subpopulations. Admixture is one special case of migration, in which individuals from two or more diverged populations have recently started interbreeding. In addition to increasing variability, admixture also generates higher levels of association between alleles at different loci (linkage disequilibrium, LD) than are found in non-admixed, randomly mating populations.

Fig. 1. Partitioning of variability at three different chromosomal locations. For each chromosomal region, segregating variation (e.g. single nucleotide polymorphisms) is indicated by circles, squares and triangles. Allelic states are distinguished by filled and empty symbols. (a) Neutral scenario in a population of constant size. (b) Population bottleneck leading to a genome-wide reduction in variability. (c) The central genomic region was subjected to a recent selective sweep, resulting in reduced variability and linkage disequilibrium at this locus only.

Fig. 2. Expected allele frequency distortion due to a selective sweep (y-axis: variability; x-axis: chromosomal position). The distortion in allele frequency is measured as a reduction in variability. The position of the selected site is indicated by an arrow, very close to the central microsatellite. Note that the positions of only seven markers are given, and the variability reduction between them was obtained by interpolation.

The effects of selection

Population genetic theory predicts that beneficial mutations are either lost by genetic drift or increase in frequency until they eventually become fixed in a population (a selective sweep; Fig. 1c). In outcrossing species, recombination decouples the selected site from the remainder of the genome. Nevertheless, neutral variants that are linked to the beneficial mutation are also affected by a selective sweep, a phenomenon that has been called hitchhiking [3,4]. Hence, an important difference between demography and selection is that selection is targeted to a specific region of the genome, whereas demography affects the entire genome. The fixation of selected alleles in a population has some important consequences for the partitioning of neutral variation, which differ from neutral expectations (Table 1). Population genetic tests for the spread of a beneficial mutation are based on the comparison of the observed distribution of polymorphism with the distribution expected under neutrality (see [5] for a recent review). Although this approach has been very successful in identifying selective sweeps at several genes, it suffers from two major disadvantages: first, the majority of the available statistical tests assume a constant population size (violations of this assumption could result in more false positives than expected); and second, a priori knowledge of the putative candidate genes is required, as the majority of genes are not expected to be the target of a selective sweep (Fig. 2). Recent progress in high-throughput technology allows a shift from the analysis of single loci to a multilocus approach. The joint analysis of a large number of loci covering the entire genome is expected to distinguish selection from genome-wide demographic effects and from stochastic variation among loci. The key idea of such a screen is that only a small fraction of the genome has been subjected to a recent hitchhiking event. Hence, a genome-wide survey could be used to identify regions that differ from the remainder of the genome.
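The relationship between $\theta$ and expected heterozygosity, and the locus-to-locus scatter described under 'Populations of constant size', can be explored with a small coalescent simulation. This is a sketch under stated assumptions, not the simulation behind the numbers quoted in the text: it assumes the stepwise mutation model with the equilibrium formula H = 1 - 1/sqrt(1 + 2θ) (which gives H ≈ 0.67 for θ = 4), a standard Kingman coalescent for n sampled gene copies, and mutations that add or remove a single repeat. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def gene_diversity(alleles):
    """Expected heterozygosity H = 1 - sum(x_i^2) from a list of allele labels."""
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())


def expected_h_smm(theta):
    """Equilibrium H under the stepwise mutation model: 1 - 1/sqrt(1 + 2*theta)."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * theta)


def poisson_count(rate, length, rng):
    """Number of events of a Poisson process with the given rate within `length`."""
    count, t = 0, rng.expovariate(rate)
    while t < length:
        count += 1
        t += rng.expovariate(rate)
    return count


def simulate_smm_locus(n, theta, rng):
    """Repeat-number offsets (relative to the MRCA) for n sampled gene copies.

    Kingman coalescent with time in units of 2N generations: with k lineages,
    the next coalescence happens after an exponential time with rate k(k-1)/2.
    Each lineage mutates at rate theta/2; each mutation adds or removes one repeat.
    """
    lineages = [[i] for i in range(n)]  # sample indices descending from each lineage
    offsets = [0] * n
    while len(lineages) > 1:
        k = len(lineages)
        dt = rng.expovariate(k * (k - 1) / 2)
        for lineage in lineages:
            # A mutation on this branch is inherited by every sample below it.
            n_mut = poisson_count(theta / 2, dt, rng)
            step = sum(rng.choice((-1, 1)) for _ in range(n_mut))
            for s in lineage:
                offsets[s] += step
        i, j = rng.sample(range(k), 2)
        merged = lineages[i] + lineages[j]
        lineages = [lin for idx, lin in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return offsets


if __name__ == "__main__":
    rng = random.Random(42)
    hs = [gene_diversity(simulate_smm_locus(50, 2.0, rng)) for _ in range(1000)]
    print(f"theta = 2, 1000 loci: H ranges from {min(hs):.2f} to {max(hs):.2f}")
    print(f"SMM expectation for theta = 4: {expected_h_smm(4.0):.2f}")
```

Because every locus gets its own random genealogy, loci with identical $\theta$ can end up with very different H, which is the point made in the text; a real analysis would also need the sample-size convention (genes versus diploid individuals) to match the data.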
Regions identified by such a 'hitchhiking mapping' approach are expected to contain genes or allelic variants that have recently acquired a beneficial mutation. For a comparison of the marker types available for such a screen, see Box 1.
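The localized footprint that such a screen looks for (compare Fig. 2) can be approximated with a simple deterministic argument. The assumptions here are not taken from this review: the beneficial allele is assumed to follow a logistic trajectory, so a neutral lineage at recombination fraction r escapes the sweep with probability of about 1 - (2N)^(-r/s), and diversity is assumed to be erased only when both sampled lineages fail to escape, giving H_after/H_before ≈ 1 - (2N)^(-2r/s). The parameter values and function names are hypothetical.

```python
def sweep_diversity_ratio(r, s, two_n):
    """Approximate H_after / H_before at a neutral site linked to a completed sweep.

    r: recombination fraction to the selected site; s: selection coefficient;
    two_n: number of gene copies (2N). Assumes a logistic (deterministic)
    trajectory, so a lineage escapes with probability about 1 - (2N)^(-r/s).
    """
    if r < 0 or s <= 0 or two_n <= 1:
        raise ValueError("need r >= 0, s > 0 and 2N > 1")
    p_no_escape = two_n ** (-r / s)
    # Diversity is erased only if both sampled lineages fail to escape.
    return 1.0 - p_no_escape ** 2


def sweep_profile(positions_kb, sweep_kb, rec_per_kb, s, two_n):
    """Expected relative diversity at marker positions around a sweep."""
    return [
        sweep_diversity_ratio(min(abs(x - sweep_kb) * rec_per_kb, 0.5), s, two_n)
        for x in positions_kb
    ]


if __name__ == "__main__":
    # Hypothetical values: 2N = 10^6, s = 0.01, 10^-5 recombination per kb.
    markers = [-40, -20, -10, 0, 10, 20, 40]
    for x, h in zip(markers, sweep_profile(markers, 0, 1e-5, 0.01, 1_000_000)):
        print(f"{x:>4} kb: {h:.2f}")
```

The resulting valley, deepest at the selected site and recovering with distance, is what distinguishes a sweep from a genome-wide demographic effect in a multilocus scan.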


Box 1. Which marker to use?
Allozymes were the first available marker for multilocus tests but, apart from their frequent non-neutral behavior, the number of informative allozyme loci is too limited for a generalized hitchhiking mapping study. By contrast, single nucleotide polymorphisms (SNPs) are highly abundant markers, but variability estimates and allele frequencies can be highly biased unless details about the demographic past of the surveyed populations and the SNP isolation procedure are known [a,b]. Therefore, their usefulness for hitchhiking mapping is currently restricted to candidate regions with a high SNP density. Microsatellites are substantially less abundant than SNPs, but still occur in high numbers in most species. As microsatellites above a certain repeat number are expected to be polymorphic, they do not suffer from the same ascertainment problem as SNPs [b], making them well suited for genome scans. The high mutation rate of microsatellites makes them an informative marker that is particularly well suited for the characterization of very recent sweeps (up to hundreds of generations). The drawback of this high mutation rate is that the signature of a selective sweep is erased relatively quickly [c,d]. Sequencing of short genomic regions provides the highest resolution available at the DNA level, but the scarcity of sequence polymorphism in some species (e.g. humans) might not always provide enough information to distinguish between a selective sweep and a neutral scenario. DNA sequence variation is better suited for the identification of selective sweeps that occurred in the more distant past.
References
a Kuhner, M.K. et al. (2000) Usefulness of single nucleotide polymorphism data for estimating population parameters. Genetics 156, 439–447
b Schlötterer, C. and Harr, B. (2002) Single nucleotide polymorphisms derived from ancestral populations show no evidence for biased diversity estimates in Drosophila melanogaster. Mol. Ecol. 11, 947–950
c Schlötterer, C. and Wiehe, T. (1999) Microsatellites, a neutral marker to infer selective sweeps. In Microsatellites – Evolution and Applications (Goldstein, D. and Schlötterer, C., eds), pp. 238–248, Oxford University Press
d Wiehe, T. (1998) The effect of selective sweeps on the variance of the allele distribution of a linked multi-allele locus: hitchhiking of microsatellites. Theor. Popul. Biol. 53, 272–283

Intraspecific multilocus tests based on interpopulation comparisons

Lewontin–Krakauer test and derivatives

The genetic differentiation of neutrally evolving populations is determined mainly by genetic drift and has traditionally been measured by the 'F-statistics' [6]. If one locus is subjected to a selective sweep in one population, the FST value at this locus is expected to be larger than that of the neutrally evolving loci. Assuming that all populations evolved independently, Lewontin and Krakauer proposed a formal test for the identification of selected loci [7]. However, for hierarchically structured populations, the variance among FST values is substantially higher than originally assumed, potentially resulting in too many false positives [8,9]. Therefore, the original Lewontin–Krakauer test was improved to account for population structure by computer simulations based either on an inferred population tree [10] or on a large number of populations connected by migration [11]. The simulated (conditional) distribution of FST values was compared with the observed FST values, and both studies found loci that deviated significantly from neutral expectations. An alternative approach to avoid the problem of an unknown population history is the comparison of two, rather than multiple, populations [12]. Following this idea, new population-specific parameters of population structure and history were introduced to identify potential targets of selection [13]. Under the assumption that the two populations have remained isolated since their split, locus-specific branch lengths can be inferred. A comparison of the branch lengths obtained for several loci allows the identification of loci that differ from the remainder of the genome and are hence candidate targets of directional selection [13]. Interestingly, [11] and [13] used the same allozyme data from Drosophila simulans, but identified different loci as targets of directional selection. As none of the identified loci has been independently verified as a target of selection, the power and sensitivity of the suite of Lewontin–Krakauer tests are currently difficult to evaluate.

Table 1. Effects of a selective sweep
Effect | Description | Marker(a)
Reduced variability | Because of the replacement of other segregating alleles by the selected one | Sequence polymorphism, allozymes, microsatellites
Allele excess | More alleles than expected from the observed level of gene diversity | Allozymes, microsatellites
Allele frequency distribution | Immediately after the sweep, a surplus of high-frequency derived alleles is generated; with increasing time since the sweep, new mutations arise, resulting in a surplus of low-frequency alleles | Sequence polymorphism
Linkage disequilibrium | Selection increases linkage disequilibrium around the selected site | Sequence polymorphism, SNPs, microsatellites
(a) The described effect can be identified using the listed marker type.

Reduction in microsatellite variability

An alternative approach to infer recent selective sweeps is based on the reduction of variability around a selected site (Table 1). Microsatellites are abundant and highly polymorphic markers that are scattered over the euchromatic part of the genome [14,15]. Although microsatellites themselves are unlikely to be the target of selection, their variability is also reduced if they are linked to a selected site (hitchhiking). Comparison of microsatellite variability across loci and populations is complicated by variation in mutation rate among microsatellite loci and by differences in effective population size among populations. Based on the simple stepwise mutation model, my colleagues and I suggested a microsatellite-specific test statistic that accounted for variation in mutation rate among loci and for different effective population sizes [16]. Applying this test to a dataset of ten microsatellite loci and seven D. melanogaster populations, we found that variability for some locus–population combinations was lower than expected from the variability of the other locus–population combinations. Although the study was not conducted on a 'genomic' scale, the important observation was that neutral markers, such as microsatellites, could be an important tool to detect recent selective sweeps. In addition, I recently suggested a more systematic approach for the identification of microsatellite loci associated with a recent selective sweep [17]. The test statistic is based on the comparison of a large number of microsatellite loci in two populations. The ratio of variability in the two populations (RV) provides an estimator that has the same expectation for all neutrally evolving loci. The logarithm of this estimator (lnRV) can be approximated by a normal distribution, and microsatellite loci that are linked to a recently selected site in one of the two populations are expected to fall outside the lnRV distribution of neutral loci. As the shape of the lnRV distribution is determined by a large number of neutrally evolving loci, demographic events such as admixture and bottlenecks are reflected in the distribution of the lnRV values. Hence, this test statistic is largely independent of detailed knowledge of the demographic history of the two groups.
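The two interpopulation statistics described above, the Lewontin–Krakauer test and lnRV, can be sketched in a few lines. These are illustrations under simplifying assumptions, not the published implementations: biallelic loci with known population allele frequencies, a Wright-type FST without sample-size correction, the classical approximation that (k - 1)·FST/mean(FST) follows a chi-square distribution with k - 1 degrees of freedom for k populations (which, as noted above, is too liberal for structured populations), variance in repeat number as the variability estimator for lnRV, and a plain z-score cutoff. All function names are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    # Closed forms for df = 1 and df = 2, then step up two degrees of freedom at a time.
    if df % 2:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def fst_biallelic(freqs):
    """Wright-type F_ST for one biallelic locus: var(p) / (p_bar * (1 - p_bar))."""
    p_bar = mean(freqs)
    var_p = sum((p - p_bar) ** 2 for p in freqs) / len(freqs)
    return var_p / (p_bar * (1 - p_bar))


def lewontin_krakauer(freq_table):
    """freq_table[locus][population] = allele frequency.

    Returns (F_ST, T, p) per locus, where T = (k - 1) * F_ST / mean(F_ST)
    is compared with a chi-square distribution with k - 1 df (k populations).
    """
    fsts = [fst_biallelic(row) for row in freq_table]
    f_bar = mean(fsts)
    df = len(freq_table[0]) - 1
    return [(f, df * f / f_bar, chi2_sf(df * f / f_bar, df)) for f in fsts]


def ln_rv(repeats_pop1, repeats_pop2):
    """ln of the ratio of repeat-number variances (population 1 / population 2)."""
    v1, v2 = variance(repeats_pop1), variance(repeats_pop2)
    if v1 == 0 or v2 == 0:
        raise ValueError("locus monomorphic in one population; handle separately")
    return math.log(v1 / v2)


def lnrv_outliers(loci, z_crit=1.96):
    """Indices of loci whose standardized lnRV lies outside +/- z_crit."""
    values = [ln_rv(pop1, pop2) for pop1, pop2 in loci]
    mu, sd = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) / sd > z_crit]
```

In both cases the neutral expectation is taken from the bulk of the loci themselves, which is what makes the multilocus versions less dependent on an explicit demographic model; the cutoff still has to be corrected for the number of loci tested.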
Analysis of 94 microsatellite loci in African and non-African human populations identified two microsatellite loci at which the level of variability deviated significantly from the remainder of the genome [17]. One of the loci had reduced variability in the African populations, and the other had reduced variability in the non-African populations. As in the studies based on Lewontin–Krakauer tests, no independent evidence (e.g. analysis of flanking regions) for the presence of a selective sweep was provided. A recent study in D. melanogaster, however, demonstrated that a genomic region identified by the lnRV test as the target of selection also deviated from neutral expectations for DNA sequence polymorphism data (Fig. 3) [18]. An approach related to the lnRV test statistic was used for a genome-wide scan of variation at 342 polymorphic microsatellites in the malaria parasite Plasmodium falciparum. Using relative homozygosities, Wootton et al. constructed a likelihood-ratio test based on an empirical distribution of relative homozygosities obtained by permutation [19]. The comparison between isolates that are sensitive or resistant to chloroquine identified one genomic region on chromosome 7 that displayed a strong reduction in variability around the pfcrt gene in chloroquine-resistant isolates [19]. As the pfcrt gene had previously been shown to confer resistance to chloroquine, this provided independent evidence for the presence of a selective sweep in the region of reduced variability.

Intraspecific multilocus tests based on single populations

Allele excess at microsatellites

An alternative approach to screen for selection compares the observed number of microsatellite alleles with the number expected from the observed gene diversity [20,21]. No deviation is expected under neutrality, but a recent selective sweep results in a surplus of alleles at a microsatellite locus linked to the selected site. Using this approach, a large survey (>5000 loci) of human microsatellite variability in Caucasians provided evidence for a large number of putative selective sweeps [22]. Interestingly, more selective sweeps were inferred on the X chromosome than on the autosomes.
A recent genome survey in maize identified several candidate microsatellites that might be linked to genes of agronomic interest [23]. Most of these loci, however, could not be confirmed by the Lewontin–Krakauer test. One of the problems of the allele-excess test statistic is that it is sensitive to demography and to the model used to describe microsatellite mutation.
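The comparison of observed allele number with the number expected from gene diversity can be illustrated as follows. This sketch deliberately swaps in the infinite-alleles model (the Ewens sampling formula of [20]) because it has closed forms: H = θ/(1 + θ) at equilibrium and an expected allele number of Σ θ/(θ + i) over i = 0, ..., n - 1. The microsatellite studies discussed above rely on the stepwise model [21] and on simulations for significance, neither of which is reproduced here; the function names are hypothetical.

```python
from collections import Counter


def theta_from_h_iam(h):
    """Infinite-alleles equilibrium H = theta / (1 + theta), solved for theta."""
    if not 0.0 <= h < 1.0:
        raise ValueError("H must lie in [0, 1)")
    return h / (1.0 - h)


def expected_allele_number(theta, n):
    """Ewens expectation of the number of distinct alleles among n sampled genes."""
    if theta == 0:
        return 1.0  # no mutation: a single allele
    return sum(theta / (theta + i) for i in range(n))


def allele_excess(alleles):
    """Return (observed, expected) allele numbers given the observed gene diversity."""
    n = len(alleles)
    counts = Counter(alleles)
    h = 1.0 - sum((c / n) ** 2 for c in counts.values())
    expected = expected_allele_number(theta_from_h_iam(h), n)
    return len(counts), expected
```

A sample dominated by one allele plus several rare ones, the pattern expected some time after a sweep, yields more alleles than its low gene diversity predicts, whereas a sample with two common alleles yields fewer.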

Fig. 3. Analysis of 14 linked microsatellite loci in Drosophila melanogaster (y-axis: lnRV; x-axis: position in kb on the X chromosome). Variability at each locus is indicated by lnRV based on a comparison of African and non-African populations. Loci located in the gray shaded box do not deviate from neutral expectations. Two genomic regions with reduced variability can be recognized, which have smaller lnRV values than neutrally evolving loci (sweep regions 1 and 2). The non-neutral evolution of these genomic regions was also confirmed by DNA sequence polymorphism analysis ([18], not shown). Note that the shape of the observed variability around the two selected sites closely resembles the shape shown in Fig. 2. This figure was modified from [18].

Linkage disequilibrium between microsatellite alleles

A strong selective sweep also increases the correlation among alleles at loci adjacent to the selected site, which is reflected in an increase in LD above the neutral expectation. One pioneering study detected strong LD at microsatellite loci adjacent to a gene conferring warfarin resistance in resistant rat populations [24]. Genome scans relying on LD as a means to detect selective sweeps are challenged by variation in recombination rates along chromosomes and by the profound effect of demography on the observed levels of LD. Furthermore, it is currently unclear whether weaker selection also results in a significant increase in LD.
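One common way to quantify the allelic association described above is the squared correlation r² between two loci. The sketch below assumes phased two-locus haplotypes and reduces each multi-allelic microsatellite to two classes ('carries a focal allele' versus 'any other allele'); that dichotomization is an illustrative simplification and not the method of the cited warfarin study [24]. Function names are hypothetical.

```python
def dichotomize(alleles, focal):
    """Code a multi-allelic marker as 1 (focal allele) or 0 (any other allele)."""
    return [1 if a == focal else 0 for a in alleles]


def ld_r2(locus1, locus2):
    """Squared allelic correlation r^2 between two 0/1-coded loci on phased haplotypes."""
    if len(locus1) != len(locus2) or not locus1:
        raise ValueError("loci must be scored on the same, non-empty set of haplotypes")
    n = len(locus1)
    p_a = sum(locus1) / n
    p_b = sum(locus2) / n
    p_ab = sum(a * b for a, b in zip(locus1, locus2)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a locus is monomorphic, so r^2 is undefined")
    return d * d / denom
```

Values near 1 indicate that alleles at the two markers travel together on the same haplotypes, as expected next to a recently swept site; interpreting a genome-wide set of such values would still require accounting for local recombination rates and demography.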


Box 2. Proposed strategy for hitchhiking mapping
• Genome scan based on a large number of easy-to-score markers (e.g. microsatellites) in populations adapted to different environmental conditions.
• Multilocus test statistic for the identification of loci that differ significantly from the remainder of the genome.
• High-density scan around the identified candidate regions to further narrow down the position of the selected gene (microsatellites or sequence polymorphism).
• Sequencing or SNP typing of the identified region in multiple individuals from different populations.

DNA sequences

Probably the first 'multilocus' test for DNA sequences was the HKA test [25], which compared the polymorphism at two loci with their divergence from a closely related species. Although the HKA test did not incorporate demography, a recent coalescent-based maximum-likelihood version of the test distinguishes between three hypotheses: (1) a stable population evolving neutrally, (2) a population bottleneck, and (3) a selective sweep at one or more loci [26]. To recognize a selective sweep, heterogeneity in the time and strength of diversity-reducing events is estimated across loci. Based on a dataset of three loci in D. melanogaster, Galtier et al. found evidence for one strong selective sweep at the locus Vha and a weak, very recent sweep at the locus Su(H). So far, this method has not been applied to a larger dataset, and it remains unclear whether the observed high proportion of selected genes is an artifact of the method. A completely different approach to infer a selective sweep takes advantage of the well-described correlation between natural variability and recombination [27]. Eleven genomic regions were surveyed along the X chromosome of D. melanogaster. As the recombination rate for those loci decreased along the chromosome, Nurminsky et al. expected a monotonic decrease in variability if no recent selective sweep had occurred. Instead, they observed a depression in variability around the gene Sdic, suggesting that this gene has recently experienced a selective sweep [27]. The reliability of this test statistic will become clear when more loci are analyzed and the monotonic decline in variability can be confirmed. A refined method, which also exploits the non-independence of linked sites, was recently introduced [28].
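The expectation used by Nurminsky et al. (variability declining monotonically as recombination declines along the chromosome) can be formalized, for illustration, by fitting the best non-increasing curve to the ordered variability values and looking for the locus lying furthest below it. The sketch uses isotonic regression via the pool-adjacent-violators algorithm; this is a swapped-in technique, not the analysis performed in [27]. It assumes the loci are already ordered by decreasing recombination rate and provides no significance test. Function names are hypothetical.

```python
def fit_non_increasing(values):
    """Least-squares non-increasing fit (pool-adjacent-violators algorithm)."""
    blocks = []  # each block: [sum of pooled values, number of pooled values]
    for v in values:
        blocks.append([v, 1])
        # Pool while a later block's mean exceeds the mean of the block before it.
        while len(blocks) > 1 and blocks[-1][0] / blocks[-1][1] > blocks[-2][0] / blocks[-2][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit


def strongest_depression(values):
    """Index of the locus lying furthest below the monotone trend, and its residual."""
    fit = fit_non_increasing(values)
    residuals = [v - f for v, f in zip(values, fit)]
    idx = min(range(len(values)), key=residuals.__getitem__)
    return idx, residuals[idx]
```

For example, variability values of 5, 4, 1, 3 and 2 along a chromosome give the monotone fit 5, 4, 2, 2, 2, and the third locus stands out as the deepest local depression.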
As a recent selective sweep reduces variability around the selected site, the presence of a selective sweep, its position, and the associated selective advantage can be estimated if sequence polymorphism has been determined for multiple regions around the selected site [28]. The need for intensive computer simulations to formulate the neutral null hypothesis makes this test statistic better suited for the analysis of a single genomic region than for a genome-wide survey.

Interspecific tests

The ratio of nonsynonymous to synonymous substitution rates is probably the most reliable sequence-based test statistic, because it is not affected by demography [29,30]. Nevertheless, it requires that multiple beneficial mutations have occurred in the protein-coding region. This method has been used successfully, particularly when the genes involved were evolutionarily selected for divergence (e.g. HIV-1, reproductive proteins). However, adaptive changes in regulatory regions and noncoding RNAs will not be detected with this approach. Hence, it remains open to further investigation whether a genome-wide screen based on differences between synonymous and nonsynonymous changes will be biased towards genes evolutionarily selected for divergence and/or new functions.

Future directions

Probably the greatest difficulty of any multilocus study is the accurate determination of the significance level (α-value). When a large number of tests (e.g. 100) is performed at a given significance level (e.g. 0.05), some 'significant' loci are expected by chance alone (on average five in this example). Hence, the α-value needs to be adjusted for multiple tests. As standard methods to account for multiple testing are conservative, loci subjected to selection could be missed. Although an ad hoc definition of an 'adequate' significance level is possible, this approach is not satisfactory. An alternative would be to accept the possibility of false positives in a first-pass multilocus screen, but to follow up with a more refined analysis around each identified candidate region. Variability of multiple microsatellites or DNA sequences could be determined around the locus identified in the first round of screening. As linked genomic regions do not evolve independently, the signature of a recent selective sweep can be recognized in the entire region flanking the selected site. On average, the allele frequency distortion will form a gradient from a strong skew close to the selected site to no skew at distant sites (Fig. 1). The joint analysis of multiple markers covering a candidate region is more informative than a single sequence stretch or microsatellite.
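The multiple-testing arithmetic above, and two standard corrections, can be sketched as follows. Bonferroni is the kind of conservative family-wise correction the text refers to; the Benjamini–Hochberg false-discovery-rate procedure is not discussed in this review and is included only as one widely used, less conservative alternative. The p-values in the usage example are invented for illustration, and the function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of 'significant' results when every null hypothesis is true."""
    return n_tests * alpha


def bonferroni(pvalues, alpha=0.05):
    """Indices significant after Bonferroni correction (family-wise error rate)."""
    cutoff = alpha / len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= cutoff]


def benjamini_hochberg(pvalues, q=0.05):
    """Indices declared significant by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank  # largest rank whose p-value passes its threshold
    return sorted(order[:k_max])


if __name__ == "__main__":
    print(expected_false_positives(100, 0.05))  # the text's example of 100 tests at 0.05
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print("Bonferroni:", bonferroni(p), "Benjamini-Hochberg:", benjamini_hochberg(p))
```

With the invented p-values, Bonferroni retains only the smallest one, while the false-discovery-rate procedure also keeps the second, which illustrates the trade-off between missing selected loci and accepting some false positives in a first-pass screen.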
The observed data can be compared with either a simple neutral scenario [28] or more complex scenarios that include demography. For a proposed hitchhiking mapping strategy, see Box 2. Based on the available data, it is possible to predict that first-pass genome-wide scans will provide interesting candidate regions. The next important step is the experimental exploration of the partitioning of variation in such candidate regions. The results of such studies will provide insight into the possible resolution of the hitchhiking mapping approach. Preliminary results in D. melanogaster suggest that the mapping strategy can be quite accurate, with intervals encompassing only a few kb [18]. Nevertheless, more data are required to obtain reliable estimates of the size of hitchhiking mapping intervals. Future work on hitchhiking mapping also needs to incorporate potential interactions of multiple, linked beneficial mutations arising on different chromosomes, a phenomenon called 'trafficking' in the case of outcrossing organisms [31] or 'clonal interference' in the case of clonal species [32]. Furthermore, as balancing selection involving multiple loci has a profound effect on levels of variability even for intervening sequences [33,34], this phenomenon should be considered. Finally, the pattern of variation at selected loci needs to be studied in a more realistic metapopulation framework [35], which would also include gene flow between subdivided (and potentially differentially adapted) populations as well as fluctuating subpopulation sizes.

Table 2. Comparison of hitchhiking mapping with linkage disequilibrium (LD) and quantitative trait locus (QTL) mapping
Method | Visible phenotype | Segregating variation | Experimental crosses | Mapping interval
Hitchhiking mapping | Not required | Not required | Not required | Not yet determined
LD mapping | Required | Required | Not required | Few kb to 100 kb
QTL mapping | Required | Required | Required | 3–10 cM

Elucidating genome function by hitchhiking mapping

Hitchhiking mapping provides functional information on different levels. The first and least informative level is the identification of selective sweeps. Genes (or noncoding RNAs) that cause a selective sweep must have a function, even if laboratory experiments fail to demonstrate it. Their identification by hitchhiking mapping could therefore contribute to a more complete functional characterization of the genome. More information about the swept gene can be obtained by comparing populations that have been exposed to a well-known selection regime. Resistance to warfarin and resistance to chloroquine are two examples in which hitchhiking mapping has already verified known resistance genes [19,24]. Similar comparisons could be performed for hitherto uncharacterized, commercially important traits, such as milk yield in cattle or fat content in pigs. The most ambitious goal of hitchhiking mapping is the identification of the base substitution (quantitative trait nucleotide, QTN) that confers the selective advantage. The availability of two or more alleles with a known functional difference can then provide a better functional understanding of the corresponding genes.

Comparison with other multilocus mapping approaches

Although hitchhiking mapping is a very recent approach that has yet to be fully explored, two other multilocus mapping techniques, LD mapping [36] and quantitative trait locus (QTL) mapping [37], also take advantage of natural variation to map genes. The most important difference between hitchhiking mapping and these other multilocus screens is that the latter test for an association with a measurable phenotype (Table 2). Hitchhiking mapping, by contrast, uses the genomic signature left by a selective sweep. Hence, mutations that are of ecological importance, but whose effects cannot be measured under laboratory conditions, could be identified by the hitchhiking mapping approach. A further difference is that LD and QTL mapping require segregating variation, whereas the most informative situation for hitchhiking mapping is when all variability has been lost owing to the selective sweep. This could be exploited for mapping traits in heavily selected populations, such as cattle selected for milk production traits.

Acknowledgements

Many thanks to K. Dawson, B. Harr, M.-T. Hauser, M. Kauer and T. Wiehe for helpful comments. I am grateful to N. Barton for sharing unpublished results. The comments of four anonymous reviewers made the manuscript more accessible to a wider audience. The laboratory of C.S. is supported by grants from the FWF and an EMBO young investigator program award.

References

1 Schmid, K.J. and Tautz, D. (1997) A screen for fast evolving genes from Drosophila. Proc. Natl Acad. Sci. USA 94, 9746–9750
2 Smith, V. et al. (1996) Functional analysis of the genes of yeast chromosome V by genetic footprinting. Science 274, 2069–2074
3 Maynard Smith, J. and Haigh, J. (1974) The hitch-hiking effect of a favorable gene. Genet. Res. 23, 23–35
4 Barton, N.H. (2000) Genetic hitchhiking. Phil. Trans. R. Soc. Lond. B Biol. Sci. 355, 1553–1562
5 Otto, S.P. (2000) Detecting the form of selection from DNA sequence data. Trends Genet. 16, 526–529
6 Wright, S. (1921) Systems of mating, I–IV. Genetics 6, 111–178
7 Lewontin, R.C. and Krakauer, J. (1973) Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74, 175–195
8 Robertson, A. (1975) Remarks on the Lewontin–Krakauer test. Genetics 80, 396
9 Nei, M. and Maruyama, T. (1975) Letters to the editors: Lewontin–Krakauer test for neutral genes. Genetics 80, 395
10 Bowcock, A.M. et al. (1991) Drift, admixture, and selection in human evolution: a study with DNA polymorphisms. Proc. Natl Acad. Sci. USA 88, 839–843
11 Beaumont, M.A. and Nichols, R.A. (1996) Evaluating loci for use in genetic analysis of population structure. Proc. R. Soc. London Ser. B 263, 1619–1626
12 Tsakas, S. and Krimbas, C.B. (1976) Testing the heterogeneity of F values: a suggestion and a correction. Genetics 84, 399–401
13 Vitalis, R. et al. (2001) Interpretation of variation across marker loci as evidence of selection. Genetics 158, 1811–1823
14 Ellegren, H. (2000) Microsatellite mutations in the germline: implications for evolutionary inference. Trends Genet. 16, 551–558
15 Schlötterer, C. (2000) Evolutionary dynamics of microsatellite DNA. Chromosoma 109, 365–371
16 Schlötterer, C. et al. (1997) Polymorphism and locus-specific effects on polymorphism at microsatellite loci in natural Drosophila melanogaster populations. Genetics 146, 309–320
17 Schlötterer, C. (2002) A microsatellite-based multilocus screen for the identification of local selective sweeps. Genetics 160, 753–763
18 Harr, B. et al. (2002) Hitchhiking mapping – a population based fine mapping strategy for adaptive mutations in D. melanogaster. Proc. Natl Acad. Sci. USA 99, 12949–12954
19 Wootton, J.C. et al. (2002) Genetic diversity and chloroquine selective sweeps in Plasmodium falciparum. Nature 418, 320–323
20 Ewens, W.J. (1972) The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3, 87–112
21 Kimura, M. and Ohta, T. (1975) Distribution of allelic frequencies in a finite population under stepwise production of neutral alleles. Proc. Natl Acad. Sci. USA 72, 2761–2764
22 Payseur, B.A. et al. (2002) Searching for evidence of positive selection in the human genome using patterns of microsatellite variability. Mol. Biol. Evol. 19, 1143–1153
23 Vigouroux, Y. et al. (2002) Identifying genes of agronomic importance in maize by screening microsatellites for evidence of selection during domestication. Proc. Natl Acad. Sci. USA 99, 9650–9655
24 Kohn, M.H. et al. (2000) Natural selection mapping of the warfarin-resistance gene. Proc. Natl Acad. Sci. USA 97, 7911–7915
25 Hudson, R.R. et al. (1987) A test of neutral molecular evolution based on nucleotide data. Genetics 116, 153–159
26 Galtier, N. et al. (2000) Detecting bottlenecks and selective sweeps from DNA sequence polymorphism. Genetics 155, 981–987

27 Nurminsky, D. et al. (1998) Selective sweep of a newly evolved sperm-specific gene in Drosophila. Nature 396, 572–575
28 Kim, Y. and Stephan, W. (2002) Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160, 765–777
29 Yang, Z. and Bielawski, J.P. (2000) Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15, 496–503
30 Nielsen, R. (2001) Statistical tests of selective neutrality in the age of genomics. Heredity 86, 641–647
31 Kirby, D.A. and Stephan, W. (1996) Multilocus selection and the structure of variation at the white gene of Drosophila melanogaster. Genetics 144, 635–645
32 Gerrish, P.J. and Lenski, R.E. (1998) The fate of competing beneficial mutations in an asexual population. Genetica 102, 127–144
33 Kelly, J.K. and Wade, M.J. (2000) Molecular evolution near a two-locus balanced polymorphism. J. Theor. Biol. 204, 83–101
34 Barton, N.H. and Navarro, A. (2002) Extending the coalescent to multilocus systems: the case of balancing selection. Genet. Res. in press
35 Hanski, I. (1999) Metapopulation Ecology, Oxford University Press
36 Weiss, K.M. and Clark, A.G. (2002) Linkage disequilibrium and the mapping of complex human traits. Trends Genet. 18, 19–24
37 Lynch, M. and Walsh, B. (1998) Genetics and Analysis of Quantitative Traits, Sinauer Associates