Introduction to Human Genetics

CHAPTER 1

Jennifer E. Posey (1) and Katherina Walz (2,3)

(1) Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, United States
(2) Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL, United States
(3) John P. Hussman Institute for Human Genomics, University of Miami, Miami, FL, United States

1.1 INTRODUCTION

The last three decades have seen tremendous growth in the ability to link genes, and individual variants, to human disease traits. Genome-wide assays such as chromosomal microarray and exome and genome sequencing have stimulated this growth. Model organism studies are instrumental for understanding the molecular mechanisms underlying phenotypic expression associated with a particular locus, and they enable dissection of the effects of both functional variant types and combinations of variants (biallelic or multilocus) on human disease traits. Below we review the current status of discovery in human genetics and genomics, and describe several examples of discoveries for which model organism study will provide a cornerstone for elucidating relationships between genomic variation, molecular pathogenesis of disease, and disease biology, with the potential to identify targets for therapeutic development and enable precision medicine.

1.2 CURRENT STATUS OF DISCOVERY

Cellular and Animal Models in Human Genomics Research. DOI: https://doi.org/10.1016/B978-0-12-816573-7.00001-8 © 2019 Elsevier Inc. All rights reserved.

1.2.1 From 0 to 60: Linking Human Genetic and Genomic Variation to Human Disease

In many ways the discovery of the human chromosome number in 1956 [1] formed the cornerstone on which our present understanding of the relationship between individual genomes and their impact on the expression of disease would be built. Karyotype technology led to the elucidation of common chromosomal aneuploidies underlying previously clinically characterized conditions such as Down syndrome, first described in 1866 [2] but not characterized cytogenetically until 1959 [3]. The first disease gene would be mapped in 1983 [4], eventually linking trinucleotide repeat expansions in huntingtin (HTT) to the expression of Huntington disease [5]. From that time, both single-gene disorders and genomic disorders [6-8] resulting from submicroscopic structural variation of the human genome were increasingly described. Linkage analysis and genome-wide association studies (GWAS) were used to identify common polymorphisms associated with a disease trait of interest, but they often fell short of directly identifying the disease gene and etiologic rare variant, necessitating additional techniques such as positional cloning for disease gene discovery. The development and implementation of tools that enable a direct genome-wide interrogation for rare structural and single nucleotide variants would revolutionize human disease gene discovery. In this regard, chromosomal microarray (CMA), exome sequencing (ES), and genome sequencing (GS) have truly accelerated gene discovery. Concurrent with technology development has been the elaboration of numerous variant annotation resources, including those that provide minor allele frequency data for populations of varying ethnicities, such as the Exome Aggregation Consortium (ExAC) [9], the Genome Aggregation Database (gnomAD), the 1000 Genomes Project [10], the National Heart, Lung, and Blood Institute Exome Sequencing Project (http://evs.gs.washington.edu/EVS/), and the Atherosclerosis Risk in Communities [11] databases, as well as catalogs of human structural variation such as the Database of Genomic Variants (DGV) and the Database of Genomic Variation and Phenotype in Humans Using Ensembl Resources (DECIPHER).
Measures of evolutionary conservation [GERP (Genomic Evolutionary Rate Profiling), phyloP] and estimates of protein functional impact [SIFT (Sorting Intolerant From Tolerant) [12], PolyPhen2 (Polymorphism Phenotyping v2) [13], MutationTaster [14], LRT (Likelihood Ratio Test) [15], CADD (Combined Annotation Dependent Depletion) [16], REVEL (Rare Exome Variant Ensemble Learner) [17]] for identified variants have also improved genome-wide analytic methods. Bioinformatics approaches to genome-wide analyses have advanced our ability to interrogate large datasets, with rapid detection of copy number variants, de novo mutations, and absence of heterozygosity from exome and genome variant data [18-23], as well as prediction tools such as the probability of loss-of-function (LoF) intolerance [9] and the likelihood that a truncating variant will escape from nonsense-mediated decay [24]. Increasing use of expression catalogs, such as the Genotype-Tissue Expression database (GTEx), has also been harnessed to prioritize candidate disease genes through comparison of their tissue expression patterns to those of the disease trait of interest [25]. The development of genome-wide assays and a tremendous toolkit for genomic variant analyses has led to a time of rapid growth in our understanding of genetic variation and its impact on human health. Below we highlight recent accomplishments in the field of human genetics and genomics.
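As a rough illustration of how the allele frequency resources and functional impact predictors described above might be combined, the following sketch keeps candidate variants that are rare in a reference population and predicted to be deleterious. The `Variant` fields, the placeholder gene labels, and the thresholds (minor allele frequency below 1%, scaled score of at least 20) are assumptions chosen for illustration; they are not taken from this chapter or from any specific analysis pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    gene: str              # placeholder label, not a real gene
    maf: Optional[float]   # population minor allele frequency; None if never observed
    cadd: float            # scaled deleteriousness score (illustrative values only)


def prioritize(variants: List[Variant],
               max_maf: float = 0.01,
               min_cadd: float = 20.0) -> List[Variant]:
    """Keep rare, predicted-deleterious variants, highest score first."""
    kept = [
        v for v in variants
        # A variant absent from the reference population is treated as rare.
        if (v.maf is None or v.maf < max_maf) and v.cadd >= min_cadd
    ]
    return sorted(kept, key=lambda v: v.cadd, reverse=True)


candidates = [
    Variant("GENE_A", maf=0.25, cadd=28.0),    # too common
    Variant("GENE_B", maf=0.0002, cadd=31.5),  # rare and high scoring
    Variant("GENE_C", maf=None, cadd=24.1),    # novel and high scoring
    Variant("GENE_D", maf=0.0001, cadd=3.2),   # rare but low scoring
]

shortlist = prioritize(candidates)
print([v.gene for v in shortlist])
```

In practice the thresholds would depend on the inheritance model and the disease prevalence; a recessive trait, for instance, can tolerate a higher allele frequency cutoff than a fully penetrant dominant trait.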

1.2.2 Mendelian Conditions

Rare disease has been defined as a disease trait impacting fewer than 200,000 individuals in the US population. Traditionally, these conditions are associated with variants that are rare in the population (well below 1%) and convey a large effect on trait manifestation. Such traits are often referred to as “Mendelian conditions,” as expression of the disease trait follows expected Mendelian modes of inheritance for a monogenic, or single locus, trait: autosomal dominant, autosomal recessive, X-linked, or mitochondrial. The diagnosis of a Mendelian condition can have immediate clinical impact, providing a molecular diagnosis and recurrence risk information for the affected family, and informing expectant medical management, and potentially therapeutic management, for the individual. This clinical value underscores the need for complete functional and phenotypic annotation of all ~20,000 genes in the human genome. The current pace of human disease gene discovery for Mendelian conditions has never been greater and shows no evidence of slowing down. Despite this, to date, only 4083 genes (representing ~20% of the genes in the human genome) are cataloged in the Online Mendelian Inheritance in Man (OMIM) database as having one or more high-penetrance disease traits (www.OMIM.org; May 3, 2019). These data underscore both the high proportion of human genes that remain to be phenotypically annotated, and the complexity of gene-phenotype relationships, which do not always follow a one-to-one ratio. Beyond simply the identification of novel disease genes underlying Mendelian conditions, a number of key discoveries have elucidated this very relationship between genes and their associated phenotypes. Traditional thinking led to a one gene-one disease model whereby a single gene or locus was associated with a particular disease trait, with inheritance following either a dominant or recessive pattern.
However, there are increasing examples of gene-phenotype relationships that break with this traditional mold, underscoring the degree of allelic and locus heterogeneity, variability in penetrance and expression of disease traits, and combinatorial effects of rare variants at more than one locus in human disease. Genes such as RET may be associated with more than one disease trait, with rare constitutional variants leading to autosomal dominant multiple endocrine neoplasia type 2A (OMIM #171400) or the autosomal dominant neurocristopathy Hirschsprung disease (OMIM #142623). There are also increasing examples of genes associated with both dominant and recessive inheritance of disease traits. For some such genes, the dominantly inherited (due to monoallelic variation) trait is more severe (GJB2, KIF1A, MAB21L2, NALCN), whereas for other such genes the recessively inherited (due to biallelic variation) trait is more severe (AARS, CLCN1, EGR2, ROR2) [26]. Monoallelic and biallelic variants in CLCN1 lead to the same disease trait: dominantly or recessively inherited myotonia congenita (OMIM #160800, #255700). In contrast, variants in ATAD3A consistently affect the neurologic system, but clinically observed phenotypes are distinct when the etiologic variant is monoallelic (developmental delay, axonal neuropathy, hypotonia, hypertrophic cardiomyopathy) or biallelic (developmental delay, hypotonia, ataxia, seizures, and congenital cataracts with cerebellar atrophy and hypoplastic optic nerves) [27]. Several of the most impactful discoveries in rare disease have described the effect of genetic variation involving more than one locus on the expression of disease traits.

1.2.2.1 Digenic Inheritance

One such example is digenic inheritance (Fig. 1.1), which requires variation at two independent loci for the manifestation of a single disease trait. Facioscapulohumeral dystrophy type 2 (MIM: 158901), which involves rare variation in SMCHD1 on chromosome 18 and a permissive DUX4 allele on chromosome 4, illustrates the model of digenic inheritance: an individual must have the required variation at both loci in order to have expression of disease [28,29].

FIGURE 1.1 Models of multilocus variation.

| | Digenic inheritance | Dual molecular diagnoses | Incomplete penetrance | Mutational burden |
|---|---|---|---|---|
| No. of loci | 2 | 2 or more | 1 or more | 2 or more |
| No. of disease traits | 1 | 2 or more | 1 | 1 |
| Variant frequency | Rare | Rare | Rare + common | Rare |
| Clinical outcome | Phenotypic features resulting from single disease trait | Blended phenotypes resulting from 2 (or more) disease traits | Phenotypic features resulting from single disease trait | Modified phenotypic features resulting from single disease trait |

[Pedigree diagrams not reproduced. Legend: genetic pedigree; distinct chromosomes; fully penetrant rare variants; common and/or low penetrance variants; affected individual.]

Digenic inheritance is characterized by rare variation at two loci, resulting in a single Mendelian disease trait. In contrast, dual (or multiple) molecular diagnoses result from independently segregating rare variants at two or more loci, resulting in a blended phenotype derived from two or more disease traits. Dual diagnoses resulting from biallelic variation at one locus (leading to a recessive condition) and de novo variation at a second locus are illustrated here. Incomplete penetrance of an apparently dominant disease trait can result when a combination of one rare variant plus one common hypomorphic variant is required for phenotypic expression of a disease trait; such cases may involve a rare and common variant at the same locus (in trans), or at two distinct loci. Note that the incomplete penetrance can be due to a second relatively deleterious variant in affected individuals or a second protective variant in unaffected individuals. Mutational burden may modify the quality or severity of phenotypic features when a rare, highly penetrant variant is identified in conjunction with one or more hypomorphic variants involving genes in a shared pathway.

1.2.2.2 Dual (Multiple) Molecular Diagnoses

In contrast, dual or multiple molecular diagnoses (Fig. 1.1) also require variation at two or more independently segregating loci, but they result in the manifestation of two or more disease traits, leading to a blended clinical phenotype. Although examples of such cases with two or more molecular diagnoses are not new, analyses of clinical cohorts [30-36] have delineated the true extent of multiple molecular diagnoses, which have been estimated to occur in at least 4.9% of all clinical cases for which ES is diagnostic [30]. In some instances affected individuals present with a blending of traits resulting from two conditions with overlapping phenotypic features, whereas in others the clinical presentation of the affected individual reflects the summation of two distinct sets of phenotypic features. At least one report describes differential expression of two conditions over time in twin siblings whose neonatal course was typical of Prader-Willi syndrome (PWS) due to a paternally inherited 15q11.2 deletion, but who ultimately developed clinical features atypical for PWS and more typical of Pitt-Hopkins syndrome due to rare variation in TCF4 [37].

1.2.2.3 Phenotypic Expansion

Individuals and families are classified as having phenotypic expansion when the observed clinical traits extend beyond those previously reported in association with a Mendelian condition [38]. Phenotypic expansion may be described in association with newly reported disease genes, for which the full phenotypic spectrum of the associated condition has simply not yet been appreciated or observed. Notably, multiple molecular diagnoses have been found to underlie a proportion of cases with apparent phenotypic expansion; the phenotypic features that extended beyond those previously reported in association with the identified gene/variant were ultimately found to be associated with rare pathogenic variants at second, and sometimes third, loci [20].

1.2.2.4 Incomplete Penetrance

The observation of incomplete penetrance (Fig. 1.1) has led to the discovery of several examples of non-Mendelian inheritance involving the combination of rare and common variants. Two conditions in particular illustrate molecular models for incomplete penetrance through compound inheritance of both rare and common variants, occurring either at a single locus or at more than one locus. In an analysis of 161 unrelated Han Chinese individuals with congenital scoliosis, heterozygous null mutations in TBX6 were identified as etiologic [39]; these included the 16p11.2 deletion, whose interval contains TBX6. Relatives sharing the 16p11.2 deletion demonstrated incomplete penetrance, and this observation led to the discovery that congenital scoliosis represented an autosomal recessive trait associated with biallelic inheritance of a rare TBX6 null allele (minor allele frequency of 16p11.2 deletion is 0.0003 worldwide) together with a common, hypomorphic TBX6 allele, a haplotype comprising three common single nucleotide polymorphisms (minor allele frequency 0.44 in Chinese population, 0.33 in Caucasians). These findings were replicated in an additional Chinese congenital scoliosis cohort and worldwide case series with 16p11.2 deletions [39]. The common haplotype may result in a sensitized genomic background against which the null allele is fully penetrant; an analogous model has been described in analyses of distal-acting enhancers in mammalian limb development [40]. Similarly, in an analysis of 13 families with nonsyndromic midline craniosynostosis, Timberlake et al. identified de novo and rare, inherited variants in SMAD6 that segregated with the phenotype [41]. The observation of incomplete penetrance (<60%) in this cohort prompted the discovery of a common variant (allele frequency 0.41 in the population) downstream of BMP2, previously identified by GWAS, which explained the incomplete penetrance: individuals with craniosynostosis demonstrated digenic inheritance of a rare variant in SMAD6 and the common BMP2 variant, underscoring the clinical relevance of the epistatic interaction between these two genes.
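The allele frequencies quoted above for TBX6 allow a back-of-the-envelope check of the compound inheritance model. The sketch below assumes random mating and independence of the two alleles (a Hardy-Weinberg style simplification that the chapter does not state), and it assumes that a null allele causes disease only when the hypomorphic haplotype is present in trans. Under those assumptions, the expected frequency of the null-plus-hypomorph genotype is 2pq, and the expected fraction of null allele carriers who are affected is roughly the hypomorphic haplotype frequency. The function names are hypothetical.

```python
def compound_genotype_frequency(null_freq: float, hypomorph_freq: float) -> float:
    """Expected frequency of individuals with one null allele in trans
    with one hypomorphic allele (2pq), assuming random mating."""
    return 2.0 * null_freq * hypomorph_freq


def expected_penetrance_in_null_carriers(hypomorph_freq: float) -> float:
    """Probability that the second allele of a null carrier is the
    hypomorphic haplotype, which is simply its population frequency."""
    return hypomorph_freq


# Frequencies as quoted in the text above.
NULL_FREQ = 0.0003
HAPLOTYPE_FREQ = {"Chinese": 0.44, "Caucasian": 0.33}

for population, p in HAPLOTYPE_FREQ.items():
    genotype = compound_genotype_frequency(NULL_FREQ, p)
    penetrance = expected_penetrance_in_null_carriers(p)
    print(f"{population}: ~{genotype * 1e6:.0f} per million carry both alleles; "
          f"~{penetrance:.0%} of null carriers expected to be affected")
```

This simple model predicts that well under half of null allele carriers would be affected, which is qualitatively consistent with the incomplete penetrance observed among relatives sharing the deletion.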

1.2.2.5 Mutational Burden

Multilocus rare variation has been described as underlying some cases of apparently common disease, including neuropathy and dementia. For example, rare variation at additional loci, that is, a mutational burden (Fig. 1.1), has been described in cases of Charcot-Marie-Tooth neuropathy, in which probands carried a highly penetrant Mendelizing variant in addition to one or more rare variants that by themselves are not sufficient to cause disease but can modify the phenotypic severity of the observed neuropathy [42]. The occurrence of mutational burden impacting the expression of disease has been demonstrated for rare variants in lysosomal storage disorder genes in association with susceptibility to Parkinson’s disease [43], and rare variants in amyotrophic lateral sclerosis (ALS) genes impacting the age of onset of ALS [44].
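The four models of multilocus variation summarized in Fig. 1.1 can be distinguished by a small number of features: the number of loci involved, the number of disease traits expressed, whether a common variant participates, and whether variation at every locus is required for the trait to appear. The sketch below encodes those distinctions as a decision function. It is a deliberate simplification for illustration; the function name, its parameters, and the order of the checks are assumptions rather than an established classification scheme, and real cases (such as the SMAD6 and BMP2 example, which combines features of digenic inheritance and incomplete penetrance) do not always fall cleanly into one category.

```python
def classify_multilocus_model(n_loci: int,
                              n_traits: int,
                              includes_common_variant: bool,
                              every_locus_required: bool) -> str:
    """Map summary features of a case onto the models of Fig. 1.1."""
    if n_loci >= 2 and n_traits >= 2:
        # Independent diagnoses producing a blended phenotype.
        return "dual (multiple) molecular diagnoses"
    if n_traits == 1 and includes_common_variant:
        # Rare + common variant, at one locus (in trans) or at two loci.
        return "incomplete penetrance"
    if n_traits == 1 and n_loci == 2 and every_locus_required:
        # Variation at both loci is needed for any expression of the trait.
        return "digenic inheritance"
    if n_traits == 1 and n_loci >= 2:
        # One highly penetrant variant plus modifying rare variants.
        return "mutational burden"
    return "not covered by Fig. 1.1"


examples = {
    "two rare loci, both required": (2, 1, False, True),
    "two loci, two traits": (2, 2, False, False),
    "rare null + common hypomorph": (1, 1, True, True),
    "driver plus two modifiers": (3, 1, False, False),
}
for label, features in examples.items():
    print(f"{label}: {classify_multilocus_model(*features)}")
```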

1.2.3 Common Disease

Common diseases such as high blood pressure and elevated cholesterol are typically thought to result from a combination of both common variants, each having a small impact on the clinical expression of disease, and environmental exposure. The disease trait results from the interaction of these genetic and environmental factors and displays complex patterns of inheritance. GWAS have been predominant in identifying common variants that demonstrate a statistical association with common disease. The overarching goal of common disease research has been to elucidate genetic etiologies of common disease to inform risk stratification of presymptomatic individuals, and to apply this knowledge to the development of risk prevention strategies. Recent publications have begun to address the clinical implications of genetic risk for common diseases such as coronary artery disease (CAD), Alzheimer’s disease, hypercholesterolemia, and schizophrenia [45-47]. Despite the anticipation of the clinical utility of such discoveries, to date, risk predictions derived from polygenic risk scores have not been sufficient to impact clinical care, and prospective clinical trials demonstrating the efficacy of therapeutic interventions in the setting of elevated risk scores have been lacking. A recent report describing the ability of genome-wide polygenic risk scores to identify individuals with risk levels for CAD that are similar to the risk observed in individuals with monogenic forms of CAD suggests that the field may finally have a foothold by which to study the efficacy of clinical interventions on individuals identified to have a high polygenic risk of CAD [45].
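A polygenic risk score is commonly computed as a weighted sum of risk allele counts, with each weight reflecting the effect size estimated for that variant in an association study. The sketch below shows this calculation and a simple way to place an individual within a reference distribution. The variant labels, weights, and reference scores are invented for illustration, and the chapter does not specify how the scores in the cited work were constructed.

```python
from typing import Dict, Sequence


def polygenic_risk_score(dosages: Dict[str, int],
                         weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele counts (0, 1, or 2 per variant).
    Variants missing from `dosages` are treated as zero copies."""
    return sum(w * dosages.get(variant, 0) for variant, w in weights.items())


def percentile(score: float, reference_scores: Sequence[float]) -> float:
    """Fraction of the reference cohort with a score at or below `score`."""
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    return sum(s <= score for s in reference_scores) / len(reference_scores)


# Hypothetical per-variant effect sizes (e.g., log odds ratios).
weights = {"variant_1": 0.10, "variant_2": 0.05, "variant_3": -0.02}
# Hypothetical genotype of one individual, as risk-allele counts.
person = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
# Hypothetical scores from a reference cohort.
reference = [0.01, 0.08, 0.13, 0.20, 0.31]

score = polygenic_risk_score(person, weights)
print(f"score = {score:.2f}, percentile = {percentile(score, reference):.0%}")
```

Genome-wide scores apply the same sum across a very large number of variants, so individuals in the extreme tail of the distribution can accumulate a risk comparable to that conferred by a single high-penetrance variant.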

1.2.3.1 Rare Variant Contributions to Common Disease

Despite progress in the elucidation of common variants and their contribution to common disease, for many common disease traits, the identified variants do not fully account for the heritability of common traits. This observation has been termed the “missing heritability” [48,49]. One reason for this missing heritability may derive from the limitations of GWAS-based approaches, which are simply not designed to analyze rare variant or copy number variant signals; they are also not powered to detect a set of rare variants that may each drive disease trait expression in only a small subset of all studied individuals [50]. Indeed, evolutionary theory posits that variants conferring disease will be rare in the population. In support of this hypothesis, there are now many examples of the role of rare variation in the expression of common disease traits. Steroid-resistant nephrotic syndrome associated with variants in NUP93, NUP205, XPO5, and FAT1 demonstrates how Mendelian disorders may underlie some cases of apparently common disease [51,52]. Importantly, etiologic genes in such cases can inform relevant biological pathways in disease, ultimately providing potential targets for therapeutic development. The study of steroid-resistant nephrotic syndrome has elucidated the pathways underlying defective podocyte migration in this condition: Rho-like small GTPase signaling pathways and BMP7-induced SMAD signaling reveal therapeutic targets for drug development [51,52]. A recent analysis of electronic health records and genomic data from individuals with apparently common disease ultimately revealed 18 molecular diagnoses of Mendelian conditions that were previously not recognized clinically [53]. There are also several examples of genes that were identified because of their association with rare disease traits following a recessive inheritance pattern and that were ultimately found to modify risk for common disease traits in heterozygous “carrier” individuals. In a majority of these examples, the etiologic variant conferring a common disease risk is rare (MAF < 1%), with CFTR F508del being one notable exception, depending on the population studied. There are analogous examples of dominant disease traits for which rare variants may present either as a more severe, rare Mendelian condition, or as a milder, common trait. For example, individuals with deletion of PMP22 typically present with hereditary neuropathy with liability to pressure palsies (MIM: 162500), but some individuals will instead develop a milder carpal tunnel syndrome; both traits are observed to segregate dominantly in families [54,55].

1.3 ESTABLISHING A LINK BETWEEN GENOTYPE AND PHENOTYPE

The rapid evolution of human genetics and genomics has seen direct impacts on both the practice of clinical genetics, and our understanding of human biology and disease pathogenesis. In the clinical setting, the implementation of CMA and ES has led to increased molecular diagnostic rates (compared to single gene or panel-based testing), providing patients and their families with molecular diagnoses and, for many, ending the “diagnostic odyssey” [31-35,56-58]. Beyond information regarding the molecular diagnosis or diagnoses, genome-wide testing can provide information regarding recurrence risks for the family. This becomes particularly important when two or more molecular diagnoses are identified. Incidental, or secondary, findings, defined as genetic testing results that are unrelated to the clinical indication for genetic testing, are also reported in the clinical setting. The American College of Medical Genetics and Genomics (ACMG) has developed a set of recommendations for the reporting of incidental findings by diagnostic laboratories [59-61]. One strength of genome-wide approaches is that they do not depend on presupposition of the correct clinical diagnosis, and thus can identify molecular diagnoses in individuals for whom the clinical diagnosis was confounded by an atypical presentation. In the diagnostic laboratory, molecular diagnostic rates for ES alone in sequential referral cases range from 25% to 50%, and this is impacted by subject age and the phenotype studied [31-33,35,57]. Novel disease gene discovery has certainly driven an increase in molecular diagnoses [62], and concurrent testing with ES and CMA pushes this diagnostic rate even higher [63], though many clinicians favor a stepwise approach to testing. Once a molecular diagnosis has been obtained, physicians can tailor medical management, and in some cases therapeutics, to the individual molecular diagnosis. For example, a family with a child referred because of short stature may be counseled to minimize protein intake following the diagnosis of lysinuric protein intolerance (OMIM #222700), which can cause metabolic derangements characterized by hyperammonemia in the setting of high protein intake [64]. Alternatively, a neonate with permanent neonatal diabetes mellitus (OMIM #606176) due to an activating mutation in KCNJ11 may be treated with sulfonylurea medications (sulfonylureas function as a precision therapy for this condition, as they specifically target the ATP-sensitive potassium channel encoded by KCNJ11). The precision approach to neonatal diabetes in individuals with pathogenic variants in KCNJ11 illustrates the important role that human disease gene discovery plays in the elucidation of biological mechanisms underlying human disease. Such discoveries form the foundation for developing precision medicine approaches to disease. One of the greatest success stories for this approach to date involves the identification of PCSK9 and its role in cholesterol metabolism.
PCSK9 was first described as a gene underlying familial hypercholesterolemia (MIM: 603776) in 2003 [65], with further studies demonstrating that gain-of-function variants were associated with hypercholesterolemia, whereas LoF variants were associated with hypocholesterolemia and protection against heart disease in the African American population of Dallas [66-68]. The development of monoclonal antibodies targeting PCSK9 quickly ensued, and this targeted therapeutic approach has been integrated into clinical practice to treat individuals with cardiovascular disease and familial hypercholesterolemia. The fundamental goals of human disease gene discovery are to gain a deeper understanding of the molecular pathogenesis and biology underlying human disease traits, and ultimately harness this knowledge to design individualized, targeted therapeutics.

As shown above, the complexity of human genetic variability is immense. Model organisms have long offered a window through which we can study the organismal impact of perturbation of discrete biological processes. Allelic series in model organisms further enable a dissection of variants with amorphic, hypomorphic, hypermorphic, neomorphic, and antimorphic effects, as well as the combinatorial effects of two or more variant classes at a single locus. For example, the generation of mouse models of TBX6-associated scoliosis elucidated the critical role of gene dosage in this human condition that results from an LoF (amorphic) TBX6 allele in trans with a hypomorphic TBX6 allele (Fig. 1.2) [69]. Studying genes in parallel in humans and model organisms provides insight into the cause of a particular disease, accelerating understanding of the pathogenic processes. Hence model organisms have become an essential part of biomedical research [70].

FIGURE 1.2 Model organisms can elucidate the relationship between genotype and phenotype. [Diagram not reproduced; panel labels: Genotype (TBX6 alleles: LoF + hypomorph); Phenotype (scoliosis, butterfly vertebrae, hemivertebrae); Model organism.] Individuals with a loss-of-function (LoF) TBX6 allele (yellow diamond) in trans with a hypomorphic common TBX6 allele (gray diamond) have early onset scoliosis associated with specific vertebral phenotypic traits (butterfly vertebrae, hemivertebrae). Model organism studies of the Tbx6 LoF + hypomorphic allele combination revealed gene dosage of Tbx6 as the mechanism underlying the observed human phenotype.

One important question to address is the relationship between genotype and phenotype. Toward that end, the systematic genetic screens that can be performed in model organisms are a great advantage. For example, the construction of gene deletion collections in budding yeast [71], fission yeast [72], and Escherichia coli [73], plus the genome-wide RNA interference (RNAi) screens in worms [74] and flies [75], allow the identification of genes that influence a particular phenotype or trait. In addition, GS and lower genotyping costs are increasing the number of genetic variants that can be linked to trait variation [76]. As a result, the connections between genes and phenotypes are understood more completely and more systematically in model organisms than in humans. In addition, these studies allow us to elucidate the immense genetic complexity of phenotypic traits in an impartial manner. For example, in yeast it is very common to find that a specific trait is affected by multiple genes, a phenomenon called locus heterogeneity. Pleiotropy, in which individual genes each influence a great diversity of traits, is also commonly observed [77,78]. The concept of “guilt by association” predicts that if two gene products work in the same pathway or process, then mutations in these genes probably have overlapping phenotypic consequences [79,80]. Many different types of evidence can be used to identify functionally associated genes. For example, genes that encode physically interacting proteins, or that are coregulated or coevolving, are more likely to work in a common process.
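One simple way to put the “guilt by association” idea into practice is to rank candidate genes by how many of their functional partners are already linked to the phenotype of interest. The sketch below does this for a toy network. The gene labels and interactions are placeholders, the scoring rule (a count of known-gene neighbors) is an assumption chosen for simplicity, and published methods typically use weighted networks and more elaborate statistics.

```python
from typing import Dict, List, Set, Tuple


def rank_candidates(interactions: Dict[str, Set[str]],
                    known_genes: Set[str],
                    candidates: List[str]) -> List[Tuple[str, int]]:
    """Score each candidate by the number of its interaction partners
    that are known phenotype-associated genes; highest score first."""
    scores = {
        gene: len(interactions.get(gene, set()) & known_genes)
        for gene in candidates
    }
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy network: each candidate maps to the set of genes it interacts with
# (physically, by coregulation, or by coevolution).
interactions = {
    "CAND_1": {"KNOWN_A", "KNOWN_B", "OTHER_X"},
    "CAND_2": {"KNOWN_A"},
    "CAND_3": {"OTHER_X", "OTHER_Y"},
}
known = {"KNOWN_A", "KNOWN_B"}

for gene, score in rank_candidates(interactions, known, list(interactions)):
    print(gene, score)
```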

1.4 MODEL ORGANISMS

The full genotype-phenotype data available from model organisms provide an impressive resource for predicting connections between genes and genomic-scale phenotypes. Several biological processes are evolutionarily conserved, and discoveries in yeast, worms, flies, and rodents have a direct implication for human biology. However, some considerations must be taken into account to define/construct an appropriate model for the particular question that is being asked. We will describe the specific characteristics of each model, such as genetic/genomic composition, availability of phenotypic data, tissue-specific expression, and specific uses, in the respective chapters. Here we would like to review some important general concepts to consider when working with model organisms. Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Not all genes are evolutionarily conserved from invertebrates to humans; for example, ~35% of human genes have no obvious orthologs in flies [81], whereas ~99.5% of genes related to human diseases have orthologs in human-rodent sets [82]. Interestingly, the genes associated with neurological and developmental/malformation diseases appear to have evolved slowly, whereas the genes related to diseases of the immune, hematological, and pulmonary systems have changed more rapidly [82]. According to this idea, models of primate or human cells would probably be the most appropriate to use in the study of human diseases of the immune and hematological systems; rodent models may be better for studying genes related to neurological diseases, and development/malformation and metabolic diseases; flies or fish can also be used to study genes related to neurological and metabolic diseases, whereas yeast and worms may be more suitable as models of metabolic diseases given the lowest total conservation found. However, in each specific case, the gene or variant of interest must be analyzed a priori to determine the best study model to be used. In general, for genes that are evolutionarily conserved, simple model organisms with short generation times that are amenable to inexpensive and efficient genetic manipulations can provide rapid insights into basic biological functions through detailed in vivo studies [83].


Normally, orthologous genes retain the same molecular function in the course of evolution. It was once assumed that since the orthologous genes of relatively closely related species have similar molecular functions, their removal from the genome should cause similar phenotypes in both species. However, it became obvious that orthologous genes often cause different phenotypes in different species, despite the similar molecular function of their respective proteins. This gave rise to the concept of “phenologs,” or “orthologous phenotypes,” defined as phenotypes related by the orthology of their associated genes in two organisms [84]. Phenologs can be discovered by grouping the set of genes that are known to cause a particular human disease and then determining the phenotypes produced by mutations in the orthologous gene set in model organisms. Two phenotypes are said to be orthologous if they share a significantly greater set of common orthologous genes than would be expected by chance, even if the phenotypes appear different in the two species. These phenologs are extremely useful for identifying nonobvious models for human disease [84]: they may be used to predict additional candidate disease genes [85] and to identify potential therapeutic targets for human diseases [86]. Another interesting concept to consider when working with model organisms is individual phenotypic variability. In humans, it is well known that many disease-associated mutations have incomplete penetrance (not all individuals carrying the mutation develop the disease) or have variable expressivity (individuals differ in the severity of disease).
These phenomena can be observed even in identical twins.87 Individual phenotypic variation is also seen in inbred model organisms, even when they are isogenic and raised in highly controlled environments.88 For example, inbred rodent strains still show substantial variation in quantitative traits (such as kidney length or body weight) even when the environment is tightly controlled.89 This variation in outcome from an identical genome despite a controlled environment can have several causes: random or stochastic molecular variation in each individual; genetic variation present in one generation that influences phenotypic traits in the next generation (even if individuals do not inherit this variation); or the environment experienced by one generation influencing phenotypic variation in the next generation.88 This normal individual phenotypic variability must be taken into account when analyzing the contributions of genetic variants in model organisms.

As a final observation, given the amount of genetic variation already evident in the human genome, a key aspect of “modeling” will be to assess whether a human variant can be pathogenic. A first and key experiment will be to determine whether the human gene and its model organism ortholog are interchangeable between the two species. In this era of genomic data revolution, analyzing the behavior of human genes in the context of model organisms will be an important tool for studying the function of the gene, the pathogenicity of the genomic variant, and the pathogenesis of the disease.
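The phenolog criterion described above (two phenotypes sharing more orthologous genes than expected by chance) can be illustrated with a small overlap test. The following is a minimal sketch, assuming a hypergeometric test as the measure of significance; this is an illustrative choice and not necessarily the exact procedure used in reference 84. The ortholog group identifiers (`OG0000`, ...), the gene sets, and the function names `hypergeom_sf` and `phenolog_test` are all hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """Return P(X >= k) for a hypergeometric variable X.

    population: number of ortholog groups shared by the two species
    successes:  groups linked to the human disease
    draws:      groups linked to the model organism phenotype
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


def phenolog_test(human_disease_genes, model_phenotype_genes, orthologs):
    """Return the shared ortholog groups and the overlap p-value."""
    # Only genes with an ortholog in both species can contribute.
    human = set(human_disease_genes) & orthologs
    model = set(model_phenotype_genes) & orthologs
    shared = human & model
    p = hypergeom_sf(len(shared), len(orthologs), len(human), len(model))
    return shared, p


# Hypothetical toy data: 1000 ortholog groups, two partly overlapping sets.
orthologs = {f"OG{i:04d}" for i in range(1000)}
human_disease = {f"OG{i:04d}" for i in range(0, 10)}    # OG0000 to OG0009
model_phenotype = {f"OG{i:04d}" for i in range(6, 18)}  # OG0006 to OG0017

shared, p_value = phenolog_test(human_disease, model_phenotype, orthologs)
print(sorted(shared), p_value)
```

In this toy example four ortholog groups are shared where far fewer than one would be expected by chance, so the p-value is very small. In a real screen across many phenotype pairs, a correction for multiple testing would also be needed.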

References

1. Tjio JH, Levan A. The chromosome number of man. Hereditas. 1956;42(1-2):1-6.
2. Down JLH. Observations on an ethnic classification of idiots. Lond Hosp Rep. 1866;3:259-262.
3. Jacobs PA, Baikie AG, Court Brown WM, Strong JA. The somatic chromosomes in mongolism. Lancet. 1959;1(7075):710.
4. Gusella JF, Wexler NS, Conneally PM, et al. A polymorphic DNA marker genetically linked to Huntington’s disease. Nature. 1983;306(5940):234-238.
5. Huntington’s Disease Collaborative Research Group. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. Cell. 1993;72(6):971-983.
6. Lupski JR. Genomic disorders ten years on. Genome Med. 2009;1(4):42.
7. Lupski JR, Stankiewicz P. Genomic disorders: molecular mechanisms for rearrangements and conveyed phenotypes. PLoS Genet. 2005;1(6):e49.
8. Harel T, Lupski JR. Genomic disorders 20 years on-mechanisms for clinical manifestations. Clin Genet. 2018;93(3):439-449.
9. Lek M, Karczewski KJ, Minikel EV, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536(7616):285-291.
10. 1000 Genomes Project Consortium, Abecasis GR, Altshuler D, et al. A map of human genome variation from population-scale sequencing. Nature. 2010;467(7319):1061-1073.
11. The ARIC Investigators. The Atherosclerosis Risk in Communities (ARIC) study: design and objectives. The ARIC investigators. Am J Epidemiol. 1989;129(4):687-702.
12. Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31(13):3812-3814.
13. Adzhubei IA, Schmidt S, Peshkin L, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7(4):248-249.
14. Schwarz JM, Cooper DN, Schuelke M, Seelow D. MutationTaster2: mutation prediction for the deep-sequencing age. Nat Methods. 2014;11(4):361-362.
15. Chun S, Fay JC. Identification of deleterious mutations within three human genomes. Genome Res. 2009;19(9):1553-1561.
16. Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46(3):310-315.
17. Ioannidis NM, Rothstein JH, Pejaver V, et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am J Hum Genet. 2016;99(4):877-885.
18. Eldomery MK, Coban-Akdemir Z, Harel T, et al. Lessons learned from additional research analyses of unsolved clinical exome cases. Genome Med. 2017;9(1):26.
19. Gambin T, Coban-Akdemir Z, Yuan B, et al. Homozygous and hemizygous CNV detection from exome sequencing data in a Mendelian disease cohort. Nucleic Acids Res. 2017;45(4):1633-1648.


20. Karaca E, Posey JE, Coban-Akdemir Z, et al. Phenotypic expansion illuminates multilocus pathogenic variation. Genet Med. 2018;20(12):1528-1537. Available from: https://doi.org/10.1038/gim.2018.33.
21. Amarasinghe KC, Li J, Halgamuge SK. CoNVEX: copy number variation estimation in exome sequencing data using HMM. BMC Bioinf. 2013;14(Suppl 2):S2.
22. Krumm N, Sudmant PH, Ko A, et al. Copy number variation detection and genotyping from exome sequence data. Genome Res. 2012;22(8):1525-1532.
23. Fromer M, Moran JL, Chambert K, et al. Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. Am J Hum Genet. 2012;91(4):597-607.
24. Coban-Akdemir Z, White JJ, Song X, et al. Identifying genes whose mutant transcripts cause dominant disease traits by potential gain-of-function alleles. Am J Hum Genet. 2018;103(2):171-187.
25. The GTEx Consortium. The genotype-tissue expression (GTEx) project. Nat Genet. 2013;45(6):580-585.
26. Harel T, Yesil G, Bayram Y, et al. Monoallelic and biallelic variants in EMC1 identified in individuals with global developmental delay, hypotonia, scoliosis, and cerebellar atrophy. Am J Hum Genet. 2016;98(3):562-570.
27. Harel T, Yoon WH, Garone C, et al. Recurrent de novo and biallelic variation of ATAD3A, encoding a mitochondrial membrane protein, results in distinct neurological syndromes. Am J Hum Genet. 2016;99(4):831-845.
28. Lemmers RJ, Tawil R, Petek LM, et al. Digenic inheritance of an SMCHD1 mutation and an FSHD-permissive D4Z4 allele causes facioscapulohumeral muscular dystrophy type 2. Nat Genet. 2012;44(12):1370-1374.
29. Lupski JR. Digenic inheritance and Mendelian disease. Nat Genet. 2012;44(12):1291-1292.
30. Posey JE, Harel T, Liu P, et al. Resolution of disease phenotypes resulting from multilocus genomic variation. N Engl J Med. 2017;376(1):21-31.
31. Posey JE, Rosenfeld JA, James RA, et al. Molecular diagnostic experience of whole-exome sequencing in adult patients. Genet Med. 2016;18(7):678-685.
32. Yang Y, Muzny DM, Xia F, et al. Molecular findings among patients referred for clinical whole-exome sequencing. JAMA. 2014;312(18):1870-1879.
33. Yang Y, Muzny DM, Reid JG, et al. Clinical whole-exome sequencing for the diagnosis of mendelian disorders. N Engl J Med. 2013;369(16):1502-1511.
34. Farwell KD, Shahmirzadi L, El-Khechen D, et al. Enhanced utility of family-centered diagnostic exome sequencing with inheritance model-based analysis: results from 500 unselected families with undiagnosed genetic conditions. Genet Med. 2015;17(7):578-586.
35. Retterer K, Juusola J, Cho MT, et al. Clinical application of whole-exome sequencing across clinical indications. Genet Med. 2016;18(7):696-704.
36. Balci TB, Hartley T, Xi Y, et al. Debunking Occam’s razor: diagnosing multiple genetic diseases in families by whole-exome sequencing. Clin Genet. 2017;92(3):281-289.
37. Jehee FS, de Oliveira VT, Gurgel-Giannetti J, et al. Dual molecular diagnosis contributes to atypical Prader-Willi phenotype in monozygotic twins. Am J Med Genet A. 2017;173(9):2451-2455.
38. Chong JX, Buckingham KJ, Jhangiani SN, et al. The genetic basis of Mendelian phenotypes: discoveries, challenges, and opportunities. Am J Hum Genet. 2015;97(2):199-215.
39. Wu N, Ming X, Xiao J, et al. TBX6 null variants and a common hypomorphic allele in congenital scoliosis. N Engl J Med. 2015;372(4):341-350.


40. Osterwalder M, Barozzi I, Tissieres V, et al. Enhancer redundancy provides phenotypic robustness in mammalian development. Nature. 2018;554(7691):239-243.
41. Timberlake AT, Choi J, Zaidi S, et al. Two locus inheritance of non-syndromic midline craniosynostosis via rare SMAD6 and common BMP2 alleles. Elife. 2016;5.
42. Gonzaga-Jauregui C, Harel T, Gambin T, et al. Exome sequence analysis suggests that genetic burden contributes to phenotypic variability and complex neuropathy. Cell Rep. 2015;12(7):1169-1183.
43. Robak LA, Jansen IE, van Rooij J, et al. Excessive burden of lysosomal storage disorder gene variants in Parkinson’s disease. Brain. 2017;140(12):3191-3203.
44. Cady J, Allred P, Bali T, et al. Amyotrophic lateral sclerosis onset is influenced by the burden of rare variants in known amyotrophic lateral sclerosis genes. Ann Neurol. 2015;77(1):100-113.
45. Khera AV, Chaffin M, Aragam KG, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet. 2018;50(9):1219-1224.
46. Natarajan P, Peloso GM, Zekavat SM, et al. Deep-coverage whole genome sequences and blood lipids among 16,324 individuals. Nat Commun. 2018;9(1):2606.
47. Escott-Price V, Myers AJ, Huentelman M, Hardy J. Polygenic risk score analysis of pathologically confirmed Alzheimer disease. Ann Neurol. 2017;82(2):311-314.
48. Maher B. Personal genomes: the case of the missing heritability. Nature. 2008;456(7218):18-21.
49. Manolio TA, Collins FS, Cox NJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461(7265):747-753.
50. Gibson G. Rare and common variants: twenty arguments. Nat Rev Genet. 2012;13(2):135-145.
51. Braun DA, Sadowski CE, Kohl S, et al. Mutations in nuclear pore genes NUP93, NUP205 and XPO5 cause steroid-resistant nephrotic syndrome. Nat Genet. 2016;48(4):457-465.
52. Gee HY, Sadowski CE, Aggarwal PK, et al. FAT1 mutations cause a glomerulotubular nephropathy. Nat Commun. 2016;7:10822.
53. Bastarache L, Hughey JJ, Hebbring S, et al. Phenotype risk scores identify patients with unrecognized Mendelian disease patterns. Science. 2018;359:1233-1239.
54. Del Colle R, Fabrizi GM, Turazzini M, Cavallaro T, Silvestri M, Rizzuto N. Hereditary neuropathy with liability to pressure palsies: electrophysiological and genetic study of a family with carpal tunnel syndrome as only clinical manifestation. Neurol Sci. 2003;24(2):57-60.
55. Potocki L, Chen KS, Koeuth T, et al. DNA rearrangements on both homologues of chromosome 17 in a mildly delayed individual with a family history of autosomal dominant carpal tunnel syndrome. Am J Hum Genet. 1999;64(2):471-478.
56. Meng L, Pammi M, Saronwala A, et al. Use of exome sequencing for infants in intensive care units: ascertainment of severe single-gene disorders and effect on medical management. JAMA Pediatr. 2017;171(12):e173438.
57. Lee H, Deignan JL, Dorrani N, et al. Clinical exome sequencing for genetic identification of rare Mendelian disorders. JAMA. 2014;312(18):1880-1887.
58. Lazaridis KN, Schahl KA, Cousin MA, et al. Outcome of whole exome sequencing for diagnostic odyssey cases of an individualized medicine clinic: the Mayo Clinic experience. Mayo Clin Proc. 2016;91(3):297-307.
59. Green RC, Berg JS, Grody WW, et al. ACMG recommendations for reporting of incidental findings in clinical exome and genome sequencing. Genet Med. 2013;15(7):565-574.


60. Kalia SS, Adelman K, Bale SJ, et al. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SFv2.0): a policy statement of the American College of Medical Genetics and Genomics. Genet Med. 2017;19(2):249-255.
61. ACMG Board of Directors. ACMG policy statement: updated recommendations regarding analysis and reporting of secondary findings in clinical genome-scale sequencing. Genet Med. 2015;17(1):68-69.
62. Liu P, Meng L, Normand EA, et al. Post-reporting reanalysis of exome sequencing data molecular diagnostic and clinical genomic outcomes. N Engl J Med. 2019 (in press).
63. Dharmadhikari AV, Liu P, Dai H, et al. Copy number variant and runs of homozygosity detection by microarrays enabled more precise molecular diagnoses in 11,091 clinical exome cases. Genome Med. 2019 (in press).
64. Posey JE, Burrage LC, Miller MJ, et al. Lysinuric protein intolerance presenting with multiple fractures. Mol Genet Metab Rep. 2014;1:176-183.
65. Abifadel M, Varret M, Rabes JP, et al. Mutations in PCSK9 cause autosomal dominant hypercholesterolemia. Nat Genet. 2003;34(2):154-156.
66. Cohen J, Pertsemlidis A, Kotowski IK, Graham R, Garcia CK, Hobbs HH. Low LDL cholesterol in individuals of African descent resulting from frequent nonsense mutations in PCSK9. Nat Genet. 2005;37(2):161-165.
67. Kotowski IK, Pertsemlidis A, Luke A, et al. A spectrum of PCSK9 alleles contributes to plasma levels of low-density lipoprotein cholesterol. Am J Hum Genet. 2006;78(3):410-422.
68. Johansen CT, Hegele RA. Using Mendelian randomization to determine causative factors in cardiovascular disease. J Intern Med. 2013;273(1):44-47.
69. Yang N, Wu N, Zhang L, et al. TBX6 compound inheritance leads to congenital vertebral malformations in humans and mice. Hum Mol Genet. 2018. Available from: https://doi.org/10.1093/hmg/ddy358.
70. Wangler MF, Yamamoto S, Chao HT, et al. Model organisms facilitate rare disease diagnosis and therapeutic research. Genetics. 2017;207(1):9-27.
71. Giaever G, Chu AM, Ni L, et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002;418(6896):387-391.
72. Kim DU, Hayles J, Kim D, et al. Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe. Nat Biotechnol. 2010;28(6):617-623.
73. Baba T, Ara T, Hasegawa M, et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol. 2006;2:2006.0008.
74. Kamath RS, Fraser AG, Dong Y, et al. Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature. 2003;421(6920):231-237.
75. Dietzl G, Chen D, Schnorrer F, et al. A genome-wide transgenic RNAi library for conditional gene inactivation in Drosophila. Nature. 2007;448(7150):151-156.
76. Hobert O. The impact of whole genome sequencing on model system genetics: get ready for the ride. Genetics. 2010;184(2):317-319.
77. Dudley AM, Janse DM, Tanay A, Shamir R, Church GM. A global view of pleiotropy and phenotypically derived gene function in yeast. Mol Syst Biol. 2005;1:2005.0001.
78. Hillenmeyer ME, Fung E, Wildenhain J, et al. The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science. 2008;320(5874):362-365.
79. Fraser AG, Marcotte EM. A probabilistic view of gene function. Nat Genet. 2004;36(6):559-564.
80. Lehner B, Lee I. Network-guided genetic screening: building, testing and using gene networks to predict gene function. Briefings Funct Genomics Proteomics. 2008;7(3):217-227.


81. Wangler MF, Yamamoto S, Bellen HJ. Fruit flies in biomedical research. Genetics. 2015;199(3):639-653.
82. Huang H, Winter EE, Wang H, et al. Evolutionary conservation and selection of human disease gene orthologs in the rat and mouse genomes. Genome Biol. 2004;5(7):R47.
83. Lehner B. Genotype to phenotype: lessons from model organisms for human genetics. Nat Rev Genet. 2013;14(3):168-178.
84. McGary KL, Park TJ, Woods JO, Cha HJ, Wallingford JB, Marcotte EM. Systematic discovery of nonobvious human disease models through orthologous phenotypes. Proc Natl Acad Sci USA. 2010;107(14):6544-6549.
85. Woods JO, Singh-Blom UM, Laurent JM, McGary KL, Marcotte EM. Prediction of gene-phenotype associations in humans, mice, and plants using phenologs. BMC Bioinf. 2013;14:203.
86. Golden A. From phenologs to silent suppressors: identifying potential therapeutic targets for human disease. Mol Reprod Dev. 2017;84(11):1118-1132.
87. Roberts NJ, Vogelstein JT, Parmigiani G, Kinzler KW, Vogelstein B, Velculescu VE. The predictive capacity of personal genome sequencing. Sci Transl Med. 2012;4(133):133ra58.
88. Burga A, Lehner B. Beyond genotype to phenotype: why the phenotype of an individual cannot always be predicted from their genome sequence and the environment that they experience. FEBS J. 2012;279(20):3765-3775.
89. Gartner K. A third component causing random variability beside environment and genotype. A reason for the limited success of a 30 year long effort to standardize laboratory animals? Lab Anim. 1990;24(1):71-77.
