Methods 62 (2013) 1–2
Contents lists available at ScienceDirect
Methods journal homepage: www.elsevier.com/locate/ymeth
Guest Editor’s Introduction
Modeling gene expression
To respond to the growing streams of gene expression data now available in biology, this special issue of METHODS brings together experimental and modeling approaches to studies of gene expression. Contributions here address both general and specific aspects of gene expression modeling that employ mechanism-based understanding of gene expression. Since the advent of microarray technology in the 1990s, investigators have increasingly focused on statistical approaches to understanding the genome-wide aspects of gene expression, and such analysis is now in a mature phase, used routinely in healthcare and industrial settings, for instance. Mathematical modeling of gene expression is a more craft-based activity, but it too has increasingly moved from the analysis of single genes or gene circuits to larger-scale analysis. Here, the general outlines of how best to proceed in dealing with partially understood, complex systems are less clear, and we have asked skilled practitioners of this trade for their input in tackling such problems.
Gene regulatory networks (GRN) are blueprints showing the dynamic regulatory linkages that drive developmental programs and physiological responses at the cellular and organismal levels. A number of recent studies have provided wiring diagrams of transcriptional and signaling pathways of microbes and metazoans; such displays of GRN can be impressive enough, but as circuit diagrams on a page they are of little predictive value. Quantitative modeling methods can take static blueprints and provide a live picture of the operation of a GRN, with insight into how it will respond to inputs or to damage. Such modeling can also indicate where the genetic information is incomplete, providing a guide for future experiments.
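To make concrete what it means for a quantitative model to turn a static wiring diagram into a live picture, the following minimal sketch simulates a hypothetical two-gene cascade in which an input signal activates gene A and A represses gene B. The Hill-function form, the parameter names and values (beta, gamma, k, n) and the forward-Euler integrator are illustrative assumptions, not taken from any study in this issue.

```python
def hill_act(x: float, k: float, n: float) -> float:
    """Activating Hill function: fraction of maximal expression driven by regulator x."""
    return x**n / (k**n + x**n)


def hill_rep(x: float, k: float, n: float) -> float:
    """Repressing Hill function: fraction of maximal expression left in the presence of x."""
    return k**n / (k**n + x**n)


def simulate(signal: float, t_end: float = 50.0, dt: float = 0.01,
             beta: float = 1.0, gamma: float = 0.2, k: float = 1.0, n: float = 2.0):
    """Forward-Euler integration of a hypothetical cascade: signal -> A -| B.

    beta: maximal synthesis rate; gamma: first-order decay rate
    (both assumed identical for A and B). Starts from zero and
    returns the levels of A and B at t_end.
    """
    a, b = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        da = beta * hill_act(signal, k, n) - gamma * a  # A is induced by the input signal
        db = beta * hill_rep(a, k, n) - gamma * b       # B is repressed by A
        a, b = a + dt * da, b + dt * db
    return a, b


print(simulate(0.0))   # no input: B accumulates toward beta/gamma = 5
print(simulate(10.0))  # strong input: A accumulates and B is held low
```

The same diagram yields qualitatively different outputs depending on the input, which is the kind of behavior a circuit drawing alone cannot convey.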
1046-2023/$ - see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.ymeth.2013.08.001
In this issue, Saadatpour and Albert discuss Boolean models for regulatory networks; using the simplification of having genes in an “on” or “off” state, these models provide a computationally tractable approach to interpreting the action of complex genetic circuitry, such as that studied in the Drosophila embryo. Based on the success of Boolean models, some have argued that the operation of developmental circuitry is in reality essentially a Boolean process, in which the relative levels of particular regulatory factors are less important than their overall appearance or loss [4]. However, many molecular studies have demonstrated that transcriptional and signaling switches are capable of complex dynamic responses, and it would be surprising if such properties were not under active selection for proper activity of GRN. Sanchez, Choubey and Kondev describe how multi-state promoter dynamics can be captured by stochastic modeling; these efforts more realistically capture the variable nature of gene expression, which is affected by internal noise as well as by environmental variation. An important mediator of stochastic effects in eukaryotic gene expression is chromatin, the DNA-protein complex that compacts the genome and regulates access of transcription factors to DNA. Teif et al. provide methods for modeling how DNA is not uniformly accessible to regulatory factors, but can be differentially occupied by nucleosomes, whose natural affinity for certain sequences, or active movement in response to remodeling complexes, shapes the landscape on which transcriptional regulators act.
The functioning of individual genetic elements of a network is indeed influenced by molecular details of promoter sequences, for instance, but for a larger-scale network view, general regulatory relationships must first be established. Pioneering work by Jaeger and Reinitz [3] developed methods to derive possible regulatory interactions of the Drosophila embryonic developmental circuitry, and in this issue, Spirov and Holloway review how, building on these approaches, evolutionary computation can provide tools to identify possible GRN in new systems, as well as pathways by which such networks have evolved to produce novel outputs in different species. These computational tools, married with sizable genomic resources, provide a way to understand how evolution sculpts GRN; our hope is that these insights will reveal general principles of how complex developmental programs can vary through time. At the same time, such computational tools will provide information about the aberrant functioning of such networks in disease states. Molecular studies indicate that most signaling and gene regulatory mechanisms were in place by the time of the Cambrian explosion, suggesting that, at a certain level, GRN studies explore a theme with variations. However, data gleaned from different biological systems can be very heterogeneous. Thus an equally important, if perhaps less discussed, aspect of quantitative modeling is the nature of the empirical data, and how the results of modeling can be compared with often incomplete experimental measurements.
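The Boolean formalism discussed by Saadatpour and Albert can be sketched in a few lines. The three-gene negative-feedback circuit below (A activates B, B activates C, C represses A) is hypothetical, and synchronous updating is only one of several update schemes in use; asynchronous schemes can produce different attractors for the same rules.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]

# Hypothetical three-gene negative-feedback loop: A activates B, B activates C, C represses A.
RULES: Dict[str, Callable[[State], bool]] = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}


def step(state: State) -> State:
    """Synchronous update: every gene's next value is computed from the current state."""
    return {gene: rule(state) for gene, rule in RULES.items()}


def trajectory(state: State, n_steps: int) -> List[State]:
    """Return the initial state followed by n_steps synchronous updates."""
    states = [state]
    for _ in range(n_steps):
        state = step(state)
        states.append(state)
    return states


def find_attractor(state: State, max_steps: int = 1000) -> int:
    """Iterate until a state repeats; return the cycle length (1 means a fixed point)."""
    seen: Dict[Tuple[Tuple[str, bool], ...], int] = {}
    for t in range(max_steps):
        key = tuple(sorted(state.items()))
        if key in seen:
            return t - seen[key]
        seen[key] = t
        state = step(state)
    raise RuntimeError("no repeated state within max_steps")


start = {"A": False, "B": False, "C": False}
for s in trajectory(start, 6):
    print("".join("1" if s[g] else "0" for g in "ABC"))  # 000 100 110 111 011 001 000
print("cycle length:", find_attractor(start))            # cycle length: 6
```

From the all-off state the circuit passes through six states and returns to its start: a sustained oscillation captured without a single kinetic parameter, which illustrates both the appeal and the abstraction of the Boolean view.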
Using the Drosophila embryo as an example, Pargett and Umulis discuss sources of data uncertainty, methods useful for normalizing disparate types of data (e.g. fluorescent signals representing mRNA or protein levels), and approaches that allow the investigator to identify optimal sets of parameters for different quantitative models. Quantitative models can offer predictions of high precision; thus, knowing how such predictions align with spotty or qualitative observations of real systems in action is essential. One of the most commonly used technologies for assessing gene activity in vivo is fluorescence microscopy, and technical aspects of exploiting such methods are described by Trisnadi et al., who focus on developmental control by the Drosophila transcription factor Dorsal. This insect homolog of the conserved NF-κB transcription factor initiates patterning of the dorsal/ventral axis through direct interactions with dozens of differentially expressed targets. Here, aspects of data capture from three-dimensional samples are discussed, along with variation linked to developmental timing as well as intrinsic noise. The authors discuss how they distill a large set of whole-embryo measurements into a tractable set of parameters relating to Dorsal and a target gene, vnd.
Sinha and colleagues have explored thermodynamic models for a distinct developmental network controlling anterior-posterior embryonic segmentation, also from Drosophila [2]. Such models differ from the more abstract Boolean approaches in that detailed, DNA sequence-level information is used to assess possible interactions of transcription factors with regulatory sequences. Earlier studies in this area used very simple one-dimensional representations of spatially distinct expression; in their current paper (Samee et al.), these authors test their models against the most sophisticated empirical data, collected from three-dimensional images by the pathbreaking Berkeley Drosophila Transcription Network Project. Their paper demonstrates that the practical application of their GEMSTAT thermodynamic models requires considerable preprocessing of the data, and shows that conventional fitting criteria perform worse than their “weighted pattern generating potential” function. Using this as a basis, the contributions of homotypic cooperativity and of additional transcription factors were revealed; this study shows how modeling may be applied to future comprehensive gene expression datasets. The effective application of thermodynamic modeling to transcriptional regulation is of interest not only for gene network analysis, but also in an evolutionary context, to understand how cis-regulatory elements vary at the population and species levels.
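Thermodynamic models of the kind used by Sinha and colleagues treat promoter output as a statistical-mechanical average over binding configurations. The minimal sketch below assumes a single activator species, one association constant per site, a cooperativity factor omega for each pair of adjacent bound sites (homotypic cooperativity), and a fold boost alpha to recruitment of the basal transcriptional machinery (BTM) per bound activator; it returns the probability that the BTM is bound. It is a generic illustration in the spirit of such models, not the GEMSTAT implementation, and all parameter names and values are assumptions. Brute-force enumeration is fine for a handful of sites; longer enhancers would call for a dynamic-programming evaluation of the partition function.

```python
from itertools import product
from typing import Sequence


def expression(conc: float, affinities: Sequence[float], omega: float = 1.0,
               alpha: float = 10.0, q_btm: float = 0.01) -> float:
    """Probability that the basal transcriptional machinery (BTM) is bound.

    conc: activator concentration (arbitrary units, assumed)
    affinities: one association constant per activator site (assumed)
    omega: cooperativity factor for each pair of adjacent bound sites
    alpha: fold boost to BTM recruitment per bound activator
    q_btm: statistical weight of the BTM in the absence of activators
    """
    n = len(affinities)
    z_off = 0.0  # partition-function terms with the BTM unbound
    z_on = 0.0   # partition-function terms with the BTM bound
    for config in product((0, 1), repeat=n):  # every occupied/empty pattern of the sites
        w = 1.0
        for bound, k in zip(config, affinities):
            if bound:
                w *= k * conc
        adjacent_pairs = sum(config[i] * config[i + 1] for i in range(n - 1))
        w *= omega ** adjacent_pairs
        z_off += w
        z_on += w * q_btm * alpha ** sum(config)
    return z_on / (z_off + z_on)


sites = [1.0, 0.5, 2.0]
for c in (0.0, 0.5, 2.0):
    print(c, expression(c, sites), expression(c, sites, omega=5.0))
```

With alpha above 1, raising omega above 1 increases predicted expression at any positive activator concentration, which is one way a contribution of homotypic cooperativity can surface when such a model is fit to data.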
The apparently very relaxed sequence requirements of many transcriptional regulatory elements pose a substantial hurdle for interpretation of genomic information, because sequence variation in binding sites or spacing may or may not be functionally important. Martinez et al. show how simulated annealing, an optimization concept borrowed from metallurgy, can be used to search high-dimensional sequence space for sequences that are expected to produce a specific regulatory output. This method would be most helpful in the analysis of candidate regulatory regions, which can be very large in higher eukaryotes, as well as in the design of elements for synthetic biology projects. In addition, to discriminate most effectively between different quantitative models, this method can identify sequences with the maximal differences in predicted output, highlighting those constructs that should be tested empirically. The authors also describe an optimization algorithm to infer the likely intermediate sequences that separate two evolutionarily divergent cis-regulatory elements; in essence, we can replay the tape of DNA evolution using not only parsimony in nucleotide substitutions, as is customarily done, but also by applying a functional filter. It will be interesting to see whether this method can be extended to entire gene networks, where interlinked functional outputs may involve compensation.
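Simulated annealing over sequence space can be illustrated with a toy objective. In this sketch the "regulatory model" is a hypothetical stand-in that simply counts matches to an arbitrary motif (GGGAA, chosen for illustration only), and the goal is a 30-bp sequence whose predicted output equals a target value. Single-base mutations, the Metropolis acceptance rule and a geometric cooling schedule are common textbook choices, not necessarily those of Martinez et al.

```python
import math
import random

BASES = "ACGT"
MOTIF = "GGGAA"  # arbitrary motif, for illustration only


def predicted_output(seq: str, motif: str = MOTIF) -> int:
    """Hypothetical stand-in for a regulatory model: number of motif matches in seq."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def mutate(seq: str, rng: random.Random) -> str:
    """Change one randomly chosen position to a different base."""
    i = rng.randrange(len(seq))
    new_base = rng.choice([b for b in BASES if b != seq[i]])
    return seq[:i] + new_base + seq[i + 1:]


def anneal(length: int = 30, target: int = 3, steps: int = 50000,
           t_start: float = 2.0, t_end: float = 0.01, seed: int = 0):
    """Search for a sequence whose predicted output equals target; return (best, energy)."""
    rng = random.Random(seed)

    def energy(s: str) -> int:  # lower is better: distance from the target output
        return abs(predicted_output(s) - target)

    seq = "".join(rng.choice(BASES) for _ in range(length))
    e = energy(seq)
    best, best_e = seq, e
    for k in range(steps):
        temp = t_start * (t_end / t_start) ** (k / steps)  # geometric cooling
        cand = mutate(seq, rng)
        e_cand = energy(cand)
        # Metropolis rule: accept any non-worsening move; accept worse moves with prob exp(-dE/T)
        if e_cand <= e or rng.random() < math.exp((e - e_cand) / temp):
            seq, e = cand, e_cand
            if e < best_e:
                best, best_e = seq, e
    return best, best_e


best, best_e = anneal()
print(best, predicted_output(best), best_e)
```

Early, high-temperature steps let the search wander freely; as the temperature falls, moves that worsen the objective are increasingly rejected. Replacing predicted_output with a real sequence-to-expression model, or defining the energy from the disagreement between two models, could in principle turn the same loop into the model-discrimination search described above.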
In all of the modeling exercises discussed, values for estimated parameters may relate to a discrete biochemical property (e.g. cooperative DNA binding), or they may apply to abstract properties that involve complex processes (e.g. activation of a promoter). How these parameters are arrived at can be affected by the estimation strategy employed, an issue dealt with ad hoc in previous quantitative modeling studies. Suleimenov et al. compare global and local parameter estimation strategies using the thermodynamic framework discussed above for embryonic patterning genes. Depending on data quality, global methods can, at greater computational cost, provide significantly better results, yet the local search methods preferred in many studies appear to perform adequately on noisier datasets or semiquantitative data. Putting a model through its paces, by running different types of synthetic data through it, can reveal the sensitivity of the particular model. Some studies have tried to derive biochemical insights from inferred parameters, although biological models often exhibit “sloppiness” in parameter values, indicating a high degree of parameter compensation or a lack of impact on the overall model output; the absolute quantitative values of such parameters may therefore not be correct [1]. Taylor et al. provide a useful primer on sensitivity analysis, using simple gene circuit models to demonstrate how local and global analytic methods can reveal differential sensitivity. Models of two or three interlinked promoters from bacteriophage lambda, a system orders of magnitude simpler than some of the GRN considered in the studies above, exhibit complex behavior and different sensitivities. These studies highlight the importance of considering context and data quality when designing parameter estimation strategies, interpreting estimated parameters, and planning overall modeling strategies.
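Local sensitivity analysis can be sketched with central finite differences. The model below is a hypothetical single gene with Hill-type activation by an input x and first-order decay, not one of the lambda promoter models discussed by Taylor et al.; the coefficients computed are normalized (logarithmic) sensitivities, d ln y / d ln p. Global methods, which vary parameters over ranges rather than perturbing them around one point, are not shown.

```python
from typing import Callable, Dict

Params = Dict[str, float]


def steady_state(params: Params, x: float = 1.0) -> float:
    """Hypothetical gene: synthesis beta * x^n / (K^n + x^n), first-order decay gamma."""
    beta, gamma, k, n = params["beta"], params["gamma"], params["K"], params["n"]
    return (beta / gamma) * x**n / (k**n + x**n)


def local_sensitivities(model: Callable[[Params], float], params: Params,
                        rel_step: float = 1e-4) -> Params:
    """Normalized sensitivities d ln(y) / d ln(p) via central finite differences.

    Assumes every parameter value is nonzero, since the step size is relative.
    """
    y0 = model(params)
    sens = {}
    for name, value in params.items():
        h = rel_step * value
        up = dict(params, **{name: value + h})
        down = dict(params, **{name: value - h})
        sens[name] = (model(up) - model(down)) / (2 * h) * value / y0
    return sens


p = {"beta": 1.0, "gamma": 0.5, "K": 1.0, "n": 2.0}
print(local_sensitivities(steady_state, p))
```

Here the sensitivities to beta and gamma are close to +1 and -1: only their ratio affects the steady state, so doubling both leaves the output unchanged. This is a simple instance of the parameter compensation behind the sloppiness noted above, and a reason why individually fitted values of such parameters may not be identifiable.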
We hope that the diverse topics addressed in this issue will stimulate new ideas and approaches in the rapidly developing field of quantitative analysis of gene expression, where mechanism and modeling are merging in exciting and unexpected ways.
References
[1] R.N. Gutenkunst, J.J. Waterfall, F.P. Casey, K.S. Brown, C.R. Myers, J.P. Sethna, PLoS Comput. Biol. 3 (2007) 1871–1878.
[2] X. He, M.A.H. Samee, C. Blatti, S. Sinha, PLoS Comput. Biol. 6 (2010) e1000935.
[3] J. Jaeger, S. Surkova, M. Blagov, H. Janssens, D. Kosman, K.N. Kozlov, Manu, E. Myasnikova, C.E. Vanario-Alonso, M. Samsonova, D.H. Sharp, J. Reinitz, Nature 430 (2004) 368–371.
[4] I.S. Peter, E. Faure, E.H. Davidson, Proc. Natl. Acad. Sci. USA 109 (2012) 16434–16442.
David Arnosti