Integrative Functional Genomics of Cancer (IFGC)

Team Publications

Year of publication 2012

Raghunath Chatterjee, Jianfei Zhao, Ximiao He, Andrey Shlyakhtenko, Ishminder Mann, Joshua J Waterfall, Paul Meltzer, B K Sathyanarayana, Peter C FitzGerald, Charles Vinson (2012 Oct 11)

Overlapping ETS and CRE Motifs ((G/C)CGGAAGTGACGTCA) preferentially bound by GABPα and CREB proteins.

G3 (Bethesda, Md.) : 1243-56 : DOI : 10.1534/g3.112.004002
Summary

Previously, we identified 8-bp-long DNA sequences (8-mers) that localize in human proximal promoters and grouped them into known transcription factor binding sites (TFBS). We now examine split 8-mers consisting of two 4-mers separated by 1 to 30 bp (X(4)-N(1-30)-X(4)) to identify pairs of TFBS that localize in proximal promoters at a precise distance. These include two overlapping TFBS: the ETS⇔ETS motif ((C/G)CCGGAAGCGGAA) and the ETS⇔CRE motif ((C/G)CGGAAGTGACGTCAC). The nucleotides in bold are part of both TFBS. Molecular modeling shows that the ETS⇔CRE motif can be bound simultaneously by both the ETS and the B-ZIP domains without protein-protein clashes. The electrophoretic mobility shift assay (EMSA) shows that the ETS protein GABPα and the B-ZIP protein CREB preferentially bind to the ETS⇔CRE motif only when the two TFBS overlap precisely. In contrast, the ETS domain of ETV5 and CREB interfere with each other for binding the ETS⇔CRE. The 11-mer (CGGAAGTGACG), the conserved part of the ETS⇔CRE motif, occurs 226 times in the human genome and 83% are in known regulatory regions. In vivo GABPα and CREB ChIP-seq peaks identified the ETS⇔CRE as the most enriched motif occurring in promoters of genes involved in mRNA processing, cellular catabolic processes, and stress response, suggesting that a specific class of genes is regulated by this composite motif.
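The two counting steps named in this summary (enumerating split 8-mers of the form X(4)-N(gap)-X(4), and counting a fixed 11-mer on both strands) can be sketched as follows. This is only an illustration on a toy sequence: the function names, the toy input, and the choice to count overlapping matches are assumptions, not the authors' pipeline.

```python
import re
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def count_both_strands(seq, motif):
    """Count overlapping occurrences of motif on the forward and reverse strands."""
    seq = seq.upper()
    forward = len(re.findall(f"(?={motif})", seq))
    rc = reverse_complement(motif)
    # A palindromic motif would otherwise be counted twice.
    reverse = 0 if rc == motif else len(re.findall(f"(?={rc})", seq))
    return forward + reverse

def split_8mers(seq, gap):
    """Count (left 4-mer, right 4-mer) pairs separated by exactly `gap` bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - (8 + gap) + 1):
        left = seq[i:i + 4]
        right = seq[i + 4 + gap:i + 8 + gap]
        counts[(left, right)] += 1
    return counts

# Toy sequence (hypothetical): one forward copy and one reverse-complement copy.
toy = "TTT" + "CGGAAGTGACG" + "AAAA" + "CGTCACTTCCG" + "TT"
n_hits = count_both_strands(toy, "CGGAAGTGACG")
pairs_gap1 = split_8mers("ACGTAACGT", gap=1)
```

In a genome-scale setting the same counts would be accumulated per promoter position and per gap length from 1 to 30, which is where the distance preference described in the summary would show up.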

Joshua J Waterfall, Paul S Meltzer (2012 Mar 24)

Targeting epigenetic misregulation in synovial sarcoma.

Cancer cell : 323-4 : DOI : 10.1016/j.ccr.2012.02.023
Summary

Like many sarcomas, synovial sarcoma is driven by a characteristic oncogenic transcription factor fusion, SS18-SSX. In this issue of Cancer Cell, Su et al. elucidate the protein partners necessary for target gene misregulation and demonstrate a direct effect of histone deacetylase inhibitors on the SS18-SSX complex composition, expression misregulation, and apoptosis.

Year of publication 2011

J Keith Killian, Sven Bilke, Sean Davis, Robert L Walker, Erich Jaeger, M Scott Killian, Joshua J Waterfall, Marina Bibikova, Jian-Bing Fan, William I Smith, Paul S Meltzer (2011 Jun 7)

A methyl-deviator epigenotype of estrogen receptor-positive breast carcinoma is associated with malignant biology.

The American journal of pathology : 55-65 : DOI : 10.1016/j.ajpath.2011.03.022
Summary

We broadly profiled DNA methylation in breast cancers (n = 351) and benign parenchyma (n = 47) for correspondence with disease phenotype, using FFPE diagnostic surgical pathology specimens. Exploratory analysis revealed a distinctive primary invasive carcinoma subclass featuring extreme global methylation deviation. Subsequently, we tested the correlation between methylation remodeling pervasiveness and malignant biological features. A methyl deviation index (MDI) was calculated for each lesion relative to terminal ductal-lobular unit baseline, and group comparisons revealed that high-grade and short-survival estrogen receptor-positive (ER(+)) cancers manifest a significantly higher MDI than low-grade and long-survival ER(+) cancers. In contrast, ER(-) cancers display a significantly lower MDI, revealing a striking epigenomic distinction between cancer hormone receptor subtypes. Kaplan-Meier survival curves of MDI-based risk classes showed significant divergence between low- and high-risk groups. MDI showed superior prognostic performance to crude methylation levels, and MDI retained prognostic significance (P < 0.01) in Cox multivariate analysis, including clinical stage and pathological grade. Most MDI targets individually are significant markers of ER(+) cancer survival. Lymphoid and mesenchymal indexes were not substantially different between ER(+) and ER(-) groups and do not explain MDI dichotomy. However, the mesenchymal index was associated with ER(+) cancer survival, and a high lymphoid index was associated with medullary carcinoma. Finally, a comparison between metastases and primary tumors suggests methylation patterns are established early and maintained through disease progression for both ER(+) and ER(-) tumors.

Nasun Hah, Charles G Danko, Leighton Core, Joshua J Waterfall, Adam Siepel, John T Lis, W Lee Kraus (2011 May 10)

A rapid, extensive, and transient transcriptional response to estrogen signaling in breast cancer cells.

Cell : 622-34 : DOI : 10.1016/j.cell.2011.03.042
Summary

We report the immediate effects of estrogen signaling on the transcriptome of breast cancer cells using global run-on and sequencing (GRO-seq). The data were analyzed using a new bioinformatic approach that allowed us to identify transcripts directly from the GRO-seq data. We found that estrogen signaling directly regulates a strikingly large fraction of the transcriptome in a rapid, robust, and unexpectedly transient manner. In addition to protein-coding genes, estrogen regulates the distribution and activity of all three RNA polymerases and virtually every class of noncoding RNA that has been described to date. We also identified a large number of previously undetected estrogen-regulated intergenic transcripts, many of which are found proximal to estrogen receptor binding sites. Collectively, our results provide the most comprehensive measurement of the primary and immediate estrogen effects to date and a resource for understanding rapid signal-dependent transcription in other systems.

Irene M Min, Joshua J Waterfall, Leighton J Core, Robert J Munroe, John Schimenti, John T Lis (2011 Apr 5)

Regulating RNA polymerase pausing and transcription elongation in embryonic stem cells.

Genes & development : 742-54 : DOI : 10.1101/gad.2005511
Summary

Transitions between pluripotent stem cells and differentiated cells are executed by key transcription regulators. Comparative measurements of RNA polymerase distribution over the genome’s primary transcription units in different cell states can identify the genes and steps in the transcription cycle that are regulated during such transitions. To identify the complete transcriptional profiles of RNA polymerases with high sensitivity and resolution, as well as the critical regulated steps upon which regulatory factors act, we used genome-wide nuclear run-on (GRO-seq) to map the density and orientation of transcriptionally engaged RNA polymerases in mouse embryonic stem cells (ESCs) and mouse embryonic fibroblasts (MEFs). In both cell types, progression of a promoter-proximal, paused RNA polymerase II (Pol II) into productive elongation is a rate-limiting step in transcription of ∼40% of mRNA-encoding genes. Importantly, quantitative comparisons between cell types reveal that transcription is controlled frequently at paused Pol II’s entry into elongation. Furthermore, “bivalent” ESC genes (exhibiting both active and repressive histone modifications) bound by Polycomb group complexes PRC1 (Polycomb-repressive complex 1) and PRC2 show dramatically reduced levels of paused Pol II at promoters relative to an average gene. In contrast, bivalent promoters bound by only PRC2 allow Pol II pausing, but it is confined to extremely 5′ proximal regions. Altogether, these findings identify rate-limiting targets for transcription regulation during cell differentiation.

Year of publication 2008

Leighton J Core, Joshua J Waterfall, John T Lis (2008 Dec 6)

Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters.

Science (New York, N.Y.) : 1845-8 : DOI : 10.1126/science.1162228
Summary

RNA polymerases are highly regulated molecular machines. We present a method (global run-on sequencing, GRO-seq) that maps the position, amount, and orientation of transcriptionally engaged RNA polymerases genome-wide. In this method, nuclear run-on RNA molecules are subjected to large-scale parallel sequencing and mapped to the genome. We show that peaks of promoter-proximal polymerase reside on approximately 30% of human genes, transcription extends beyond pre-messenger RNA 3′ cleavage, and antisense transcription is prevalent. Additionally, most promoters have an engaged polymerase upstream and in an orientation opposite to the annotated gene. This divergent polymerase is associated with active genes but does not elongate effectively beyond the promoter. These results imply that the interplay between polymerases and regulators over broad promoter regions dictates the orientation and efficiency of productive transcription.

Fergal P Casey, Joshua J Waterfall, Ryan N Gutenkunst, Christopher R Myers, James P Sethna (2008 Nov 13)

Variational method for estimating the rate of convergence of Markov-chain Monte Carlo algorithms.

Physical review. E, Statistical, nonlinear, and soft matter physics : 046704
Summary

We demonstrate the use of a variational method to determine a quantitative lower bound on the rate of convergence of Markov chain Monte Carlo (MCMC) algorithms as a function of the target density and proposal density. The bound relies on approximating the second largest eigenvalue in the spectrum of the MCMC operator using a variational principle and the approach is applicable to problems with continuous state spaces. We apply the method to one dimensional examples with Gaussian and quartic target densities, and we contrast the performance of the random walk Metropolis-Hastings algorithm with a “smart” variant that incorporates gradient information into the trial moves, a generalization of the Metropolis adjusted Langevin algorithm. We find that the variational method agrees quite closely with numerical simulations. We also see that the smart MCMC algorithm often fails to converge geometrically in the tails of the target density except in the simplest case we examine, and even then care must be taken to choose the appropriate scaling of the deterministic and random parts of the proposed moves. Again, this calls into question the utility of smart MCMC in more complex problems. Finally, we apply the same method to approximate the rate of convergence in multidimensional Gaussian problems with and without importance sampling. There we demonstrate the necessity of importance sampling for target densities which depend on variables with a wide range of scales.
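The variational idea in this summary can be illustrated on a discretized toy problem. The sketch below is an assumption-laden stand-in, not the paper's continuous-state calculation: it builds a nearest-neighbour random-walk Metropolis chain on a grid with a Gaussian target, then uses the Rayleigh quotient of a trial function (here f(x) = x) as a lower bound on the second-largest eigenvalue of the transition operator. The grid size, the proposal, and all function names are hypothetical choices.

```python
import math

def metropolis_chain(n=31, half_width=4.0):
    """Nearest-neighbour Metropolis chain on a grid, Gaussian target exp(-x^2/2)."""
    xs = [-half_width + 2.0 * half_width * i / (n - 1) for i in range(n)]
    weights = [math.exp(-0.5 * x * x) for x in xs]
    total = sum(weights)
    pi = [w / total for w in weights]
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                # Propose each neighbour with probability 1/2, accept with min(1, ratio).
                P[i][j] = 0.5 * min(1.0, pi[j] / pi[i])
        # Rejected moves and off-grid proposals keep the chain in place.
        P[i][i] = 1.0 - sum(P[i])
    return xs, pi, P

def rayleigh_quotient(f, pi, P):
    """<g, P g>_pi / <g, g>_pi with g = f minus its pi-mean (orthogonal to constants)."""
    n = len(f)
    mean = sum(pi[i] * f[i] for i in range(n))
    g = [v - mean for v in f]
    Pg = [sum(P[i][j] * g[j] for j in range(n)) for i in range(n)]
    num = sum(pi[i] * g[i] * Pg[i] for i in range(n))
    den = sum(pi[i] * g[i] * g[i] for i in range(n))
    return num / den

def refine(f, pi, P, iters=1000):
    """Improve the trial function by power iteration with the lazy operator (P + I)/2."""
    n = len(f)
    g = list(f)
    for _ in range(iters):
        mean = sum(pi[i] * g[i] for i in range(n))
        g = [v - mean for v in g]
        g = [0.5 * (g[i] + sum(P[i][j] * g[j] for j in range(n))) for i in range(n)]
        norm = math.sqrt(sum(pi[i] * g[i] ** 2 for i in range(n)))
        g = [v / norm for v in g]
    return rayleigh_quotient(g, pi, P)

xs, pi, P = metropolis_chain()
bound = rayleigh_quotient(xs, pi, P)   # variational lower bound from f(x) = x
refined = refine(xs, pi, P)            # numerical estimate of the same eigenvalue
```

Because the chain is reversible, any trial function orthogonal to the constants gives a value no larger than the true second eigenvalue, so a better trial function can only tighten the bound. The power iteration is included only as a numerical check on this toy chain.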

Year of publication 2007

Ryan N Gutenkunst, Fergal P Casey, Joshua J Waterfall, Christopher R Myers, James P Sethna (2007 Oct 11)

Extracting falsifiable predictions from sloppy models.

Annals of the New York Academy of Sciences : 203-11
Summary

Successful predictions are among the most compelling validations of any model. Extracting falsifiable predictions from nonlinear multiparameter models is complicated by the fact that such models are commonly sloppy, possessing sensitivities to different parameter combinations that range over many decades. Here we discuss how sloppiness affects the sorts of data that best constrain model predictions, makes linear uncertainty approximations dangerous, and introduces computational difficulties in Monte-Carlo uncertainty analysis. We also present a useful test problem and suggest refinements to the standards by which models are communicated.

Ryan N Gutenkunst, Joshua J Waterfall, Fergal P Casey, Kevin S Brown, Christopher R Myers, James P Sethna (2007 Oct 10)

Universally sloppy parameter sensitivities in systems biology models.

PLoS computational biology : 1871-78
Summary

Quantitative computational models play an increasingly important role in modern biology. Such models typically involve many free parameters, and assigning their values is often a substantial obstacle to model development. Directly measuring in vivo biochemical parameters is difficult, and collectively fitting them to other experimental data often yields large parameter uncertainties. Nevertheless, in earlier work we showed in a growth-factor-signaling model that collective fitting could yield well-constrained predictions, even when it left individual parameters very poorly constrained. We also showed that the model had a “sloppy” spectrum of parameter sensitivities, with eigenvalues roughly evenly distributed over many decades. Here we use a collection of models from the literature to test whether such sloppy spectra are common in systems biology. Strikingly, we find that every model we examine has a sloppy spectrum of sensitivities. We also test several consequences of this sloppiness for building predictive models. In particular, sloppiness suggests that collective fits to even large amounts of ideal time-series data will often leave many parameters poorly constrained. Tests over our model collection are consistent with this suggestion. This difficulty with collective fits may seem to argue for direct parameter measurements, but sloppiness also implies that such measurements must be formidably precise and complete to usefully constrain many model predictions. We confirm this implication in our growth-factor-signaling model. Our results suggest that sloppy sensitivity spectra are universal in systems biology models. The prevalence of sloppiness highlights the power of collective fits and suggests that modelers should focus on predictions rather than on parameters.

F P Casey, D Baird, Q Feng, R N Gutenkunst, J J Waterfall, C R Myers, K S Brown, R A Cerione, J P Sethna (2007 Jun 27)

Optimal experimental design in an epidermal growth factor receptor signalling and down-regulation model.

IET systems biology : 190-202
Summary

We apply the methods of optimal experimental design to a differential equation model for epidermal growth factor receptor signalling, trafficking and down-regulation. The model incorporates the role of a recently discovered protein complex made up of the E3 ubiquitin ligase, Cbl, the guanine exchange factor (GEF), Cool-1 (β-Pix) and the Rho family G protein Cdc42. The complex has been suggested to be important in disrupting receptor down-regulation. We demonstrate that the model interactions can accurately reproduce the experimental observations, that they can be used to make predictions with accompanying uncertainties, and that we can apply ideas of optimal experimental design to suggest new experiments that reduce the uncertainty on unmeasurable components of the system.

Year of publication 2006

Joshua J Waterfall, Fergal P Casey, Ryan N Gutenkunst, Kevin S Brown, Christopher R Myers, Piet W Brouwer, Veit Elser, James P Sethna (2006 Dec 13)

Sloppy-model universality class and the Vandermonde matrix.

Physical review letters : 150601
Summary

In a variety of contexts, physicists study complex, nonlinear models with many unknown or tunable parameters to explain experimental data. We explain why such systems so often are sloppy: the system behavior depends only on a few “stiff” combinations of the parameters and is unchanged as other “sloppy” parameter combinations vary by orders of magnitude. We observe that the eigenvalue spectra for the sensitivity of sloppy models have a striking, characteristic form with a density of logarithms of eigenvalues which is roughly constant over a large range. We suggest that the common features of sloppy models indicate that they may belong to a common universality class. In particular, we motivate focusing on a Vandermonde ensemble of multiparameter nonlinear models and show in one limit that they exhibit the universal features of sloppy models.
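A small numerical example may help picture the eigenvalue spectrum described in this summary. The sketch below assumes a toy model that is not taken from the paper: a least-squares fit of a polynomial with coefficients theta_k on monomials t^k over [0, 1], whose cost Hessian is the Hilbert matrix H[j][k] = 1/(j + k + 1). The eigenvalues are computed with a plain cyclic Jacobi routine written here for self-containment; the function names are hypothetical.

```python
import math

def hilbert(n):
    """Hessian of the toy polynomial fit: H[j][k] = integral of t^(j+k) over [0, 1]."""
    return [[1.0 / (j + k + 1) for k in range(n)] for j in range(n)]

def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-30):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q], kept within [-pi/4, pi/4].
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):      # apply the rotation to columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):      # apply the transposed rotation to rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

eigenvalues = jacobi_eigenvalues(hilbert(5))
log10_spectrum = [math.log10(v) for v in eigenvalues]
```

Even with only five parameters, the eigenvalues of this toy Hessian span more than five decades, and their logarithms fall at roughly comparable intervals, which is the qualitative picture of "stiff" and "sloppy" directions the summary describes.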
