Application of Genomic and Proteomic Technologies in Biomarker Discovery
by Elana Fertig
EJ Fertig, R Slebos, and CH Chung. Application of Genomic and Proteomic Technologies in Biomarker Discovery. In: Govindan R, ed. 2012 ASCO Educational Book. Alexandria, VA: American Society of Clinical Oncology; 2012;377-382.
Overview: Sequencing of the human genome was completed in 2001. Building on the technology and experience of whole-exome sequencing, numerous cancer genomes have been sequenced, including head and neck squamous cell carcinoma (HNSCC) in 2011. Although DNA sequencing data reveal a complex genome with numerous mutations, the biologic interactions and clinical significance of the overall genetic aberrations are largely unknown. Comprehensive analyses of tumors using genomics and proteomics beyond sequencing data can potentially accelerate the rate and number of biomarker discoveries. This would improve biology-driven classification of tumors for prognosis and the selection of patients for specific therapies. In this review, we summarize current genomic and proteomic technologies, general biomarker-discovery paradigms that use these technologies, and published data in HNSCC, including potential clinical applications and limitations.
Gene expression signatures modulated by epidermal growth factor receptor activation and their relationship to cetuximab resistance in head and neck squamous cell carcinoma
by Elana Fertig
Elana J Fertig, Qing Ren, Haixia Cheng, Hiromitsu Hatakeyama, Adam Dicker, Ulrich Rodeck, Michael Considine, Michael F Ochs and Christine H Chung (2012) BMC Genomics 13:160. Featured in http://www.biomedcentral.com/1741-7015/10/43/abstract.
Background: Aberrant activation of signaling pathways downstream of epidermal growth factor receptor (EGFR) has been hypothesized to be one of the mechanisms of cetuximab (a monoclonal antibody against EGFR) resistance in head and neck squamous cell carcinoma (HNSCC). To infer relevant and specific pathway activation downstream of EGFR from gene expression in HNSCC, we generated gene expression signatures using immortalized keratinocytes (HaCaT) subjected to ligand stimulation and transfected with EGFR, RELA/p65, or HRASVal12D.
Results: The gene expression patterns that distinguished the HaCaT variants and conditions were inferred using the Markov chain Monte Carlo (MCMC) matrix factorization algorithm Coordinated Gene Activity in Pattern Sets (CoGAPS). This approach inferred gene expression signatures with greater relevance to cell signaling pathway activation than the expression signatures inferred with standard linear models. Furthermore, the pathway signature generated using HaCaT-HRASVal12D was also associated with the cetuximab treatment response in isogenic cetuximab-sensitive (UMSCC1) and cetuximab-resistant (1CC8) cell lines.
Conclusions: Our data suggest that the CoGAPS algorithm can generate gene expression signatures that are pertinent to the downstream effects of receptor signaling pathway activation, and that these signatures may be useful in modeling resistance mechanisms to targeted therapies.
A Novel Dynamic Impact Approach (DIA) for Functional Analysis of Time-Course Omics Studies: Validation Using the Bovine Mammary Transcriptome
PLoS ONE, 2012
The overrepresentation approach (ORA) is the most widely accepted method for functional analysis of microarray datasets. The ORA is computationally efficient and robust; however, it cannot readily compare results from multiple gene lists, particularly in time-course experiments or experiments involving multiple treatments. To overcome this limitation, a novel method termed the Dynamic Impact Approach (DIA) is proposed. The DIA estimates both the biological impact of the experimental conditions and the direction of that impact. The impact is obtained by combining the proportion of differentially expressed genes (DEG) with the log2 mean fold change and the mean –log P-value of the genes associated with a biological term. The direction of the impact is calculated as the difference between the impact of up-regulated DEG and the impact of down-regulated DEG associated with the term. The DIA was validated using microarray data from a time-course experiment on the bovine mammary gland across the lactation cycle. Several annotation databases were analyzed with the DIA, and the results were compared with the same analysis performed using the ORA. The DIA showed that during lactation BTA6 and BTA14 were the most impacted chromosomes; among UniProt tissues, those related to the lactating mammary gland were the most positively impacted; among KEGG pathways, ‘Galactose metabolism’ and several metabolic categories related to lipid synthesis were among the most impacted and induced; and within Gene Ontology, “lactose biosynthesis” among Biological Processes, and “Lactose synthase activity” and “Stearoyl-CoA 9-desaturase activity” among Molecular Functions, were the most impacted and induced.
With the exception of the terms ‘Milk’, ‘Milk protein’ and ‘Mammary gland’ among UniProt tissues and SP_PIR_Keyword, the ORA failed to identify as significantly enriched (i.e., biologically relevant) any term known to be associated with the lactating mammary gland. These results indicate that the DIA is a biologically sound approach for the analysis of time-course experiments and an alternative to the ORA for functional analysis.
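To make the impact and direction definitions concrete, here is a minimal sketch of a DIA-style score for one biological term at one time point. It is an illustration under stated assumptions, not the authors' implementation. The abstract says only that the three quantities are "combined", so this sketch assumes they are multiplied. It also uses the mean absolute log2 fold change of the DEG as a stand-in for the "log2 mean fold change" and uses base-10 logarithms for the P-values. The names `GeneResult`, `dia_term`, and the toy values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GeneResult:
    log2_fc: float   # log2 fold change vs. the reference time point
    p_value: float   # P-value from the differential-expression test
    is_deg: bool     # passed a (hypothetical) DEG cutoff

def _impact(degs: List[GeneResult], n_term_genes: int) -> float:
    """Assumed combination: proportion of DEG x mean |log2 FC| x mean -log10(P)."""
    if not degs or n_term_genes == 0:
        return 0.0
    proportion = len(degs) / n_term_genes
    mean_abs_log2_fc = sum(abs(g.log2_fc) for g in degs) / len(degs)
    mean_neg_log_p = sum(-math.log10(g.p_value) for g in degs) / len(degs)
    return proportion * mean_abs_log2_fc * mean_neg_log_p

def dia_term(genes: List[GeneResult]) -> Tuple[float, float]:
    """Return (impact, direction) for all genes annotated to one term."""
    n = len(genes)
    degs = [g for g in genes if g.is_deg]
    up = [g for g in degs if g.log2_fc > 0]
    down = [g for g in degs if g.log2_fc < 0]
    # Direction: impact of up-regulated DEG minus impact of down-regulated DEG.
    return _impact(degs, n), _impact(up, n) - _impact(down, n)

# Hypothetical term with four annotated genes, three of them DEG.
term = [
    GeneResult(log2_fc=2.0, p_value=0.001, is_deg=True),
    GeneResult(log2_fc=1.0, p_value=0.01, is_deg=True),
    GeneResult(log2_fc=-1.0, p_value=0.01, is_deg=True),
    GeneResult(log2_fc=0.1, p_value=0.6, is_deg=False),
]
impact, direction = dia_term(term)
print(f"impact={impact:.3f} direction={direction:.3f}")
```

Repeating `dia_term` for each time point yields the per-term impact and direction profile over time. The abstract identifies this per-time-point comparison as the capability that a single ORA gene list lacks.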
Multifactorial Experimental Design and the Transitivity of Ratios With Spotted DNA Microarrays
Background
Multifactorial experimental designs using DNA microarrays are becoming increasingly common, but the extent of the transitivity of cDNA microarray expression measurements across multiple samples has yet to be explored.
Results
A strong correlation between direct and transitive inference for significantly differentially expressed genes is demonstrated, using subsets of a dye-swap loop design.
Conclusions
In experimental design, opportunities for transitive inference should be exploited, while always ensuring that comparisons of greatest interest comprise direct hybridizations.
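Transitive inference rests on the fact that log ratios add along a chain of hybridizations that share a sample. Specifically, log2(A/C) = log2(A/B) + log2(B/C). The sketch below computes a transitive estimate per gene and correlates it with a direct A-vs-C measurement, which is the kind of direct-versus-transitive comparison the Results describe. All ratios are hypothetical toy values, and the helper names are illustrative, not taken from the paper.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def transitive_log_ratio(log_ab, log_bc):
    """Per-gene log2(A/C) inferred as log2(A/B) + log2(B/C)."""
    return [ab + bc for ab, bc in zip(log_ab, log_bc)]

# Hypothetical per-gene log2 ratios (five genes) from three two-color arrays.
log_ab = [1.2, -0.4, 0.0, 2.1, -1.5]          # direct A vs B
log_bc = [0.3, -0.2, 0.5, -0.6, 0.4]          # direct B vs C
log_ac_direct = [1.4, -0.7, 0.6, 1.6, -1.0]   # direct A vs C

log_ac_transitive = transitive_log_ratio(log_ab, log_bc)
r = pearson(log_ac_direct, log_ac_transitive)
print(f"direct vs transitive log2(A/C): r = {r:.3f}")
```

Suppose each hybridization's log-ratio error has variance σ² and the errors are independent. Then the transitive estimate has variance 2σ². This is consistent with the advice to make the comparisons of greatest interest direct hybridizations.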
Designing Experiments Using Spotted Microarrays to Detect Gene Regulation Differences Within and Among Species
Comparative studies of genome-wide gene expression must account for variation not only among species but also within species. Such studies are necessarily large in scale, because they incorporate experiments on multiple individuals of multiple species, at multiple developmental stages, and in multiple environmental conditions. If the experiments are carefully designed and performed, the data they provide are worth the effort. We describe the utility of spotted microarrays for these studies and highlight experimental design criteria that maximize inferential and statistical power. We conclude with a discussion of experimental protocols designed for investigating differential gene expression, and of their pitfalls.
Resolution of Large and Small Differences In Gene Expression Using Models for the Bayesian Analysis of Gene Expression Levels and Spotted DNA Microarrays
Background
The detection of small yet statistically significant differences in gene expression in spotted DNA microarray studies is an ongoing challenge. Meeting this challenge requires careful examination of the performance of a range of statistical models, as well as empirical examination of the effect of replication on the power to resolve these differences.
Results
New models are derived and software is developed for the analysis of microarray ratio data. These models incorporate multiplicative small error terms, and error standard deviations that are proportional to expression level. The fastest and most powerful method incorporates additive small error terms and error standard deviations proportional to expression level. Data from four studies are profiled for the degree to which they reveal statistically significant differences in gene expression. The gene expression level at which there is an empirical 50% probability of a significant call is presented as a summary statistic for the power to detect small differences in gene expression.
Conclusions
Understanding the resolution of differences in gene expression that can be detected as significant is a vital component of experimental design and evaluation. These small differences in gene expression level are readily detected with a Bayesian analysis of gene expression level that has additive error terms and constrains samples to have a common error coefficient of variation. The power to detect small differences in a study may then be determined by logistic regression.
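The summary statistic above is the expression level at which the empirical probability of a significant call reaches 50%. This level can be read off a fitted logistic curve: if P(significant | x) = 1 / (1 + exp(-(b0 + b1·x))), then the probability is 0.5 where b0 + b1·x = 0, i.e. at x = -b0/b1. The minimal sketch below fits a one-covariate logistic regression by Newton-Raphson in plain Python. The covariate, the significance calls, and the function names are hypothetical. The paper's exact covariate and fitting procedure are not specified here.

```python
import math

def fit_logistic_1d(x, y, max_iter=100, tol=1e-12):
    """Maximum-likelihood fit of P(y=1|x) = 1/(1+exp(-(b0+b1*x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score (gradient of log-likelihood)
            g1 += (yi - p) * xi
            h00 += w                # Fisher information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det   # information^-1 @ score
        step1 = (-h01 * g0 + h00 * g1) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1

def level_at_half_probability(b0, b1):
    """Covariate value where the fitted probability of a significant call is 0.5."""
    return -b0 / b1

# Hypothetical genes: an expression-level covariate and whether each was called significant.
level = [1, 2, 3, 4, 5, 6, 7, 8]
significant = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_1d(level, significant)
x50 = level_at_half_probability(b0, b1)
print(f"b0={b0:.3f}, b1={b1:.3f}, 50% level={x50:.2f}")
```

The toy calls overlap rather than being perfectly separated by `level`. With fully separated data the maximum-likelihood slope diverges, and no finite 50% point exists.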
Multi-targeted priming for genome-wide gene expression assays
Background
Complementary approaches to assaying global gene expression are needed to assess gene expression in regions that are poorly assayed by current methodologies. A key component of nearly all gene expression assays is the reverse transcription of transcribed sequences that has traditionally been performed by priming the poly-A tails on many of the transcribed genes in eukaryotes with oligo-dT, or by priming RNA indiscriminately with random hexamers. We designed an algorithm to find common sequence motifs that were present within most protein-coding genes of Saccharomyces cerevisiae and of Neurospora crassa, but that were not present within their ribosomal RNA or transfer RNA genes. We then experimentally tested whether degenerately priming these motifs with multi-targeted primers improved the accuracy and completeness of transcriptomic assays.
Results
We discovered two multi-targeted primers that would prime a preponderance of genes in the genomes of Saccharomyces cerevisiae and Neurospora crassa while avoiding priming ribosomal RNA or transfer RNA. Examining the response of Saccharomyces cerevisiae to nitrogen deficiency and profiling Neurospora crassa early sexual development, we demonstrated that using multi-targeted primers in reverse transcription led to superior performance of microarray profiling and next-generation RNA tag sequencing. Priming with multi-targeted primers in addition to oligo-dT resulted in higher sensitivity, a larger number of well-measured genes and greater power to detect differences in gene expression.
Conclusions
Our results provide the most complete and detailed expression profiles of the yeast nitrogen starvation response and N. crassa early sexual development to date. Furthermore, our multi-targeted priming methodology for genome-wide gene expression assays provides selective targeting of multiple sequences and counter-selection against undesirable sequences, facilitating a more complete and precise assay of the transcribed sequences within the genome.
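The motif search described in the Background can be illustrated in simplified form. The sketch finds k-mers present in a large fraction of coding sequences and absent from every excluded sequence (standing in for rRNA and tRNA), then reports the reverse complement that a primer would need in order to anneal. It is only a sketch under assumptions. It uses exact k-mers on the sense strand, whereas the study primed motifs degenerately. All sequences, thresholds, and function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Primer sequence that anneals to a sense-strand motif."""
    return seq.translate(COMPLEMENT)[::-1]

def kmers(seq, k):
    """Set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def candidate_motifs(coding, excluded, k, min_fraction):
    """k-mers in >= min_fraction of coding sequences and in no excluded sequence,
    ranked by coverage (descending) and then alphabetically."""
    coverage = Counter()
    for seq in coding:
        coverage.update(kmers(seq.upper(), k))   # each gene counted at most once
    forbidden = set()
    for seq in excluded:
        forbidden |= kmers(seq.upper(), k)       # counter-selected motifs
    n = len(coding)
    hits = [(m, c / n) for m, c in coverage.items()
            if m not in forbidden and c / n >= min_fraction]
    return sorted(hits, key=lambda mc: (-mc[1], mc[0]))

coding = ["ATGGCTAAGCTT", "ATGAAGCTTGCA", "CCAAGCTTATGA"]  # toy "mRNAs"
excluded = ["GGAAGCCC"]                                      # toy "rRNA"
motifs = candidate_motifs(coding, excluded, k=4, min_fraction=1.0)
for motif, frac in motifs:
    print(motif, frac, "primer:", reverse_complement(motif))
```

In this toy run, `AAGC` occurs in every coding sequence but is dropped because it also occurs in the excluded sequence. This mirrors the counter-selection against ribosomal and transfer RNA.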
Statistical Bias and Variance of Gene Selection and Cross Validation Methods: A Case Study on Hypertension Prediction
Co-authored with Olcay Kursun, Ahmet Sertbas, Nizamettin Aydin, Huseyin Seker
2012
In exploratory association studies of genes with certain diseases, a single gene or a small number of genes (features) related to the disease are selected from among the many thousands investigated. We investigate the statistical bias and variance of simple yet common feature selection algorithms (based on correlation and on mutual information), evaluated with well-known cross-validation methods (leave-one-out and k-fold), in a gene-finding study for hypertension prediction. Our findings show that the selected genes differ across methods and across cross-validation runs, for both single-gene selection and gene-subset selection.
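The instability described above can be reproduced on a toy scale. The sketch performs correlation-based single-gene selection inside each leave-one-out fold and records which gene wins in each fold. It covers only the correlation criterion, not mutual information. The six-sample data set is hypothetical and was constructed so that two genes compete. The function names are illustrative.

```python
import math

def pearson(a, b):
    """Pearson correlation; 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def select_top_gene(X, y, rows):
    """Index of the gene with the largest |r| against the label, using only `rows`."""
    labels = [y[i] for i in rows]
    scores = [abs(pearson([X[i][j] for i in rows], labels)) for j in range(len(X[0]))]
    return max(range(len(scores)), key=scores.__getitem__)

def loo_selections(X, y):
    """Gene selected in each leave-one-out fold (selection sees only training rows)."""
    n = len(X)
    return [select_top_gene(X, y, [i for i in range(n) if i != held_out])
            for held_out in range(n)]

# Hypothetical data: 6 samples (rows) x 2 genes (columns); y = 1 marks a case.
X = [[0, 0], [0, 0], [1, 0], [1, 0], [1, 1], [1, 1]]
y = [0, 0, 0, 1, 1, 1]
selections = loo_selections(X, y)
print("gene selected in each LOO fold:", selections)
```

Each gene agrees with the label on all but one sample. Consequently, whichever gene's single disagreeing sample is held out (or whichever gene better fits the remaining rows) wins that fold, and each gene is selected in three of the six folds.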
Cancer Systems Biology
by Elana Fertig
Springer Handbooks of Computational Statistics, 2011, Part 3, 533-565, DOI: 10.1007/978-3-642-16345-6_25
Cancer is a complex disease, resulting from system-wide interactions of biological processes rather than from any single underlying cause. The processes that drive all cancer development and progression have been termed the ‘hallmarks of cancer’. With the growth of large-scale measurements of numerous molecular and cellular properties, a new approach to understanding the interrelationships among the hallmarks, cancer systems biology, is being developed. Cancer systems biology focuses on systems-level analysis and strives to develop novel data integration and analysis techniques to model and infer cancer biology and treatment response.
CoGAPS: an R/C++ package to identify patterns and biological process activity in transcriptomic data
by Elana Fertig
EJ Fertig, J Ding, AV Favorov, G Parmigiani, and MF Ochs. (2010) Bioinformatics, 26 (21): 2792-2793.
Summary: Coordinated Gene Activity in Pattern Sets (CoGAPS) provides an integrated package for isolating gene expression driven by a biological process, enhancing inference of biological processes from transcriptomic data. CoGAPS improves on other enrichment measurement methods by combining a Markov chain Monte Carlo (MCMC) matrix factorization algorithm (GAPS) with a threshold-independent statistic inferring activity on gene sets. The software is provided as open source C++ code built on top of JAGS software with an R interface.
Availability: The R package CoGAPS and the C++ package GAPS-JAGS are provided open source under the GNU Lesser General Public License (LGPL), with a user's manual containing installation and operating instructions. CoGAPS is available through Bioconductor and depends on the rjags package, available through CRAN, to interface CoGAPS with GAPS-JAGS.
URL: http://www.cancerbiostats.onc.jhmi.edu/cogaps.cfm
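CoGAPS factors a nonnegative expression matrix D (genes × samples) into two nonnegative matrices. Here they are called A (genes × patterns) and P (patterns × samples), so that D ≈ A·P. Each row of P is then a pattern across samples, and each column of A holds the genes' weights in that pattern. CoGAPS samples this factorization with the Bayesian MCMC algorithm GAPS. The sketch below does not do that. It swaps in deterministic Lee-Seung multiplicative-update NMF (squared Frobenius loss) only to illustrate the shape of the decomposition. It has no priors, no uncertainty estimates, and no gene-set statistic. The data and function names are hypothetical.

```python
import random

def matmul(X, Y):
    """Plain-Python matrix product of X (n x m) and Y (m x p)."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def frobenius_error(D, A, P):
    """Squared Frobenius norm of D - A @ P."""
    AP = matmul(A, P)
    return sum((D[i][j] - AP[i][j]) ** 2 for i in range(len(D)) for j in range(len(D[0])))

def nmf(D, k, iters=500, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates: D (genes x samples) ~ A (genes x k) @ P (k x samples)."""
    rng = random.Random(seed)
    n_genes, n_samples = len(D), len(D[0])
    A = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_genes)]
    P = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(k)]
    for _ in range(iters):
        # P <- P * (A^T D) / (A^T A P); eps guards against division by zero.
        At = transpose(A)
        num, den = matmul(At, D), matmul(matmul(At, A), P)
        P = [[P[r][j] * num[r][j] / (den[r][j] + eps) for j in range(n_samples)]
             for r in range(k)]
        # A <- A * (D P^T) / (A P P^T)
        Pt = transpose(P)
        num, den = matmul(D, Pt), matmul(A, matmul(P, Pt))
        A = [[A[i][r] * num[i][r] / (den[i][r] + eps) for r in range(k)]
             for i in range(n_genes)]
    return A, P

# Hypothetical 5-gene x 4-sample matrix built from two nonnegative patterns.
D = [[1, 2, 0, 1],
     [2, 4, 0, 2],
     [0, 1, 2, 1],
     [0, 3, 6, 3],
     [1, 3, 2, 2]]
A0, P0 = nmf(D, k=2, iters=0)     # random nonnegative starting point
A, P = nmf(D, k=2, iters=500)     # same seed, after 500 updates
print(f"error before: {frobenius_error(D, A0, P0):.3f}, after: {frobenius_error(D, A, P):.5f}")
```

Multiplicative updates keep every entry nonnegative and do not increase the loss. However, they converge only to a local optimum. This is one reason a sampler such as GAPS, which also reports uncertainty in A and P, may be preferred for inference.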

