Early Detection Research Network
Team Project

Biomarkers to Distinguish Aggressive Cancers from Non-aggressive or Non-progressing Cancer

Christopher Li Supplement 2012
No design specified.
Breast and Gynecologic Cancers Research Group

Distinguishing aggressive cancers from non-aggressive or non-progressing cancers is an issue of both clinical and public health importance, particularly for cancers with an available screening test. With respect to breast cancer, mammographic screening has been shown in randomized trials to reduce breast cancer mortality, but because of its limited sensitivity and specificity, some breast cancers are missed by screening. These so-called interval-detected breast cancers, diagnosed between regular screenings, are known to have a more aggressive clinical profile. In addition, among cancers detected by mammography, some are indolent while others are more likely to recur despite treatment.

screen-detected breast cancer, focusing on early-stage invasive disease. We will compare gene expression profiles of 50 screen-detected cancers with those of 50 interval-detected cancers using the whole-genome cDNA-mediated Annealing, Selection, extension and Ligation (DASL) assay. Through this approach we will advance our understanding of the molecular characteristics of interval- vs. screen-detected breast cancers and discover novel biomarkers that distinguish between them.
Aim 2: To identify biomarkers in tumor tissue related to risk of cancer recurrence among patients with screen-detected early-stage invasive breast cancer. Using the DASL assay, we will compare gene expression profiles from screen-detected early-stage breast cancers that recurred within five years with those that did not recur within five years. These two groups of patients will be matched on multiple factors, including tumor stage and treatments received. Our goal with this comparison is to identify novel biomarkers that discriminate between more aggressive tumors that recur and less aggressive tumors that do not.
Gene expression data will be pre-processed, normalized, and cleaned as described in our protocol (Appendix 1). We will perform two levels of analysis, gene-set level and gene level, to identify gene sets and genes that are associated with interval- vs. screen-detected disease and with recurrent vs. non-recurrent disease. In general, we will account for multiple testing by controlling the false discovery rate (FDR) [23]. We will use a 5% FDR when assessing statistical significance across all samples, and a 1% FDR (or lower) when performing subgroup analyses. For our gene-level analysis, we will fit a linear regression for each gene to identify genes showing differential expression in our comparisons of interest, with the matching variables included as covariates. For gene set-level analyses, we will first rank genes from high to low based on their association in each comparison; then, for each gene set, we will calculate an enrichment score that reflects how strongly the gene set is enriched for genes that differentiate between our comparison groups (http://www.broad.mit.edu/GSEA) [24,25]. The statistical significance of enrichment scores will be evaluated against each of two null distributions, formed by 1) permuting exposure status within each matched set and 2) permuting genes. Using both types of null distributions identifies gene sets associated with a given comparison as well as those particularly enriched with associated genes. Performing the analysis in two tiers will help us identify not only the genes that are individually most likely to differentiate our comparison groups, but also those that have only a moderate effect individually but, collectively as a gene set, may strongly discriminate between our comparison groups. This will enhance the power to detect associated genes and gene sets.
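The multiple-testing and gene-set steps described above can be illustrated with a minimal Python sketch. It assumes that per-gene p-values have already been obtained from the linear models, applies the Benjamini-Hochberg step-up procedure as one common way to control the FDR, and uses an unweighted running-sum statistic as a simplified stand-in for the GSEA enrichment score. The function names and inputs are hypothetical and are not taken from the protocol; the permutation-based null distributions are not shown.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Benjamini-Hochberg step-up procedure: reject the k smallest p-values,
    where k is the largest rank with p_(k) <= (k / m) * fdr.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (an illustrative simplification).

    Walk down the ranked gene list, stepping up at genes in the set and
    down at genes outside it; return the running sum of largest magnitude.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

In this sketch, a gene set concentrated at the top of the ranking yields a score near +1 and one concentrated at the bottom yields a score near -1. Under the design described above, significance would then be judged by recomputing the score after permuting exposure status within matched sets or permuting genes.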
We will construct a panel of genes that discriminate between our comparison groups, based on significantly associated genes or gene sets, using regularization techniques, which have been shown to improve prediction and interpretation considerably compared with ordinary regression models without regularization. Specifically, we will use the elastic net [26] regularization method because it has a feature well suited to our data: it encourages genes in the same pathway to be selected as a group in the model [27]. Ten-fold cross-validation will be used to determine the appropriate amount of regularization, and hence the panel of biomarkers associated with the outcome. While microarray technology allows simultaneous evaluation of the expression levels of thousands of genes, only a fraction is expected to be associated with the exposures of interest. With respect to statistical power, the minimum detectable effect size (MDES) is determined by the number (m) of truly altered genes out of the total p (here 25,000) genes studied, in additio
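The elastic net selection with k-fold cross-validation described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the protocol's implementation: it uses a squared-error loss with no intercept (so predictors and outcome are assumed to be centered), whereas a logistic loss would be more natural for a binary outcome such as recurrence. The names `fit_elastic_net`, `cv_error`, and `select_panel`, the penalty parameterization, and the candidate penalty grid are all hypothetical.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, lam, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).
    """
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    z = [sum(v * v for v in col) / n for col in cols]
    beta = [0.0] * p
    residual = list(y)
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant-zero column carries no information
            rho = sum(v * (r + v * beta[j])
                      for v, r in zip(cols[j], residual)) / n
            new_bj = soft_threshold(rho, lam * alpha) / (z[j] + lam * (1 - alpha))
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual = [r - v * delta for v, r in zip(cols[j], residual)]
                beta[j] = new_bj
    return beta


def cv_error(X, y, lam, alpha=0.5, k=10, seed=0):
    """Mean squared prediction error over k cross-validation folds."""
    n = len(X)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for f in range(k):
        fold = idx[f::k]
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        beta = fit_elastic_net([X[i] for i in train],
                               [y[i] for i in train], lam, alpha)
        for i in fold:
            pred = sum(b * v for b, v in zip(beta, X[i]))
            total += (y[i] - pred) ** 2
    return total / n


def select_panel(X, y, lambdas, alpha=0.5, k=10):
    """Pick the penalty with the lowest CV error; return it with the
    indices of genes whose coefficients are nonzero at that penalty."""
    best_lam = min(lambdas, key=lambda lam: cv_error(X, y, lam, alpha, k))
    beta = fit_elastic_net(X, y, best_lam, alpha)
    return best_lam, [j for j, b in enumerate(beta) if b != 0.0]
```

Here `X` is a list of samples, each a list of per-gene expression values, and `y` is the outcome. The ridge component of the penalty is what tends to keep correlated genes in the model together. A pure-Python loop like this is only practical for small candidate panels; with thousands of genes an optimized library implementation would be the usual choice.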

There are currently no biomarkers annotated for this protocol.

No datasets are currently associated with this protocol.