Project Overview

The NIH Cancer Genome Atlas (TCGA) project has generated enormous data sets covering more than 20 malignancies and has provided many valuable insights into the genetic and genomic basis of cancer. However, exploring the association between TCGA results and clinical phenotype remains a challenge, particularly for individuals lacking formal bioinformatics training. Such exploration is an important step towards the clinical translation of cancer genomic/proteomic data. Several websites, such as cBioPortal and the UCSC Genome Browser, make TCGA data accessible, but these sites have few interactive features for querying clinically relevant phenotypic associations with cancer drivers. To enable exploration of the clinical associations in the TCGA data, we developed the Stanford-TCGA Portal. Through a web/mobile interface, this website enables easy navigation of the cancer genomic/proteomic/clinical data provided by the TCGA and lets users pose clinically relevant questions, for example, "What genes are associated with advanced breast cancer?" or "What is the frequency of copy number deletion of APC for samples with/without PIK3CA mutations?"

The Stanford-TCGA Portal interface supports three ways of querying TCGA data:

  1. search for clinically relevant genes/microRNAs (miRs)/proteins by name, cancer type or clinical parameter
  2. profile genomic/proteomic changes by clinical parameter within a cancer type
  3. test two-hit hypotheses
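
A two-hit question such as the APC/PIK3CA example above can be expressed as a relational query. The following is a minimal sketch only, not the portal's actual implementation: the table names (`samples`, `alterations`), the column names, the `kind` labels, and the helper functions `build_demo_db` and `two_hit_frequency` are all hypothetical, and the five demo samples are made-up toy rows, not TCGA results. It uses Python's standard `sqlite3` module with an in-memory database.

```python
import sqlite3

# Hypothetical schema: one row per sample, one row per observed alteration.
QUERY = """
WITH flags AS (
    SELECT s.sample_id,
           EXISTS (SELECT 1 FROM alterations a
                   WHERE a.sample_id = s.sample_id
                     AND a.gene = ? AND a.kind = ?) AS has_first,
           EXISTS (SELECT 1 FROM alterations a
                   WHERE a.sample_id = s.sample_id
                     AND a.gene = ? AND a.kind = ?) AS has_second
    FROM samples s
)
SELECT has_first, COUNT(*) AS n_samples, AVG(has_second) AS freq_second
FROM flags
GROUP BY has_first
"""


def build_demo_db():
    """Create an in-memory database filled with toy rows (not real TCGA data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE samples (sample_id TEXT PRIMARY KEY);
        CREATE TABLE alterations (sample_id TEXT, gene TEXT, kind TEXT);
    """)
    conn.executemany("INSERT INTO samples VALUES (?)",
                     [(f"s{i}",) for i in range(1, 6)])
    conn.executemany("INSERT INTO alterations VALUES (?, ?, ?)", [
        ("s1", "PIK3CA", "mutation"),
        ("s2", "PIK3CA", "mutation"),
        ("s1", "APC", "cn_deletion"),
        ("s3", "APC", "cn_deletion"),
        ("s4", "APC", "cn_deletion"),
    ])
    return conn


def two_hit_frequency(conn, first, second):
    """Frequency of the second hit among samples with and without the first hit.

    `first` and `second` are (gene, kind) pairs. Returns a dict mapping
    True/False (first hit present or absent) to (n_samples, frequency).
    """
    rows = conn.execute(QUERY, (*first, *second)).fetchall()
    return {bool(has_first): (n, freq) for has_first, n, freq in rows}


conn = build_demo_db()
result = two_hit_frequency(conn, ("PIK3CA", "mutation"), ("APC", "cn_deletion"))
for has_first, (n, freq) in sorted(result.items()):
    print(f"PIK3CA mutated={has_first}: n={n}, APC deletion frequency={freq:.2f}")
```

Computing per-sample flags first and aggregating afterwards keeps samples without any recorded alteration in the denominator, which a plain join on the alterations table would silently drop.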

Any user can easily navigate the lists of identified genes through the Stanford-TCGA Portal. SQL queries run in the background and display results on the portal according to the user’s input. To derive these associations, we used elastic-net regularized multiple linear regression, modeling each clinical parameter in the space of the multiple genomic/proteomic features provided by the TCGA data. For each cancer, we identified the set of top gene predictors of each clinical parameter by 10-fold cross-validation. The robustness of the results was estimated by bootstrapping with 2,000 iterations. As a proof of concept, we demonstrated this approach’s utility on the TCGA colorectal cancer data (HoJoon Lee et al., BMC Medical Genomics 2013), and we have since expanded our analysis to include nearly all TCGA malignancies. Our statistical analysis identifies clinically relevant genes/miRs/proteins, some of which have been previously described, across 25 cancer types and 18 clinical parameters such as clinical stage or smoking history.
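
To make the statistical procedure more concrete, the sketch below shows one way the three ingredients (elastic-net regression, 10-fold cross-validation, and bootstrap resampling) could fit together. It is an illustration under stated assumptions, not the authors' pipeline: the solver is a plain coordinate-descent routine written for this example, the penalty follows the common parameterization (1/2n)·||y − Xw||² + α·ρ·||w||₁ + ½·α·(1 − ρ)·||w||², and the function names (`elastic_net`, `cv_mse`, `bootstrap_selection_frequency`) and parameters (`alpha`, `l1_ratio`) are hypothetical. Rows of `X` stand for samples, columns for genomic/proteomic features, and `y` for a numerically encoded clinical parameter.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha, l1_ratio=0.5, n_iter=100):
    """Fit an elastic net by cyclic coordinate descent; returns (weights, intercept)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center features and response so the intercept can be recovered afterwards.
    cols = [[row[j] - x_mean[j] for row in X] for j in range(p)]
    residual = [v - y_mean for v in y]
    w = [0.0] * p
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # constant feature carries no information
            rho = sum(c * r for c, r in zip(col, residual)) / n + z * w[j]
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_w
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, intercept


def cv_mse(X, y, alpha, k=10, seed=0, l1_ratio=0.5):
    """Mean squared prediction error of a given alpha under k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for f in range(k):
        fold = idx[f::k]
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w, b = elastic_net([X[i] for i in train], [y[i] for i in train], alpha, l1_ratio)
        for i in fold:
            pred = b + sum(wj * xj for wj, xj in zip(w, X[i]))
            total += (y[i] - pred) ** 2
    return total / len(X)


def bootstrap_selection_frequency(X, y, alpha, n_boot=2000, seed=0, l1_ratio=0.5):
    """Fraction of bootstrap resamples in which each feature gets a nonzero weight."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        w, _ = elastic_net([X[i] for i in idx], [y[i] for i in idx], alpha, l1_ratio)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]
```

In such a scheme, `cv_mse` would be evaluated over a grid of `alpha` values to pick the penalty strength, and features with a high selection frequency across bootstrap resamples would be treated as the more robust predictors. A pure-Python solver like this is only practical for small inputs; a real analysis over thousands of features would rely on an optimized library.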

In summary, the Stanford-TCGA Portal enables the cancer research community and others to fully utilize TCGA data. With its straightforward web/mobile interface, it provides simple yet clinically relevant associations of TCGA results. Thus, one can run queries and test hypotheses regarding genomic/proteomic alterations in cancers at any time and from any place.

Queries and comments: