The Cancer Genome Atlas (TCGA) Data: Why Should You Use It and How?

By: Nurul-Syakima Ab Mutalib (syakima@ppukm.ukm.edu.my)

TCGA is a publicly funded project that aims to catalogue and discover genomic alterations in cancer to create a comprehensive “atlas” of cancer genomic profiles. The project engaged scientists from NIH’s National Cancer Institute (NCI), National Human Genome Research Institute (NHGRI) and institutions across the USA and Europe. During phase I, the project profiled three cancers commonly associated with poor prognosis, i.e., brain, lung, and ovarian cancers. Following the success of phase I, phase II was expanded to include an additional 30 cancers. Each of these studies is complete with clinical data together with genome, transcriptome, methylome and proteome profiles of more than 200 patients for each cancer. Characterization was genome-wide and unbiased.

To date, there are 33 landmark TCGA papers and more than 800 publications using TCGA data, either as clinical validation or as discovery cohorts. When and how should we use TCGA data in our research? Below are some examples:

  1. Determination of clinical relevance for newly-identified molecular mechanisms of cancer.
  2. Validation of biomarkers in larger cohorts.
  3. Correlation of genes/pathways with differentially expressed genes.
  4. Molecular subtype searching for hypothesis testing.
  5. Correlation of clinical features with molecular profiles for certain cancers.
  6. Data mining for grant applications.
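
As an illustration of example 5 above, here is a minimal Python sketch of how a clinical feature might be related to a molecular measurement. The records, the field names (`patient`, `stage`, `gene_x_expr`) and the helper `mean_expression_by_group` are all hypothetical and are not taken from any TCGA file format; real TCGA tables would first need to be downloaded and parsed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records; real TCGA tables have many more samples and columns.
samples = [
    {"patient": "P01", "stage": "I", "gene_x_expr": 2.0},
    {"patient": "P02", "stage": "I", "gene_x_expr": 4.0},
    {"patient": "P03", "stage": "III", "gene_x_expr": 7.0},
    {"patient": "P04", "stage": "III", "gene_x_expr": 9.0},
]


def mean_expression_by_group(records, group_key, value_key):
    """Return the mean of value_key for each distinct value of group_key."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec[group_key]].append(rec[value_key])
    return {group: mean(values) for group, values in grouped.items()}


# Compare average expression of the hypothetical gene between tumour stages.
summary = mean_expression_by_group(samples, "stage", "gene_x_expr")
for stage, avg in sorted(summary.items()):
    print(f"stage {stage}: mean expression {avg:.1f}")
```

A difference in group means like this is only a starting point. With real cohorts, an appropriate statistical test and correction for multiple comparisons would be needed before drawing any conclusion.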

TCGA data can be downloaded from https://tcga-data.nci.nih.gov/tcga/tcgaDownload.jsp

User guidelines can be obtained from https://tcga-data.nci.nih.gov/tcga/tcgaHelp.jsp

Users of TCGA data who are unfamiliar with bioinformatics will find the cBio Cancer Genomics Portal (http://cbioportal.org) useful.