TCGA data

Seven Bridges is committed to providing CGC users with the most up-to-date version of the TCGA legacy dataset available from the NCI Genomic Data Commons (GDC). In keeping with this commitment, on July 10, 2018, the CGC transitioned from hosting the CGHub version of this dataset to hosting the GDC Legacy Archive Data Release 11.0 version. Since that date, all files accessible via the Data Browser and the CGC API correspond to Data Release 11.0. Since July 12, 2018, all files accessible via the Datasets API also correspond to Data Release 11.0.

Files that were added to individual projects before the transition and are not represented in the new dataset version are no longer accessible via those projects, but may be obtainable from the GDC archive by contacting the GDC Help Desk. Similarly, files that are not represented in Data Release 11.0 are no longer accessible through saved Data Browser queries, and affected queries will return a result of '0'. In addition, because of a change in the way some files within this dataset are hosted, a small number of saved Data Browser queries will also return '0' even though their files are still available. Such queries can be recreated using the Data Browser query-building canvas and will then return the same results as before.

Please contact the CGC Team at [email protected] if you have any questions. The CGC Team looks forward to continuing to collaborate with the GDC in the months ahead to ensure that new data releases for this dataset become available through the CGC in a timely manner.

The Cancer Genome Atlas (TCGA) is one of the richest and most complete genomics datasets available, compiled to characterize the molecular basis of cancer. Data collection for TCGA began in 2006 as a joint effort of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), both part of the National Institutes of Health (NIH) within the U.S. Department of Health and Human Services.

Over the past decade, TCGA has grown to contain data on 33 tumor types and more than 11,000 cases (patients), with between 50 and 1,500 cases sampled for each tumor type. Multiple samples were analyzed for each case, using microarrays for genome characterization and next-generation sequencing for sequence data. TCGA data currently represents more than 2.5 petabytes of information and is expected to grow as new samples are processed.

The table below lists all TCGA data available on the CGC, giving the data type, data subtype, file format, and access tier of each data subtype.

| Data type | Data subtype | Data format | Data access tier |
|---|---|---|---|
| Clinical | Clinical Data | XML | Open Data |
| Clinical | Biospecimen Data | XML | Open Data |
| Raw Sequencing Data | Aligned Reads | BAM | Controlled Data |
| Raw Sequencing Data | Unaligned Reads | TAR | Controlled Data |
| Raw Sequencing Data | Sequencing Tag | DGE-Tag | Open Data |
| Raw Sequencing Data | Sequencing Tag Counts | TXT | Open Data |
| Raw Microarray Data | Raw Intensities | IDAT, CEL, TXT, TIF | Open and Controlled Data |
| Raw Microarray Data | Intensities Log2Ratio | TXT | Open Data |
| Raw Microarray Data | Intensities | TXT | Open Data |
| Raw Microarray Data | Normalized Intensities | TXT, DAT | Open and Controlled Data |
| Simple Nucleotide Variation | Genotypes | TXT, DAT | Controlled Data |
| Simple Nucleotide Variation | Simple Somatic Mutation | MAF | Open and Controlled Data |
| Simple Nucleotide Variation | Simple Nucleotide Variation | VCF | Controlled Data |
| Gene Expression | Gene Expression Quantification | TXT | Open Data |
| Gene Expression | miRNA Quantification | TXT | Open Data |
| Gene Expression | Isoform Expression Quantification | TXT | Open Data |
| Gene Expression | Exon Junction Quantification | TXT | Open Data |
| Gene Expression | Exon Quantification | TXT | Open Data |
| Structural Rearrangement | Structural Variation | VCF, FA | Controlled Data |
| DNA Methylation | Bisulfite Sequence Alignment | VCF | Controlled Data |
| DNA Methylation | Methylation Beta Value | TXT | Open Data |
| DNA Methylation | Methylation Percentage | BED | Open Data |
| Copy Number Variation | Copy Number Segmentation | TXT, DAT | Open Data |
| Copy Number Variation | Copy Number Estimate | TXT | Controlled Data |
| Copy Number Variation | LOH | TXT | Open Data |
| Copy Number Variation | Copy Number Variation | VCF | Controlled Data |
| Copy Number Variation | Normalized Copy Numbers | TXT | Controlled Data |
| Protein Expression | Protein Expression Quantification | TXT | Open Data |
| Other | Microsatellite Instability | FSA, TXT | Controlled Data |
| Raw Microarray Data | CGH Array QC | PNG | Open Data |
| Other | ABI Sequence Trace | TR | Controlled Data |
| Raw Microarray Data | CGH Array QC | JPG | Open Data |
| Raw Sequencing Data | Unaligned Reads | FASTQ | Controlled Data |
| Clinical | Clinical Data | Biotab | Open Data |
| Clinical | Biospecimen Data | Biotab | Open Data |
| Raw Microarray Data | CGH Array QC | TSV | Open Data |
| Raw Sequencing Data | Coverage WIG | WIG | Open and Controlled Data |
| Clinical | Pathology Report | PDF | Open Data |
| Raw Microarray Data | CGH Array QC | PDF | Open Data |
| Clinical | Tissue Slide Image | SVS | Open Data |
| Clinical | Diagnostic Image | SVS | Open Data |
| Biospecimen | Biospecimen Supplement | BCR XML | Open Data |
| Clinical | Clinical Supplement | BCR XML | Open Data |
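When planning an analysis, it can help to check ahead of time which data subtypes include Controlled Data files and which are available in a given file format. The following is a minimal Python sketch, not part of any CGC tool or API, that transcribes the table above into a hypothetical `TCGA_TABLE` list and adds two illustrative helpers, `has_controlled_files` and `subtypes_in_format`. It assumes that rows sharing a subtype and access tier can be merged into one entry with several formats, and that in the table's multi-value rows each subtype pairs with the data type listed alongside it (for example, Pathology Report under Clinical and CGH Array QC under Raw Microarray Data).

```python
from typing import NamedTuple

# Access-tier labels as they appear in the table above.
OPEN = "Open Data"
CONTROLLED = "Controlled Data"
BOTH = "Open and Controlled Data"


class TcgaSubtype(NamedTuple):
    """One data subtype from the TCGA table (hypothetical helper type)."""
    data_type: str
    subtype: str
    formats: tuple[str, ...]
    tier: str


# Transcription of the table above; rows that share a subtype and tier
# are merged into a single entry listing all of their formats.
TCGA_TABLE = [
    TcgaSubtype("Clinical", "Clinical Data", ("XML", "Biotab"), OPEN),
    TcgaSubtype("Clinical", "Biospecimen Data", ("XML", "Biotab"), OPEN),
    TcgaSubtype("Raw Sequencing Data", "Aligned Reads", ("BAM",), CONTROLLED),
    TcgaSubtype("Raw Sequencing Data", "Unaligned Reads", ("TAR", "FASTQ"), CONTROLLED),
    TcgaSubtype("Raw Sequencing Data", "Sequencing Tag", ("DGE-Tag",), OPEN),
    TcgaSubtype("Raw Sequencing Data", "Sequencing Tag Counts", ("TXT",), OPEN),
    TcgaSubtype("Raw Sequencing Data", "Coverage WIG", ("WIG",), BOTH),
    TcgaSubtype("Raw Microarray Data", "Raw Intensities", ("IDAT", "CEL", "TXT", "TIF"), BOTH),
    TcgaSubtype("Raw Microarray Data", "Intensities Log2Ratio", ("TXT",), OPEN),
    TcgaSubtype("Raw Microarray Data", "Intensities", ("TXT",), OPEN),
    TcgaSubtype("Raw Microarray Data", "Normalized Intensities", ("TXT", "DAT"), BOTH),
    TcgaSubtype("Raw Microarray Data", "CGH Array QC", ("PNG", "JPG", "TSV", "PDF"), OPEN),
    TcgaSubtype("Simple Nucleotide Variation", "Genotypes", ("TXT", "DAT"), CONTROLLED),
    TcgaSubtype("Simple Nucleotide Variation", "Simple Somatic Mutation", ("MAF",), BOTH),
    TcgaSubtype("Simple Nucleotide Variation", "Simple Nucleotide Variation", ("VCF",), CONTROLLED),
    TcgaSubtype("Gene Expression", "Gene Expression Quantification", ("TXT",), OPEN),
    TcgaSubtype("Gene Expression", "miRNA Quantification", ("TXT",), OPEN),
    TcgaSubtype("Gene Expression", "Isoform Expression Quantification", ("TXT",), OPEN),
    TcgaSubtype("Gene Expression", "Exon Junction Quantification", ("TXT",), OPEN),
    TcgaSubtype("Gene Expression", "Exon Quantification", ("TXT",), OPEN),
    TcgaSubtype("Structural Rearrangement", "Structural Variation", ("VCF", "FA"), CONTROLLED),
    TcgaSubtype("DNA Methylation", "Bisulfite Sequence Alignment", ("VCF",), CONTROLLED),
    TcgaSubtype("DNA Methylation", "Methylation Beta Value", ("TXT",), OPEN),
    TcgaSubtype("DNA Methylation", "Methylation Percentage", ("BED",), OPEN),
    TcgaSubtype("Copy Number Variation", "Copy Number Segmentation", ("TXT", "DAT"), OPEN),
    TcgaSubtype("Copy Number Variation", "Copy Number Estimate", ("TXT",), CONTROLLED),
    TcgaSubtype("Copy Number Variation", "LOH", ("TXT",), OPEN),
    TcgaSubtype("Copy Number Variation", "Copy Number Variation", ("VCF",), CONTROLLED),
    TcgaSubtype("Copy Number Variation", "Normalized Copy Numbers", ("TXT",), CONTROLLED),
    TcgaSubtype("Protein Expression", "Protein Expression Quantification", ("TXT",), OPEN),
    TcgaSubtype("Other", "Microsatellite Instability", ("FSA", "TXT"), CONTROLLED),
    TcgaSubtype("Other", "ABI Sequence Trace", ("TR",), CONTROLLED),
    TcgaSubtype("Clinical", "Pathology Report", ("PDF",), OPEN),
    TcgaSubtype("Clinical", "Tissue Slide Image", ("SVS",), OPEN),
    TcgaSubtype("Clinical", "Diagnostic Image", ("SVS",), OPEN),
    TcgaSubtype("Biospecimen", "Biospecimen Supplement", ("BCR XML",), OPEN),
    TcgaSubtype("Clinical", "Clinical Supplement", ("BCR XML",), OPEN),
]


def has_controlled_files(subtype: str) -> bool:
    """True if any files of this subtype are in the Controlled Data tier."""
    tiers = {row.tier for row in TCGA_TABLE if row.subtype == subtype}
    if not tiers:
        raise KeyError(f"unknown data subtype: {subtype!r}")
    # Anything other than purely Open Data includes controlled files.
    return tiers != {OPEN}


def subtypes_in_format(fmt: str) -> list[str]:
    """Sorted subtype names offered in the given format (case-insensitive)."""
    fmt = fmt.upper()
    return sorted(
        {row.subtype for row in TCGA_TABLE if fmt in (f.upper() for f in row.formats)}
    )


if __name__ == "__main__":
    print(has_controlled_files("Aligned Reads"))           # True
    print(has_controlled_files("Methylation Beta Value"))  # False
    print(subtypes_in_format("vcf"))
```

Subtypes listed as "Open and Controlled Data" count as having controlled files here, since at least some of their files fall in the controlled tier.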