TCGA data
Seven Bridges is committed to providing CGC users with the most up-to-date version of the TCGA legacy dataset that is available from the NCI Genomic Data Commons (GDC). In keeping with this commitment, the CGC transitioned from hosting the CGHub version of this dataset to the GDC Legacy Archive Data Release 11.0 version on July 10, 2018. As of that date, all files accessible via the Data Browser and CGC API correspond to Data Release 11.0. As of July 12, 2018, all files accessible via the Datasets API also correspond to Data Release 11.0.

Files that were added to individual projects before the transition and are no longer represented in the new dataset version are no longer accessible via those projects, but may be obtainable from the GDC archive by contacting the GDC Help Desk. Similarly, files that are no longer represented in Data Release 11.0 are no longer accessible through saved Data Browser queries, and affected queries will return a result of '0'. In addition, due to a change in the way some files within this dataset are hosted, a small number of saved Data Browser queries whose files are still available will also return a '0' result. Such queries can be recreated using the Data Browser query-building canvas and will then return the same results as before.

Please contact the CGC Team at [email protected] if you have any questions. The CGC Team looks forward to continuing to collaborate with the GDC in the months ahead to ensure that new data releases for this dataset become available through the CGC in a timely manner.
The Cancer Genome Atlas (TCGA) is one of the richest and most complete genomics datasets, compiled to understand the molecular basis of cancer. Data collection for TCGA began in 2006 as a joint effort by the National Cancer Institute (NCI), the National Human Genome Research Institute (NHGRI), the National Institutes of Health (NIH), and the U.S. Department of Health and Human Services.
Over the past decade, TCGA has grown to contain data on 33 different tumor types and over 11,000 cases (patients). Between 50 and 1,500 cases have been sampled for each tumor type. For each case, multiple samples were analyzed, using microarray technology for genome characterization and next-generation sequencing technology. TCGA data currently represents more than 2.5 petabytes of information and is expected to grow as new samples are processed.
For a full list of TCGA data available on the CGC, see the table below. The table details data types and subtypes, the data format of data subtypes, and the access level of each data subtype.
Data type | Data subtype | Data format | Data Access Tier |
---|---|---|---|
Clinical | Clinical Data | XML | Open Data |
Clinical | Biospecimen Data | XML | Open Data |
Raw Sequencing Data | Aligned Reads | BAM | Controlled Data |
Raw Sequencing Data | Unaligned Reads | TAR | Controlled Data |
Raw Sequencing Data | Sequencing Tag | DGE-Tag | Open Data |
Raw Sequencing Data | Sequencing Tag Counts | TXT | Open Data |
Raw Microarray Data | Raw Intensities | Idat, CEL, TXT, TIF | Open and Controlled Data |
Raw Microarray Data | Intensities Log2Ratio | TXT | Open Data |
Raw Microarray Data | Intensities | TXT | Open Data |
Raw Microarray Data | Normalized Intensities | TXT, Dat | Open and Controlled Data |
Simple Nucleotide Variation | Genotypes | TXT, Dat | Controlled Data |
Simple Nucleotide Variation | Simple Somatic Mutation | MAF | Open and Controlled Data |
Simple Nucleotide Variation | Simple Nucleotide Variation | VCF | Controlled Data |
Gene Expression | Gene Expression Quantification | TXT | Open Data |
Gene Expression | miRNA Quantification | TXT | Open Data |
Gene Expression | Isoform Expression Quantification | TXT | Open Data |
Gene Expression | Exon Junction Quantification | TXT | Open Data |
Gene Expression | Exon Quantification | TXT | Open Data |
Structural Rearrangement | Structural Variation | VCF, FA | Controlled Data |
DNA Methylation | Bisulfite Sequence Alignment | VCF | Controlled Data |
DNA Methylation | Methylation Beta Value | TXT | Open Data |
DNA Methylation | Methylation Percentage | BED | Open Data |
Copy Number Variation | Copy Number Segmentation | TXT, Dat | Open Data |
Copy Number Variation | Copy Number Estimate | TXT | Controlled Data |
Copy Number Variation | LOH | TXT | Open Data |
Copy Number Variation | Copy Number Variation | VCF | Controlled Data |
Copy Number Variation | Normalized Copy Numbers | TXT | Controlled Data |
Protein Expression | Protein Expression Quantification | TXT | Open Data |
Other | Microsatellite Instability | FSA, TXT | Controlled Data |
Raw Microarray Data | CGH Array QC | PNG | Open Data |
Other | ABI Sequence Trace | TR | Controlled Data |
Raw Microarray Data | CGH Array QC | JPG | Open Data |
Raw Sequencing Data | Unaligned Reads | FASTQ | Controlled Data |
Clinical | Clinical Data, Biospecimen Data | Biotab | Open Data |
Raw Microarray Data | CGH Array QC | TSV | Open Data |
Raw Sequencing Data | Coverage WIG | WIG | Open and Controlled Data |
Clinical, Raw Microarray Data | Pathology Report, CGH Array QC | | Open Data |
Clinical | Tissue Slide Image, Diagnostic Image | SVS | Open Data |
Biospecimen, Clinical | Biospecimen Supplement, Clinical Supplement | BCR XML | Open Data |
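
As an illustration of how the table can be read, the following minimal Python sketch groups data subtypes by data type and keeps only those that include open-access files. It uses a handful of rows copied from the table. The helper names `has_open_files` and `open_subtypes_by_type` are hypothetical and not part of any CGC or GDC API. The sketch also assumes that a tier labeled "Open and Controlled Data" includes at least some open files.

```python
from collections import defaultdict

# A few rows copied from the table above:
# (data type, data subtype, data format, data access tier).
ROWS = [
    ("Clinical", "Clinical Data", "XML", "Open Data"),
    ("Raw Sequencing Data", "Aligned Reads", "BAM", "Controlled Data"),
    ("Raw Microarray Data", "Raw Intensities", "Idat, CEL, TXT, TIF",
     "Open and Controlled Data"),
    ("Simple Nucleotide Variation", "Simple Somatic Mutation", "MAF",
     "Open and Controlled Data"),
    ("Gene Expression", "Gene Expression Quantification", "TXT", "Open Data"),
    ("Copy Number Variation", "Copy Number Estimate", "TXT", "Controlled Data"),
]


def has_open_files(access_tier: str) -> bool:
    """Return True if the tier label starts with "Open".

    Assumption: "Open and Controlled Data" means some files are open.
    """
    return access_tier.strip().lower().startswith("open")


def open_subtypes_by_type(rows):
    """Group subtypes that include open-access files by their data type."""
    grouped = defaultdict(list)
    for data_type, subtype, _data_format, tier in rows:
        if has_open_files(tier):
            grouped[data_type].append(subtype)
    return dict(grouped)


if __name__ == "__main__":
    for data_type, subtypes in open_subtypes_by_type(ROWS).items():
        print(f"{data_type}: {', '.join(subtypes)}")
```

Controlled Data subtypes are filtered out entirely here. Those rows are listed in the table as a separate access tier from Open Data.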