CGC mutation data not matching downloadable maf

Posted in TCGA data on the CGC by An Loehr Fri May 06 2016 21:02:27 GMT+0000 (UTC)·4·Viewed 724 times

Dear CGC team, When I compare the mutation data I download from the TCGA website (MAF files) with the data displayed in CGC, the two don't match. For example, in ovarian cancer 8% and 9% of patients are BRCA1 and BRCA2 mutated. When I pull these data up in the browser, only 2 BRCA2 mutations show up. The same pattern holds for TCGA LUSC and BLCA, with CGC displaying fewer mutations than the raw data. Can you please explain this discrepancy? Thanks, An
An Loehr
May 10, 2016

For example, the OV data set shows 1 BRCA1 frame shift insertion in CGC:
67f526f1-def5-43b4-bc89-154baae190fc TCGA-24-1846 BRCA1 Frame_Shift_Ins
In the Somatic_Mutations files, there are 3 samples with frame shift insertions. All three are within the data set displayed in CGC, but two are not marked as mutated:
7248cd60-be22-44bc-bc58-f644db0940a2 TCGA-13-1489 BRCA1 Frame_Shift_Ins
67f526f1-def5-43b4-bc89-154baae190fc TCGA-24-1846 BRCA1 Frame_Shift_Ins (shown in CGC)
c435627c-159d-4a6d-a819-30abac24bf4d TCGA-25-1632 BRCA1 Frame_Shift_Ins
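A comparison like the one above could be reproduced with a short script. The following is only a minimal sketch: it assumes a tab-delimited MAF with the standard `Hugo_Symbol`, `Variant_Classification`, and `Tumor_Sample_Barcode` columns, and the function name, the aliquot suffixes on the barcodes, and the `TCGA-00-0000` row are placeholders made up for illustration, not values from the real files.

```python
import csv
import io


def patients_with_variant(maf_lines, gene, classification):
    """Return sorted patient barcodes that carry `classification` in `gene`."""
    # MAF files are tab-delimited; lines starting with '#' are comment/version lines.
    rows = csv.DictReader(
        (line for line in maf_lines if not line.startswith("#")),
        delimiter="\t",
    )
    patients = set()
    for row in rows:
        if (row["Hugo_Symbol"] == gene
                and row["Variant_Classification"] == classification):
            # The first 12 characters of a TCGA barcode identify the patient.
            patients.add(row["Tumor_Sample_Barcode"][:12])
    return sorted(patients)


# Made-up excerpt: patient IDs are the ones quoted in this post, while the
# aliquot suffixes and the TCGA-00-0000 row are placeholders (assumptions).
EXAMPLE_MAF = (
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "BRCA1\tFrame_Shift_Ins\tTCGA-13-1489-01A-01W-0000-00\n"
    "BRCA1\tFrame_Shift_Ins\tTCGA-24-1846-01A-01W-0000-00\n"
    "BRCA1\tFrame_Shift_Ins\tTCGA-25-1632-01A-01W-0000-00\n"
    "BRCA2\tMissense_Mutation\tTCGA-00-0000-01A-01W-0000-00\n"
)

if __name__ == "__main__":
    hits = patients_with_variant(
        io.StringIO(EXAMPLE_MAF), "BRCA1", "Frame_Shift_Ins"
    )
    print(hits)
```

With a real file, `open(path, newline="")` would take the place of the `io.StringIO` object; the resulting patient list can then be checked against what the CGC browser marks as mutated.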

Can you please explain this discrepancy?

Thanks,
An

Erik Lehnert
May 27, 2016

Hi An,

Each disease has had multiple MAF files associated with it over time, depending on several factors (e.g., how variants were called and whether the MAF files were curated or uncurated). To produce our database, we selected MAF files for each disease according to specific criteria. Since we built this dataset, new MAFs have been generated, which may be the ones you are looking at. The GDC has also recently released a recommended set of MAF files to use with each disease. We will be working to update our database to these community standards in the near future.

To assist you in your analysis, please find below a table listing each disease and a link to its associated MAF file. Please let us know if you have any further questions!

Disease MAF link
ACC https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/acc/gsc/broad.mit.edu/illuminaga_dnaseq_curated/mutations/broad.mit.edu_ACC.IlluminaGA_DNASeq_curated.Level_2.1.0.0/An_TCGA_ACC_External_capture_All_Pairs.aggregated.capture.tcga.uuid.curated.somatic.maf
BLCA https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/blca/gsc/broad.mit.edu/illuminaga_dnaseq_curated/mutations/broad.mit.edu_BLCA.IlluminaGA_DNASeq_curated.Level_2.1.4.0/BLCA130_somatic_updated.aggregated.capture.tcga.uuid.curated.somatic.maf
BRCA https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/brca/gsc/genome.wustl.edu/illuminaga_dnaseq_curated/mutations/genome.wustl.edu_BRCA.IlluminaGA_DNASeq_curated.Level_2.1.1.0/genome.wustl.edu_BRCA.IlluminaGA_DNASeq.Level_2.1.1.0.curated.somatic.maf
CESC https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/cesc/gsc/genome.wustl.edu/illuminaga_dnaseq_curated/mutations/genome.wustl.edu_CESC.IlluminaGA_DNASeq_curated.Level_2.1.0.0/genome.wustl.edu_CESC.IlluminaGA_DNASeq_curated.Level_2.1.0.0.somatic.maf
CHOL https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/chol/gsc/hgsc.bcm.edu/mixed_dnaseq_curated/mutations/hgsc.bcm.edu_CHOL.Mixed_DNASeq_curated.Level_2.1.0.0/hgsc.bcm.edu_CHOL.IlluminaGA_DNASeq.1.somatic.maf
COAD https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/coad/gsc/hgsc.bcm.edu/illuminaga_dnaseq/mutations/hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0/hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf
ESCA https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/esca/gsc/hgsc.bcm.edu/illuminaga_dnaseq_automated/mutations/hgsc.bcm.edu_ESCA.IlluminaGA_DNASeq_automated.Level_2.1.0.0/hgsc.bcm.edu_ESCA.IlluminaGA_DNASeq.1.somatic.maf
GBM https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/gbm/gsc/ucsc.edu/illuminaga_dnaseq_automated/mutations/ucsc.edu_GBM.IlluminaGA_DNASeq_automated.Level_2.1.1.0/ucsc.edu_GBM.IlluminaGA_DNASeq_automated.Level_2.1.1.0.somatic.maf
HNSC https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/hnsc/gsc/broad.mit.edu/illuminaga_dnaseq_curated/mutations/broad.mit.edu_HNSC.IlluminaGA_DNASeq_curated.Level_2.1.6.0/pair_set_279_freeze_Mar262013.aggregated.capture.tcga.uuid.curated.somatic.maf
KIRC https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/kirc/gsc/broad.mit.edu/illuminaga_dnaseq/mutations/broad.mit.edu_KIRC.IlluminaGA_DNASeq.Level_2.1.5.0/BI_and_BCM_1.4.aggregated.tcga.somatic.maf
KIRP https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/kirp/gsc/hgsc.bcm.edu/illuminaga_dnaseq_curated/mutations/hgsc.bcm.edu_KIRP.IlluminaGA_DNASeq_curated.Level_2.1.0.0/hgsc.bcm.edu_KIRP.IlluminaGA_DNASeq.1.somatic.maf
LAML https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/laml/gsc/genome.wustl.edu/illuminahiseq_dnaseq_automated/mutations/genome.wustl.edu_LAML.IlluminaHiSeq_DNASeq_automated.Level_2.1.1.0/genome.wustl.edu_LAML.IlluminaHiSeq_DNASeq_automated.1.1.0.somatic.maf
LGG https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/lgg/gsc/broad.mit.edu/illuminaga_dnaseq_curated/mutations/broad.mit.edu_LGG.IlluminaGA_DNASeq_curated.Level_2.1.4.0/LGG_FINAL_ANALYSIS.aggregated.capture.tcga.uuid.curated.somatic.maf
LIHC https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/lihc/gsc/hgsc.bcm.edu/illuminaga_dnaseq_automated/mutations/hgsc.bcm.edu_LIHC.IlluminaGA_DNASeq_automated.Level_2.1.1.0/hgsc.bcm.edu_LIHC.IlluminaGA_DNASeq.1.somatic.maf
LUAD https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/luad/gsc/broad.mit.edu/illuminaga_dnaseq_curated/mutations/broad.mit.edu_LUAD.IlluminaGA_DNASeq_curated.Level_2.1.6.0/AN_TCGA_LUAD_PAIR_capture_freeze_FINAL_230.aggregated.capture.tcga.uuid.curated.somatic.maf
LUSC https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/lusc/gsc/broad.mit.edu/illuminaga_dnaseq/mutations/broad.mit.edu_LUSC.IlluminaGA_DNASeq.Level_2.100.1.0/step4_LUSC_Paper_v8.aggregated.tcga.maf2.4.migrated.somatic.maf
MESO https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/meso/gsc/broad.mit.edu/illuminaga_dnaseq_automated/mutations/broad.mit.edu_MESO.IlluminaGA_DNASeq_automated.Level_2.1.0.0/MESO_pairs.aggregated.capture.tcga.uuid.automated.somatic.maf
OV https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/ov/gsc/genome.wustl.edu/illuminaga_dnaseq/mutations/genome.wustl.edu_OV.IlluminaGA_DNASeq.Level_2.2.1.0/genome.wustl.edu_OV.IlluminaGA_DNASeq.Level_2.2.0.0.somatic.maf
PAAD https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/paad/gsc/broad.mit.edu/illuminaga_dnaseq_curated/mutations/broad.mit.edu_PAAD.IlluminaGA_DNASeq_curated.Level_2.1.3.0/freeze2.aggregated.capture.tcga.uuid.curated.somatic.maf
PCPG https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/pcpg/gsc/broad.mit.edu/illuminaga_dnaseq_automated/mutations/broad.mit.edu_PCPG.IlluminaGA_DNASeq_automated.Level_2.1.2.0/PR_TCGA_PCPG_PAIR_Capture_All_Pairs_QCPASS_v1.aggregated.capture.tcga.uuid.automated.somatic.maf
PRAD https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/prad/gsc/broad.mit.edu/illuminaga_dnaseq_curated/mutations/broad.mit.edu_PRAD.IlluminaGA_DNASeq_curated.Level_2.1.4.0/PR_TCGA_PRAD_PAIR_Capture_All_Pairs_QCPASS_v4.aggregated.capture.tcga.uuid.curated.somatic.maf
READ https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/read/gsc/hgsc.bcm.edu/illuminaga_dnaseq/mutations/hgsc.bcm.edu_READ.IlluminaGA_DNASeq.Level_2.1.6.0/hgsc.bcm.edu_READ.IlluminaGA_DNASeq.1.somatic.maf
SARC https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/sarc/gsc/genome.wustl.edu/illuminahiseq_dnaseq_automated/mutations/genome.wustl.edu_SARC.IlluminaHiSeq_DNASeq_automated.Level_2.1.1.0/genome.wustl.edu_SARC.IlluminaHiSeq_DNASeq_automated.1.1.0.somatic.maf
SKCM https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/skcm/gsc/broad.mit.edu/illuminaga_dnaseq/mutations/broad.mit.edu_SKCM.IlluminaGA_DNASeq.Level_2.1.5.0/skcm_clean_pairs.aggregated.capture.tcga.uuid.somatic.maf
STAD https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/stad/gsc/broad.mit.edu/illuminaga_dnaseq_curated/mutations/broad.mit.edu_STAD.IlluminaGA_DNASeq_curated.Level_2.1.3.0/QCv5_blacklist_Pass.aggregated.capture.tcga.uuid.curated.somatic.maf
TGCT https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/tgct/gsc/broad.mit.edu/illuminaga_dnaseq_automated/mutations/broad.mit.edu_TGCT.IlluminaGA_DNASeq_automated.Level_2.1.0.0/TGCT_pairs.aggregated.capture.tcga.uuid.automated.somatic.maf
THCA https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/thca/gsc/broad.mit.edu/illuminaga_dnaseq/mutations/broad.mit.edu_THCA.IlluminaGA_DNASeq.Level_2.1.5.0/AN_TCGA_THCA_PAIR_Capture_ALLQC_14Aug2013_429.aggregated.capture.tcga.uuid.somatic.maf
THYM https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/thym/gsc/hgsc.bcm.edu/illuminaga_dnaseq_automated/mutations/hgsc.bcm.edu_THYM.IlluminaGA_DNASeq_automated.Level_2.1.0.0/hgsc.bcm.edu_THYM.IlluminaGA_DNASeq.1.somatic.maf
UCEC https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/ucec/gsc/broad.mit.edu/illuminaga_dnaseq/mutations/broad.mit.edu_UCEC.IlluminaGA_DNASeq.Level_2.100.1.0/step4_An_UCEC_194.aggregated.tcga.maf2.4.migrated.somatic.maf
UVM https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/uvm/gsc/hgsc.bcm.edu/illuminaga_dnaseq_automated/mutations/hgsc.bcm.edu_UVM.IlluminaGA_DNASeq_automated.Level_2.1.0.0/hgsc.bcm.edu_UVM.IlluminaGA_DNASeq.1.somatic.maf
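To compare one of the MAFs listed above with the frequencies shown in a browser, one option is to tally the fraction of patients with at least one reported variant per gene. The sketch below is illustrative only: it assumes the standard `Hugo_Symbol` and `Tumor_Sample_Barcode` columns, counts every variant classification (including silent ones), and takes the cohort size as an input, since a MAF lists only samples that have calls. The function name and the example rows are hypothetical.

```python
import csv
import io


def mutated_patient_fraction(maf_lines, genes, cohort_size):
    """Return {gene: fraction of the cohort with >= 1 MAF row in that gene}."""
    # Skip '#' comment lines, then read the tab-delimited table by column name.
    rows = csv.DictReader(
        (line for line in maf_lines if not line.startswith("#")),
        delimiter="\t",
    )
    mutated = {gene: set() for gene in genes}
    for row in rows:
        gene = row["Hugo_Symbol"]
        if gene in mutated:
            # Collapse multiple aliquots/variants to one entry per patient.
            mutated[gene].add(row["Tumor_Sample_Barcode"][:12])
    return {gene: len(p) / cohort_size for gene, p in mutated.items()}


# Hypothetical rows for illustration only; none of these are real calls.
EXAMPLE_MAF = (
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "BRCA1\tFrame_Shift_Ins\tTCGA-00-0001-01A-01W-0000-00\n"
    "BRCA1\tNonsense_Mutation\tTCGA-00-0001-01A-01W-0000-00\n"
    "BRCA1\tMissense_Mutation\tTCGA-00-0002-01A-01W-0000-00\n"
    "BRCA2\tFrame_Shift_Del\tTCGA-00-0003-01A-01W-0000-00\n"
    "TP53\tMissense_Mutation\tTCGA-00-0004-01A-01W-0000-00\n"
)

if __name__ == "__main__":
    # cohort_size=10 is an assumed number of sequenced patients.
    fractions = mutated_patient_fraction(
        io.StringIO(EXAMPLE_MAF), ["BRCA1", "BRCA2"], cohort_size=10
    )
    for gene, frac in fractions.items():
        print(f"{gene}: {frac:.0%}")
```

Filtering on `Variant_Classification` before counting would be a reasonable refinement if only non-silent mutations should contribute to the frequency.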
An Loehr
June 30, 2016

Thank you for the explanation, Erik, but it still leaves open questions. For example, the ovarian data set published in 2011 reports that the somatic BRCA1 mutation rate is 9% and the somatic BRCA2 mutation rate is 8%. Again, CGC displays 2 BRCA2 mutations in a data set of >300 samples. There should at the very least be a disclaimer that the mutation data are incomplete, because what is being displayed is misleading at best.

Nilesh Tawari
July 29, 2016

Are the suggested MAFs no longer available? Is there an alternate location?

  