CGC mutation data not matching downloadable maf

Posted in TCGA data on the CGC by An Loehr Fri May 06 2016 21:02:27 GMT+0000 (UTC)·4·Viewed 653 times

Dear CGC team,

When I compare the mutation data I downloaded from the TCGA website (MAF files) with the data displayed in the CGC, they don't match. For example, in ovarian cancer 8% and 9% of patients are BRCA1 and BRCA2 mutated. When I pull these data up in the browser, only 2 BRCA2 mutations show up. The same pattern holds for TCGA LUSC and BLCA, with the CGC displaying fewer mutations than the raw data contain. Can you please explain this discrepancy?

Thanks, An
An Loehr
May 10, 2016
For example, the OV data set shows 1 BRCA1 frame shift insertion in CGC:

- 67f526f1-def5-43b4-bc89-154baae190fc TCGA-24-1846 BRCA1 Frame_Shift_Ins

In the Somatic_Mutations files there are 3 samples with frame shift insertions. All three are within the data set displayed in CGC, but two are not marked as being mutated:

- 7248cd60-be22-44bc-bc58-f644db0940a2 TCGA-13-1489 BRCA1 Frame_Shift_Ins
- 67f526f1-def5-43b4-bc89-154baae190fc TCGA-24-1846 BRCA1 Frame_Shift_Ins (shown in CGC)
- c435627c-159d-4a6d-a819-30abac24bf4d TCGA-25-1632 BRCA1 Frame_Shift_Ins

Can you please explain this discrepancy?

Thanks, An
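A check like the one above can be scripted. The following is a minimal sketch, assuming the MAF is a tab-delimited file with the standard `Hugo_Symbol`, `Variant_Classification`, and `Tumor_Sample_Barcode` columns and that the first 12 characters of the barcode identify the patient. The function names and the tiny in-memory `DEMO_MAF` are illustrative only; the `-01A` suffixes are made up and the rows are not real file contents.

```python
import csv
import io


def read_maf(handle):
    """Yield one dict per MAF record, skipping '#' comment lines."""
    rows = (line for line in handle if not line.startswith("#"))
    yield from csv.DictReader(rows, delimiter="\t")


def patients_with_variant(handle, gene, classification):
    """Return sorted patient barcodes carrying the given variant type in a gene."""
    hits = set()
    for rec in read_maf(handle):
        if rec["Hugo_Symbol"] == gene and rec["Variant_Classification"] == classification:
            # Assumption: the first 12 characters are the patient barcode.
            hits.add(rec["Tumor_Sample_Barcode"][:12])
    return sorted(hits)


# Toy MAF for illustration only (hypothetical sample suffixes).
DEMO_MAF = (
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "BRCA1\tFrame_Shift_Ins\tTCGA-13-1489-01A\n"
    "BRCA1\tFrame_Shift_Ins\tTCGA-24-1846-01A\n"
    "BRCA1\tFrame_Shift_Ins\tTCGA-25-1632-01A\n"
    "TP53\tMissense_Mutation\tTCGA-13-1489-01A\n"
)

if __name__ == "__main__":
    print(patients_with_variant(io.StringIO(DEMO_MAF), "BRCA1", "Frame_Shift_Ins"))
```

With a real file, `open(path, newline="")` would take the place of `io.StringIO(DEMO_MAF)`.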
Erik Lehnert
May 27, 2016
Hi An,

Each disease has had multiple MAF files associated with it over time, depending on several factors (e.g., how variants were called, whether the MAF files were curated or uncurated, etc.). To produce our database, we selected MAF files for each disease according to specific criteria. Since the time we built this dataset, new MAFs have been generated, which may be the ones you are looking at. The GDC has also recently released a recommended set of MAF files to use with each disease. We will be working to update our database to these community standards in the near future.

To assist you in your analysis, please find a table listing each disease and a link to its associated MAF file. Please let us know if you have any further questions!

| Disease | MAF link |
|---------|----------|
| ACC | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/acc/gsc/broad.mit.edu/illuminaga_dnaseq_curated/mutations/broad.mit.edu_ACC.IlluminaGA_DNASeq_curated.Level_2.1.0.0/An_TCGA_ACC_External_capture_All_Pairs.aggregated.capture.tcga.uuid.curated.somatic.maf |
| BLCA | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/blca/gsc/broad.mit.edu/illuminaga_dnaseq_curated/mutations/broad.mit.edu_BLCA.IlluminaGA_DNASeq_curated.Level_2.1.4.0/BLCA130_somatic_updated.aggregated.capture.tcga.uuid.curated.somatic.maf |
| BRCA | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/brca/gsc/genome.wustl.edu/illuminaga_dnaseq_curated/mutations/genome.wustl.edu_BRCA.IlluminaGA_DNASeq_curated.Level_2.1.1.0/genome.wustl.edu_BRCA.IlluminaGA_DNASeq.Level_2.1.1.0.curated.somatic.maf |
| CESC | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/cesc/gsc/genome.wustl.edu/illuminaga_dnaseq_curated/mutations/genome.wustl.edu_CESC.IlluminaGA_DNASeq_curated.Level_2.1.0.0/genome.wustl.edu_CESC.IlluminaGA_DNASeq_curated.Level_2.1.0.0.somatic.maf |
| CHOL | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/chol/gsc/hgsc.bcm.edu/mixed_dnaseq_curated/mutations/hgsc.bcm.edu_CHOL.Mixed_DNASeq_curated.Level_2.1.0.0/hgsc.bcm.edu_CHOL.IlluminaGA_DNASeq.1.somatic.maf |
| COAD | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/coad/gsc/hgsc.bcm.edu/illuminaga_dnaseq/mutations/hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0/hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf |
| ESCA | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/esca/gsc/hgsc.bcm.edu/illuminaga_dnaseq_automated/mutations/hgsc.bcm.edu_ESCA.IlluminaGA_DNASeq_automated.Level_2.1.0.0/hgsc.bcm.edu_ESCA.IlluminaGA_DNASeq.1.somatic.maf |
| GBM | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/gbm/gsc/ucsc.edu/illuminaga_dnaseq_automated/mutations/ucsc.edu_GBM.IlluminaGA_DNASeq_automated.Level_2.1.1.0/ucsc.edu_GBM.IlluminaGA_DNASeq_automated.Level_2.1.1.0.somatic.maf |
| HNSC | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/hnsc/gsc/broad.mit.edu/illuminaga_dnaseq_curated/mutations/broad.mit.edu_HNSC.IlluminaGA_DNASeq_curated.Level_2.1.6.0/pair_set_279_freeze_Mar262013.aggregated.capture.tcga.uuid.curated.somatic.maf |
| KIRC | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/kirc/gsc/broad.mit.edu/illuminaga_dnaseq/mutations/broad.mit.edu_KIRC.IlluminaGA_DNASeq.Level_2.1.5.0/BI_and_BCM_1.4.aggregated.tcga.somatic.maf |
| KIRP | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/kirp/gsc/hgsc.bcm.edu/illuminaga_dnaseq_curated/mutations/hgsc.bcm.edu_KIRP.IlluminaGA_DNASeq_curated.Level_2.1.0.0/hgsc.bcm.edu_KIRP.IlluminaGA_DNASeq.1.somatic.maf |
| LAML | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/laml/gsc/genome.wustl.edu/illuminahiseq_dnaseq_automated/mutations/genome.wustl.edu_LAML.IlluminaHiSeq_DNASeq_automated.Level_2.1.1.0/genome.wustl.edu_LAML.IlluminaHiSeq_DNASeq_automated.1.1.0.somatic.maf |
| LGG | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/lgg/gsc/broad.mit.edu/illuminaga_dnaseq_curated/mutations/broad.mit.edu_LGG.IlluminaGA_DNASeq_curated.Level_2.1.4.0/LGG_FINAL_ANALYSIS.aggregated.capture.tcga.uuid.curated.somatic.maf |
| LIHC | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/lihc/gsc/hgsc.bcm.edu/illuminaga_dnaseq_automated/mutations/hgsc.bcm.edu_LIHC.IlluminaGA_DNASeq_automated.Level_2.1.1.0/hgsc.bcm.edu_LIHC.IlluminaGA_DNASeq.1.somatic.maf |
| LUAD | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/luad/gsc/broad.mit.edu/illuminaga_dnaseq_curated/mutations/broad.mit.edu_LUAD.IlluminaGA_DNASeq_curated.Level_2.1.6.0/AN_TCGA_LUAD_PAIR_capture_freeze_FINAL_230.aggregated.capture.tcga.uuid.curated.somatic.maf |
| LUSC | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/lusc/gsc/broad.mit.edu/illuminaga_dnaseq/mutations/broad.mit.edu_LUSC.IlluminaGA_DNASeq.Level_2.100.1.0/step4_LUSC_Paper_v8.aggregated.tcga.maf2.4.migrated.somatic.maf |
| MESO | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/meso/gsc/broad.mit.edu/illuminaga_dnaseq_automated/mutations/broad.mit.edu_MESO.IlluminaGA_DNASeq_automated.Level_2.1.0.0/MESO_pairs.aggregated.capture.tcga.uuid.automated.somatic.maf |
| OV | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/ov/gsc/genome.wustl.edu/illuminaga_dnaseq/mutations/genome.wustl.edu_OV.IlluminaGA_DNASeq.Level_2.2.1.0/genome.wustl.edu_OV.IlluminaGA_DNASeq.Level_2.2.0.0.somatic.maf |
| PAAD | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/paad/gsc/broad.mit.edu/illuminaga_dnaseq_curated/mutations/broad.mit.edu_PAAD.IlluminaGA_DNASeq_curated.Level_2.1.3.0/freeze2.aggregated.capture.tcga.uuid.curated.somatic.maf |
| PCPG | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/pcpg/gsc/broad.mit.edu/illuminaga_dnaseq_automated/mutations/broad.mit.edu_PCPG.IlluminaGA_DNASeq_automated.Level_2.1.2.0/PR_TCGA_PCPG_PAIR_Capture_All_Pairs_QCPASS_v1.aggregated.capture.tcga.uuid.automated.somatic.maf |
| PRAD | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/prad/gsc/broad.mit.edu/illuminaga_dnaseq_curated/mutations/broad.mit.edu_PRAD.IlluminaGA_DNASeq_curated.Level_2.1.4.0/PR_TCGA_PRAD_PAIR_Capture_All_Pairs_QCPASS_v4.aggregated.capture.tcga.uuid.curated.somatic.maf |
| READ | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/read/gsc/hgsc.bcm.edu/illuminaga_dnaseq/mutations/hgsc.bcm.edu_READ.IlluminaGA_DNASeq.Level_2.1.6.0/hgsc.bcm.edu_READ.IlluminaGA_DNASeq.1.somatic.maf |
| SARC | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/sarc/gsc/genome.wustl.edu/illuminahiseq_dnaseq_automated/mutations/genome.wustl.edu_SARC.IlluminaHiSeq_DNASeq_automated.Level_2.1.1.0/genome.wustl.edu_SARC.IlluminaHiSeq_DNASeq_automated.1.1.0.somatic.maf |
| SKCM | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/skcm/gsc/broad.mit.edu/illuminaga_dnaseq/mutations/broad.mit.edu_SKCM.IlluminaGA_DNASeq.Level_2.1.5.0/skcm_clean_pairs.aggregated.capture.tcga.uuid.somatic.maf |
| STAD | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/stad/gsc/broad.mit.edu/illuminaga_dnaseq_curated/mutations/broad.mit.edu_STAD.IlluminaGA_DNASeq_curated.Level_2.1.3.0/QCv5_blacklist_Pass.aggregated.capture.tcga.uuid.curated.somatic.maf |
| TGCT | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/tgct/gsc/broad.mit.edu/illuminaga_dnaseq_automated/mutations/broad.mit.edu_TGCT.IlluminaGA_DNASeq_automated.Level_2.1.0.0/TGCT_pairs.aggregated.capture.tcga.uuid.automated.somatic.maf |
| THCA | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/thca/gsc/broad.mit.edu/illuminaga_dnaseq/mutations/broad.mit.edu_THCA.IlluminaGA_DNASeq.Level_2.1.5.0/AN_TCGA_THCA_PAIR_Capture_ALLQC_14Aug2013_429.aggregated.capture.tcga.uuid.somatic.maf |
| THYM | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/thym/gsc/hgsc.bcm.edu/illuminaga_dnaseq_automated/mutations/hgsc.bcm.edu_THYM.IlluminaGA_DNASeq_automated.Level_2.1.0.0/hgsc.bcm.edu_THYM.IlluminaGA_DNASeq.1.somatic.maf |
| UCEC | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/ucec/gsc/broad.mit.edu/illuminaga_dnaseq/mutations/broad.mit.edu_UCEC.IlluminaGA_DNASeq.Level_2.100.1.0/step4_An_UCEC_194.aggregated.tcga.maf2.4.migrated.somatic.maf |
| UVM | https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/uvm/gsc/hgsc.bcm.edu/illuminaga_dnaseq_automated/mutations/hgsc.bcm.edu_UVM.IlluminaGA_DNASeq_automated.Level_2.1.0.0/hgsc.bcm.edu_UVM.IlluminaGA_DNASeq.1.somatic.maf |
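
Since different MAF versions for the same disease can contain different calls, one way to see where two versions disagree is to reduce each to a set of (patient, gene, variant classification) tuples and compare the sets. This is only a sketch: the helper names are hypothetical, the column names are the standard MAF ones, and `OLD_MAF` and `NEW_MAF` are toy strings with made-up `-01A` suffixes rather than contents of any file in the table.

```python
import csv
import io


def load_calls(handle):
    """Return a set of (patient, gene, classification) tuples from a MAF handle."""
    rows = (line for line in handle if not line.startswith("#"))
    return {
        # Assumption: the first 12 barcode characters identify the patient.
        (r["Tumor_Sample_Barcode"][:12], r["Hugo_Symbol"], r["Variant_Classification"])
        for r in csv.DictReader(rows, delimiter="\t")
    }


def diff_mafs(old_handle, new_handle):
    """Return (calls only in the new MAF, calls only in the old MAF), both sorted."""
    old, new = load_calls(old_handle), load_calls(new_handle)
    return sorted(new - old), sorted(old - new)


HEADER = "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"

# Toy inputs for illustration only.
OLD_MAF = HEADER + "BRCA1\tFrame_Shift_Ins\tTCGA-24-1846-01A\n"
NEW_MAF = HEADER + (
    "BRCA1\tFrame_Shift_Ins\tTCGA-13-1489-01A\n"
    "BRCA1\tFrame_Shift_Ins\tTCGA-24-1846-01A\n"
    "BRCA1\tFrame_Shift_Ins\tTCGA-25-1632-01A\n"
)

if __name__ == "__main__":
    only_new, only_old = diff_mafs(io.StringIO(OLD_MAF), io.StringIO(NEW_MAF))
    for call in only_new:
        print("only in new:", call)
    for call in only_old:
        print("only in old:", call)
```

Comparing on the patient barcode rather than the full aliquot barcode is a deliberate simplification; it hides differences that come only from which aliquot was sequenced.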
An Loehr
June 30, 2016
Thank you for the explanation, Erik, but it still leaves open questions. For example, the ovarian data set published in 2011 reports that the somatic BRCA1 mutation rate is 9% and the somatic BRCA2 mutation rate is 8%. Again, CGC displays 2 BRCA2 mutations in a data set of >300 samples. There should at the very least be a disclaimer that the mutation data are incomplete, because what is being displayed is misleading at best.
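For reference, a per-gene mutation rate of this kind can be estimated from a MAF roughly as follows. This is a hedged sketch, not the method used in the 2011 publication or by the CGC: the function name, the choice to skip only `Silent` calls, and the toy `DEMO_MAF` are all assumptions. The cohort size is passed in explicitly because patients with no called mutations do not appear in a MAF at all.

```python
import csv
import io


def mutated_fraction(handle, gene, n_patients, skip=("Silent",)):
    """Fraction of a cohort with at least one non-skipped call in the gene."""
    rows = (line for line in handle if not line.startswith("#"))
    mutated = {
        # Assumption: the first 12 barcode characters identify the patient.
        r["Tumor_Sample_Barcode"][:12]
        for r in csv.DictReader(rows, delimiter="\t")
        if r["Hugo_Symbol"] == gene and r["Variant_Classification"] not in skip
    }
    return len(mutated) / n_patients


# Toy MAF for illustration only: one patient has two BRCA2 calls, another has a silent one.
DEMO_MAF = (
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "BRCA2\tNonsense_Mutation\tTCGA-AA-0001-01A\n"
    "BRCA2\tMissense_Mutation\tTCGA-AA-0001-01A\n"
    "BRCA2\tSilent\tTCGA-AA-0002-01A\n"
    "TP53\tMissense_Mutation\tTCGA-AA-0003-01A\n"
)

if __name__ == "__main__":
    # Hypothetical cohort of 4 patients, one of whom has no calls in the MAF.
    print(mutated_fraction(io.StringIO(DEMO_MAF), "BRCA2", n_patients=4))
```

Patients are counted once regardless of how many calls they carry, so the result is a fraction of patients, not a count of mutations.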
Nilesh Tawari
July 29, 2016
The suggested MAFs do not seem to be available now. Is there an alternate location?
  