TCGA BAM file size inconsistency with GDC?

Yiyun Rao Fri Feb 07 2020 21:47:12 GMT+0000 (Coordinated Universal Time)

Hi, I'm doing somatic mutation calling for TCGA patient TCGA-AR-A1AO, using the BAM files from the Data Browser:

TCGA-AR-A1AO-10A-01D-A12Q-09_IlluminaGA-DNASeq_exome_gdc_realn.bam
uuid: 30f1d9e3-e6a5-44b6-846c-1497806d301c
size: 27.03GB

TCGA-AR-A1AO-01A-01D-A12Q-09_IlluminaGA-DNASeq_exome_gdc_realn.bam
uuid: 33eeb804-ca8b-491e-8221-a285743be692
size: 25.53GB

However, on the GDC portal, the files are 29.02GB and 27.41GB respectively. I wonder whether these files are really up to date, since the file sizes differ and my somatic mutation calling result from VarScan2 is missing variants compared to the GDC results (with the same parameters and inputs). It is confusing, so I am troubleshooting right now. Would you please help me with this?

Thanks!

Best,
Stella
Feb 24, 2020

Hi Yiyun,

Here's the response from our engineering team:

File sizes on the GDC Data Portal are displayed in gigabytes (GB, 10^9 bytes), while file sizes on the CGC are displayed in gibibytes (GiB, 2^30 bytes). For example, the size of TCGA-AR-A1AO-10A-01D-A12Q-09_IlluminaGA-DNASeq_exome_gdc_realn.bam is indeed 29.02 GB, which is equal to 27.03 GiB, as shown on the CGC. The same holds for the second file: 27.41 GB is equal to 25.53 GiB. Therefore, there is no difference in the file sizes themselves, only in the displayed unit.
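To check the unit conversion yourself, here is a minimal sketch in Python. It assumes the standard definitions (1 GB = 10^9 bytes, 1 GiB = 2^30 bytes); the function name `gb_to_gib` is just an illustrative helper and not part of any CGC or GDC tool.

```python
# Standard unit definitions (decimal vs. binary prefixes).
BYTES_PER_GB = 10**9    # gigabyte, the unit reported by the GDC Data Portal
BYTES_PER_GIB = 2**30   # gibibyte, the unit reported by the CGC


def gb_to_gib(size_gb: float) -> float:
    """Convert a size in gigabytes (GB) to gibibytes (GiB)."""
    return size_gb * BYTES_PER_GB / BYTES_PER_GIB


# Sizes as reported on the GDC Data Portal for the two BAM files in question.
gdc_sizes_gb = {
    "TCGA-AR-A1AO-10A-01D-A12Q-09 (normal)": 29.02,
    "TCGA-AR-A1AO-01A-01D-A12Q-09 (tumor)": 27.41,
}

for label, size_gb in gdc_sizes_gb.items():
    print(f"{label}: {size_gb:.2f} GB = {gb_to_gib(size_gb):.2f} GiB")
```

Both converted values should match the sizes shown on the CGC (27.03 and 25.53) to two decimal places. For a stricter check that the files are identical, comparing checksums would be more reliable than comparing rounded sizes.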

Feel free to ask if you have more questions.


