SGDP reference fasta file for BAM files not provided

Posted in General by Babak Arefnezhad Tue Jun 25 2019 11:05:28 GMT+0000 (Coordinated Universal Time)·3·Viewed 164 times

Hello, I tried to run my own analysis on BAM files from the SGDP project. Because the FASTA file I used is incompatible with the one used to generate the BAM files, all my jobs fail with an error. Could you please let me know where I can find the correct reference FASTA file for this project? The error is:

A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found. reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT] features contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]
Dave Roberson
June 25, 2019

Hello,

Usually this error occurs when the reference files and the BAM are not in sync. Judging by the contig names in the error, your reference uses GRCh37-style names (1, 2, ..., MT) while your feature file uses hg19-style names (chr1, chr2, ..., chrM). Make sure all reference files come from the same bundle.
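To make the mismatch concrete, here is a minimal Python sketch that compares two contig name lists the way the error message does. The function names (`strip_chr`, `diagnose_contigs`) and the report keys are hypothetical, chosen only for illustration, and the contig lists are shortened versions of the ones in the error above.

```python
def strip_chr(name):
    """Map a UCSC-style name such as 'chr1' or 'chrM' to the GRCh37 style ('1', 'MT')."""
    if name == "chrM":
        return "MT"
    return name[3:] if name.startswith("chr") else name


def diagnose_contigs(reference_contigs, feature_contigs):
    """Report how two contig name lists relate to each other (illustrative helper)."""
    ref = set(reference_contigs)
    feat = set(feature_contigs)
    shared = ref & feat
    # Names that would match if the 'chr' prefix were the only difference.
    shared_after_rename = ref & {strip_chr(n) for n in feat}
    return {
        "shared": sorted(shared),
        "shared_after_rename": sorted(shared_after_rename),
        # No overlap as-is, but overlap once the prefix is ignored.
        "looks_like_naming_mismatch": not shared and bool(shared_after_rename),
    }


# Primary contigs taken from the error message (alternate/random contigs omitted).
reference = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]
features = ["chrM"] + ["chr%d" % i for i in range(1, 23)] + ["chrX", "chrY"]

report = diagnose_contigs(reference, features)
print(report["shared"])                      # names present in both lists
print(report["looks_like_naming_mismatch"])  # True when only the convention differs
```

Note that this only compares names. Matching names after renaming does not guarantee that the underlying sequences are identical, so using files from the same bundle is still the safer fix.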

Please see

https://www.biostars.org/p/123767/
https://gatkforums.broadinstitute.org/gatk/discussion/11359/input-files-reference-and-features-have-incompatible-contigs

Feel free to contact support by clicking the question mark at the bottom right of the task screen. The CGC support team will contact you directly.

Best,
Dave

Babak Arefnezhad
June 25, 2019

Thank you Dave for your prompt response.

The problem is that the reference FASTA file used to generate the SGDP BAM files is not provided among the Public Reference files. I found that the reference FASTA named "Homo_sapiens_assembly19_1000genomes_decoy.fasta" works for the VCF files of the SGDP project but not for the BAM files. I am confused about why two different reference FASTA files were used to generate the BAM and VCF files of the SGDP dataset.

Regards,
Babak

Dave Roberson
June 27, 2019

Hi Babak,

The Seven Bridges support staff determined that "hs37d5" was used as a reference. According to this site

https://googlegenomics.readthedocs.io/en/latest/use_cases/discover_public_data/reference_genomes.html

the reference in question is "GRCh37, [with] the rCRS mitochondrial sequence, Human herpesvirus 4 type 1 and the concatenated decoy sequences in one file".

The "hs37d5" reference should correspond to this FASTA file on the CGC: https://cgc.sbgenomics.com/public/files/5772b6d8507c1752674486e6/

Switching to this FASTA file should solve your issue.
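If you want to double-check before rerunning, one option is to compare the `@SQ` lines of the BAM header (as printed by `samtools view -H`) with the `.fai` index of the FASTA (as written by `samtools faidx`). The following is only a rough Python sketch under those assumptions: the helper names are hypothetical, and the header and index text are small hand-written samples, not the real SGDP or hs37d5 files.

```python
def parse_sam_header(header_text):
    """Return {contig: length} from the @SQ lines of a SAM/BAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field looks like TAG:VALUE; split once so values may contain ':'.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def parse_fai(fai_text):
    """Return {contig: length} from .fai text (first two tab-separated columns)."""
    contigs = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            contigs[name] = int(length)
    return contigs


def find_mismatches(bam_contigs, ref_contigs):
    """List BAM contigs that are absent from the reference or differ in length."""
    problems = []
    for name, length in bam_contigs.items():
        if name not in ref_contigs:
            problems.append("%s: missing from reference" % name)
        elif ref_contigs[name] != length:
            problems.append("%s: length %d in BAM, %d in reference"
                            % (name, length, ref_contigs[name]))
    return problems


# Hand-written sample inputs; offsets in the .fai columns are placeholders.
header = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:MT\tLN:16569\n"
matching_fai = "1\t249250621\t52\t60\t61\nMT\t16569\t100\t60\t61\n"
chr_style_fai = "chr1\t249250621\t6\t50\t51\nchrM\t16571\t100\t50\t51\n"

bam = parse_sam_header(header)
print(find_mismatches(bam, parse_fai(matching_fai)))   # empty list: compatible
print(find_mismatches(bam, parse_fai(chr_style_fai)))  # every contig reported missing
```

An empty list means every contig in the BAM header exists in the reference with the same length, which is the condition the error above is complaining about.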

Regards,
Dave

  