SGDP reference fasta file for BAM files not provided
Posted in General by Babak Arefnezhad Tue Jun 25 2019 11:05:28 GMT+0000 (Coordinated Universal Time)·3·Viewed 544 times
Hello,
I tried to run my own analysis on BAM files from the SGDP project. Because my reference FASTA file is incompatible with the one used to generate the BAM files, all my jobs fail with an error. Could you please let me know where I can find the reference FASTA file that was actually used for this project?
My error is this:
A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.
reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT]
features contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]
Hello,
Usually this error occurs when the reference files and the BAM are not in sync. Judging by the contig names in the error, your reference uses GRCh37-style naming (1, 2, ..., MT) while your feature file uses hg19-style naming (chr1, chr2, ..., chrM). Make sure all reference files come from the same bundle.
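As a rough illustration of what the tool is checking, here is a minimal Python sketch that compares two contig lists like the ones in the error message. The function names `normalize` and `compare_contigs` are made up for this example and are not part of GATK or the CGC. Note that stripping the "chr" prefix only shows that the naming schemes differ; it does not make the files compatible, since as far as I know hg19's chrM is a different mitochondrial sequence from the rCRS "MT" used in GRCh37-based references.

```python
def normalize(name):
    """Map an hg19-style name (chr1, chrM) to a GRCh37-style name (1, MT)."""
    if name == "chrM":
        return "MT"
    return name[3:] if name.startswith("chr") else name


def compare_contigs(reference, features):
    """Return (exact overlap, overlap after stripping the 'chr' prefix)."""
    ref = set(reference)
    exact = ref & set(features)
    renamed = ref & {normalize(n) for n in features}
    return exact, renamed


# Primary contigs only, copied from the two lists in the error message.
reference = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]
features = ["chrM"] + ["chr%d" % i for i in range(1, 23)] + ["chrX", "chrY"]

exact, renamed = compare_contigs(reference, features)
print("exact overlap:", len(exact))
print("overlap after renaming:", len(renamed))
```

An empty exact overlap together with a full overlap after renaming is the typical sign of a GRCh37-versus-hg19 mix-up rather than a wrong species or a corrupted file.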
Please see
https://www.biostars.org/p/123767/
https://gatkforums.broadinstitute.org/gatk/discussion/11359/input-files-reference-and-features-have-incompatible-contigs
Feel free to contact support by clicking the question mark at the bottom right of the task screen. The CGC support team will contact you directly.
Best,
Dave
Thank you Dave for your prompt response.
The problem is that the reference FASTA file used to generate the SGDP BAM files is not provided among the Public Reference Files. I found that the reference FASTA named "Homo_sapiens_assembly19_1000genomes_decoy.fasta" works for the SGDP VCF files but not for the BAM files. I am confused about why two different reference FASTA files were used to generate the BAM and VCF files of the SGDP dataset.
Regards,
Babak
Hi Babak,
The Seven Bridges support staff determined that "hs37d5" was used as a reference. According to this site
https://googlegenomics.readthedocs.io/en/latest/use_cases/discover_public_data/reference_genomes.html
the reference in question is "GRCh37, [with] the rCRS mitochondrial sequence, Human herpesvirus 4 type 1 and the concatenated decoy sequences in one file".
The "hs37d5" reference should correspond to this FASTA file on the CGC: https://cgc.sbgenomics.com/public/files/5772b6d8507c1752674486e6/
Switching to this FASTA file should solve your issue.
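If you want to double-check a BAM yourself, the contig names in its header usually reveal the reference. Below is a minimal Python sketch that parses the @SQ lines from header text such as the output of `samtools view -H your.bam`. The function name `contigs_from_sam_header` and the three-line sample header are hypothetical and only for illustration; a real SGDP header will list many more contigs.

```python
def contigs_from_sam_header(header_text):
    """Return {contig name: length} from the @SQ lines of a SAM/BAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:value.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


# Hypothetical, shortened header for illustration only.
sample_header = "\n".join([
    "@HD\tVN:1.4\tSO:coordinate",
    "@SQ\tSN:1\tLN:249250621",
    "@SQ\tSN:MT\tLN:16569",
    "@SQ\tSN:hs37d5\tLN:35477943",
])

contigs = contigs_from_sam_header(sample_header)
print("has decoy contig:", "hs37d5" in contigs)
print("uses chr prefix:", any(name.startswith("chr") for name in contigs))
```

A header with unprefixed names plus a contig called "hs37d5" would be consistent with the reference named above, though comparing the full list of names and lengths against the FASTA's .fai or .dict file is the more reliable check.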
Regards,
Dave