HER2 status not consistent
I got the HER2, ER, and PR status of BRCA patients from the supplementary material of the TCGA BRCA publication:
https://www.nature.com/articles/nature11412#supplementary-information
filename: TCGA_Supplementary Tables 1-4.csv
column: HER2_Final_Status

Since that publication is from 2012, some months later I downloaded the newer clinical data from cBioPortal:
http://www.cbioportal.org/study?id=brca_tcga#clinical
filename: data_bcr_clinical_data_patient.txt
columns: IHC-Status, HER2 fish status

However, I noticed that hundreds of patients have a different HER2 status than they had in the old TCGA publication:

| Barcode | IHC-status (cBioPortal) | HER2 fish status (cBioPortal) | HER2_Final_Status (TCGA publication) |
|---|---|---|---|
| TCGA-A1-A0SH | Equivocal | Negative | Negative |
| TCGA-A2-A04U | Negative | Positive | Negative |
| TCGA-A2-A0T2 | Negative | Not Evaluated | Negative |
| TCGA-A8-A06R | Positive | Positive | Equivocal |

At first I thought there was a method for converting the IHC and FISH statuses into a single final status, but I could not find one. For TCGA-A1-A0SH, FISH seems to be preferred. For TCGA-A2-A04U, it seems to be IHC. For TCGA-A8-A06R, it is neither.

Note: I verified that for those 4 patients, the HER2 status is consistent between cBioPortal and the current TCGA clinical files (her2_fish_status and her2_status_by_ihc in the clinical files).

Thanks in advance,
Maor
Posted by Maor Maor almost 4 years ago
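The comparison described in the post can be sketched in a few lines of Python. This is only an illustration: the rows are copied from the post, and the helper names (`explain`, `summarize`) are hypothetical. It does not implement any official TCGA rule for deriving a final status; it only reports which assay, if any, the published status agrees with.

```python
# Rows copied from the post: (barcode, IHC status, FISH status, HER2_Final_Status).
ROWS = [
    ("TCGA-A1-A0SH", "Equivocal", "Negative", "Negative"),
    ("TCGA-A2-A04U", "Negative", "Positive", "Negative"),
    ("TCGA-A2-A0T2", "Negative", "Not Evaluated", "Negative"),
    ("TCGA-A8-A06R", "Positive", "Positive", "Equivocal"),
]


def explain(ihc, fish, final):
    """Return which assay (if any) the published final status agrees with."""
    matches = [name for name, value in (("IHC", ihc), ("FISH", fish)) if value == final]
    return " and ".join(matches) if matches else "neither"


def summarize(rows):
    """Map each barcode to the assay(s) matching its final status."""
    return {barcode: explain(ihc, fish, final) for barcode, ihc, fish, final in rows}


if __name__ == "__main__":
    for barcode, verdict in summarize(ROWS).items():
        print(f"{barcode}: final status agrees with {verdict}")
```

Run over the full patient table, a tally of these verdicts would show how often the final status follows IHC, FISH, both, or neither.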
Why are there no nucleotides at the reference position after SAMtools Mpileup?
Hi! I have been creating a workflow which consists of three main steps (SAMtools View, SAMtools faidx and SAMtools Mpileup). First, with SAMtools View I filtered the input BAM file based on a BED file that contains specific regions of the third and fifteenth chromosomes. The input BAM file, which I took from the database, was already aligned and sorted. The BED file was uploaded from my computer, and its first line looks like this: (3 193593144 193697811). After that, SAMtools Mpileup took the file that contains only the necessary chromosomes and the file with the indexed reference (as a reference, I used ucsc.hg19.fasta from the database). At the end of the workflow, I expect to see a VCF file that contains information about the reference and alternative nucleotides of the third and fifteenth chromosomes. I do get such a file, but there is an N in place of the reference allele. Please help me understand what is wrong with my reference file.
Posted by Ekaterina almost 4 years ago
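One thing that could be checked here is whether the contig names in the BED file also exist in the reference index. The BED line in the post uses `3`, while UCSC-style references typically name the same chromosome `chr3`; whether that applies to this particular ucsc.hg19.fasta is an assumption. The sketch below compares the first column of a BED file with the first column of a `.fai` index. The function names are hypothetical, and the `.fai` line in the usage example is made up for illustration.

```python
def bed_contigs(bed_text):
    """Collect contig names from the first column of BED-formatted text."""
    names = set()
    for line in bed_text.splitlines():
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def fai_contigs(fai_text):
    """Collect contig names from the first column of a FASTA .fai index."""
    return {line.split("\t")[0] for line in fai_text.splitlines() if line.strip()}


def missing_from_reference(bed_text, fai_text):
    """Return BED contig names that do not appear in the reference index."""
    return sorted(bed_contigs(bed_text) - fai_contigs(fai_text))


if __name__ == "__main__":
    bed = "3 193593144 193697811\n"
    fai = "chr3\t198022430\t100\t50\t51\n"  # illustrative values, not a real index
    print(missing_from_reference(bed, fai))
```

If the list printed is not empty, the BAM/BED and the reference use different naming schemes, which is one possible reason for the reference base being reported as N.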
Costs associated with using data on an AWS volume
I've set up a Volume to access files from a bucket I have under my own AWS account and copied a file into a project. Does this copy incur storage charges of its own?
Posted by Ian Fore about 4 years ago
Unable to extract results/outputs of the tool
Hi, I've built a simple tool using R, pushed it to the CGC repository and ran it. However, I have an issue with retrieving the outputs. It seems that the system cannot find them. For now, the tool produces two files: one called model.pdf and another called model.txt. I set the glob values (outputs tab in the tool editor) to model.pdf and model.txt since their names are static. My initial thought was that the issue is a wrong working directory, but I couldn't find more details in the documentation.
- How do I know what the working directory of the tool is for the current analysis? Can I extract this information from the job or self variable?
- Should I provide information to mount a specific directory?
- What else could be wrong?
For instance, I can successfully retrieve the stdout.txt (captured standard output) file. Also, the job.tree.log file shows stdout.txt as available, but neither model.pdf nor model.txt.
Thank you for the help.
Posted by Aleš Papič about 4 years ago
Clarification for TCGA data
I have trouble matching WSI slides to their grade or TNM stage. For example, patient TCGA-BC-A110 has three slide samples:
- Sample TCGA-BC-A110-01Z (Primary Tumor)
- Sample TCGA-BC-A110-01A (Primary Tumor)
- Sample TCGA-BC-A110-11A (Normal tissue)
Question 1: Is it correct that the samples ending with A were all sampled together?
Question 2: Can I know which were sampled first, the samples ending with A or with Z? [A pathology report exists only for A, with a conclusion of Grade I. The clinical file nationwidechildrens.org_clinical.TCGA-BC-A110.xml states that the patient had cancer with grade I, and later a recurrence. Does that mean A is the first tumor event and Z the second?]
Question 3: I noticed that pathology reports are never available for Z samples, only for A. Is there a reason?
Posted by Maor Maor about 4 years ago
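To make the barcode fields in this question explicit, here is a minimal sketch that splits a TCGA sample barcode into patient, sample type code and vial letter. It assumes the usual TCGA barcode layout (`TCGA-<site>-<participant>-<sample type><vial>`) and only covers the two sample type codes that appear in the post; the function name is hypothetical. It does not say anything about the order in which vials were collected.

```python
import re

# Subset of the TCGA sample type codes; only the two from the post are listed.
SAMPLE_TYPES = {"01": "Primary Solid Tumor", "11": "Solid Tissue Normal"}

# TCGA-<2-char site>-<4-char participant>-<2-digit sample type><vial letter>
_SAMPLE_RE = re.compile(r"^(TCGA-[A-Z0-9]{2}-[A-Z0-9]{4})-(\d{2})([A-Z])")


def parse_sample(barcode):
    """Split a sample-level TCGA barcode into its leading fields."""
    match = _SAMPLE_RE.match(barcode)
    if match is None:
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    patient, code, vial = match.groups()
    return {
        "patient": patient,
        "sample_type_code": code,
        "sample_type": SAMPLE_TYPES.get(code, "unknown"),
        "vial": vial,
    }


if __name__ == "__main__":
    for barcode in ("TCGA-BC-A110-01Z", "TCGA-BC-A110-01A", "TCGA-BC-A110-11A"):
        print(parse_sample(barcode))
```

Grouping slides by `(patient, sample_type_code, vial)` makes it easier to see which slides belong to the same vial before joining them to clinical or pathology data.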
secondary files not loaded when in batch runs
Hi! I have a tool which analyzes .bam files and requires .bai files as secondary files. The secondary file settings are OK, as the tool works well when it is run on a single file. However, when I try to run a batch task, it doesn't work, and the error log says "unable to find index file for example.bam" or something similar. Is this a known bug? Is there a way to overcome this problem? Thanks a lot!
Posted by filippo_martignano about 4 years ago
SomaticSniper failed with exit code 139
Hello, I am trying to run SomaticSniper and have uploaded a Normal BAM, Tumor BAM and Reference Sequence (.fasta). The run failed with exit code 139. Any ideas on how to fix this? Thank you!
Posted by Corey Miles over 4 years ago
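As background for reading this exit code: by shell convention, an exit status above 128 usually means the process was terminated by signal number (status - 128), so 139 corresponds to signal 11. The sketch below decodes a status that way using Python's standard `signal` module; the function name is hypothetical, and the mapping from number to name is platform-dependent.

```python
import signal


def describe_exit_code(code):
    """Describe an exit status using the common 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            return f"terminated by {signal.Signals(signum).name}"
        except ValueError:
            # No signal with this number is known on the current platform.
            return f"terminated by unknown signal {signum}"
    return "exited normally" if code == 0 else f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(139))
```

On Linux, signal 11 is SIGSEGV (a segmentation fault). The decoding itself does not say why the tool crashed; it only narrows the failure down to a crash inside the program rather than an ordinary error exit.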
SVS files are unavailable through the API
SVS files are available through the website: https://portal.gdc.cancer.gov/cases/9fe336a8-08a7-4fe7-bf45-afd6a8eb9c75 They are also available on the data portal as of May 21 2018 (on the same page, click on Files, which leads to the data portal, where you can see the SVS files for this case). However, when I issue an API call to: https://cgc-datasets-api.sbgenomics.com/datasets/v0/tcga/cases/9FE336A8-08A7-4FE7-BF45-AFD6A8EB9C75/files no SVS files are returned.
Posted by Maor Maor over 4 years ago
Unable to see error messages from java application
Hello, I'm trying to bring a Java application (MELTv2.1.4) to the CGC platform. I've made a Docker image which works perfectly locally and created a tool on CGC. However, there may be something wrong with my command line, because when I run the tool the task fails with exit code 1. I would like to figure out what is wrong with the command line, but unfortunately job.err.log is completely blank, so I cannot understand the problem. For example, if I run the application without arguments in my Docker container ("java -jar MELT.jar Single"), the result is:
`
Command Line: MELT.jar Single
Start time: Jun 1, 2018 11:13:01 AM
Performing MELT analysis...
Missing required options: bamfile, h, w, t, n, c
usage: java -jar MELT.jar Single <options>
MELTv2.1.4 - MELT-Single - Perform transposon analysis on a single sample.
-a Reads have been aligned with bwa-aln. [false]
-ac Remove ac0 sites from final VCF file. [false]
-b <arg> Exclusion list for chromosomes. A '/' seperated list: i.e. to exclude chromosomes 1,2, and 4, put -b 1/2/4. [null]
-bamfile <arg> Bam file for MEI analysis.
-bowtie <arg> Path to the bowtie2 algorithm if not in PATH [null].
-c <arg> Coverage level of supplied bam file.
-cov <arg> Standard deviation cutoff when calling final sites in integer format.
-d <arg> Minumum length of chromosome/contig size for calling elements.
-e <arg> Expected insert size between reads.
-h <arg> Path to the reference sequence used to align reads.
-j <arg> Total percentage of sites allowed to be no call (in integer form i.e. 25 percent would be -i 25, not .25).
-k BAM file(s) have already been processed for discordant pairs (suffixes .fq, .disc, and .disc.bai are already present for the bam file in -l). [false]
-n <arg> Path to the genome annotation.
-nocleanup Do not cleanup MELT intermediate files after running. [false]
-q Alignments are pre Illumina 1.3 Quality encoding. [false]
-r <arg> Read length of the supplied bam file(s).
-s <arg> Standard deviation cutoff for excluding sites with improper balance of readpairs in double format. [2.0]
-sr <arg> Filter sites with less than X SRs during breakpoint ascertainment. Default, -1, is to not filter any such sites. [-1]
-t <arg> Path to the transposon ZIP file(s) to be used for this analysis.
-w <arg> Path to the working root directory.
-z <arg> Maximum reads to load into memory when iterating over sequence files. Setting higher increases run time, but may increase sensitivity in large (>60X coverage) bam files. Setting lower may decrease sensitivity in all bam files.
-help will print this message and exit
`
This is not an actual "error message" (like, for example, "Error: Unable to access jarfile MELTe.jar", which appears in the job.err.log file if I intentionally misspell the name of the app), so maybe that's the reason why I can't see it in the job.err.log file. Is there a way to get it printed somewhere so that I can understand any errors in the command line? I tried a trick:
`
java -jar MELT.jar Single > debuggy.txt
`
This works fine "locally" in Docker. I also tried it on CGC, setting up a specific output port for the debug file (glob= *.txt); unfortunately, there is no downloadable output file (which is very strange). Even stranger is the fact that, according to job.tree.log, the file has been created
`
.
. [ 89 Jun 01 10:48.16 UTC] cmd.log
. [2.3K Jun 01 10:48.17 UTC] deboggy.txt
. [ 0 Jun 01 10:48.27 UTC] job.err.log
. [3.1K Jun 01 10:48.16 UTC] job.json
. [ 0 Jun 01 10:48.27 UTC] job.tree.log
9.5K used in 0 directories, 5 files
`
but it is still not possible to download it. Can someone help me with this? Thanks a lot!
Filippo
Posted by Filippo Martignano over 4 years ago
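Since the `>` redirect in the post captured the usage text, that text appears to be written to standard output rather than standard error, which would explain an empty job.err.log. A minimal sketch of capturing both streams together, using Python's `subprocess` module as a hypothetical wrapper around the command (the `java` arguments in the comment are the ones from the post; nothing here is specific to CGC):

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run a command and return (exit code, combined stdout and stderr text)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream as stdout
        text=True,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Stand-in child process; in the post's case the command would be
    # ["java", "-jar", "MELT.jar", "Single"].
    child = "import sys; print('usage text'); sys.stderr.write('error text\\n'); sys.exit(1)"
    code, output = run_and_capture([sys.executable, "-c", child])
    print(code)
    print(output)
```

The shell equivalent of merging the two streams is appending `2>&1` after the `>` redirect, so that a single log file holds everything the application printed.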
Unauthorized error when accessing files through API
I am attempting to access project output files through the CGC API. I am using the sevenbridges Python package. I have run the code provided in the documentation for accessing projects with initialization via environment variables (https://pypi.python.org/pypi/sevenbridges-python). However, an Unauthorized error is returned. I have double-checked that the authentication token I am using is correct. What could be the reason for this?
Posted by Susanna Chen over 4 years ago
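When initializing from environment variables, one thing worth ruling out is that the endpoint and the token belong to different platforms. The sketch below checks the environment before the client is created. It rests on two assumptions that should be verified against the package documentation: that the variables are named `SB_API_ENDPOINT` and `SB_AUTH_TOKEN`, and that the CGC endpoint is `https://cgc-api.sbgenomics.com/v2`. The function name is hypothetical and the code does not call the sevenbridges package itself.

```python
import os

# Assumed CGC API endpoint; confirm against the CGC documentation.
CGC_ENDPOINT = "https://cgc-api.sbgenomics.com/v2"


def check_sb_env(environ=None):
    """Return a list of likely configuration problems (empty if none found)."""
    env = os.environ if environ is None else environ
    problems = []

    token = env.get("SB_AUTH_TOKEN", "")
    if not token.strip():
        problems.append("SB_AUTH_TOKEN is not set")
    elif token != token.strip():
        problems.append("SB_AUTH_TOKEN has leading or trailing whitespace")

    endpoint = env.get("SB_API_ENDPOINT", "")
    if not endpoint:
        problems.append("SB_API_ENDPOINT is not set")
    elif endpoint.rstrip("/") != CGC_ENDPOINT:
        problems.append(
            f"SB_API_ENDPOINT is {endpoint!r}, expected {CGC_ENDPOINT!r} for CGC"
        )
    return problems


if __name__ == "__main__":
    for problem in check_sb_env():
        print(problem)
```

A token issued by CGC sent to a different platform's endpoint would be rejected even though the token itself is correct, so an empty list from this check helps rule out that case.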