Forum
Clarification for TCGA data
I have trouble matching WSI slides to their grade or TNM stage.
For example, patient: TCGA-BC-A110
has three slide samples:
Sample TCGA-BC-A110-01Z (Primary Tumor)
Sample TCGA-BC-A110-01A (Primary Tumor)
Sample TCGA-BC-A110-11A (Normal tissue)
Question 1:
Is it correct that samples ending with A were all sampled together?
Question 2:
Is it possible to tell which were sampled first: samples ending with A, or with Z?
[A pathology report exists only for A, with a conclusion of Grade I. The clinical file nationwidechildrens.org_clinical.TCGA-BC-A110.xml states that the patient had grade I cancer and later a recurrence. Does that mean A is the first tumor event and Z the second?]
Question 3:
I noticed that pathology reports are never available for Z samples, only for A samples. Is there a reason?
Posted by Maor Maor about 5 years ago
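A small sketch that may help when lining slides up with clinical records: it splits a sample barcode into its fields. It assumes the usual TCGA barcode layout, in which the fourth hyphen-separated field holds a two-digit sample type code (01 for primary solid tumor, 11 for solid tissue normal) followed by a vial letter. The function name `parse_sample_barcode` is made up for this example, and the sketch only separates the fields; it does not tell you which vial was collected first.

```shell
#!/bin/sh
# Split a TCGA sample barcode such as TCGA-BC-A110-01A into its fields.
# Assumed layout: TCGA-<TSS>-<participant>-<2-digit sample type><vial letter>[-...]
parse_sample_barcode() {
    sample_field=$(printf '%s\n' "$1" | cut -d- -f4)         # e.g. 01A
    type_code=$(printf '%s\n' "$sample_field" | cut -c1-2)   # e.g. 01
    vial=$(printf '%s\n' "$sample_field" | cut -c3)          # e.g. A
    case $type_code in
        01) label="Primary Solid Tumor" ;;
        11) label="Solid Tissue Normal" ;;
        *)  label="other (see the TCGA sample type code table)" ;;
    esac
    # Tab-separated: type code, vial, human-readable label
    printf '%s\t%s\t%s\n' "$type_code" "$vial" "$label"
}

# The three samples from the post above
for barcode in TCGA-BC-A110-01Z TCGA-BC-A110-01A TCGA-BC-A110-11A; do
    parse_sample_barcode "$barcode"
done
```

Grouping slides by type code and vial this way at least makes it explicit that 01Z and 01A share a sample type and differ only in the vial field.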
Secondary files not loaded in batch runs
Hi!
I have a tool which analyzes .bam files and requires .bai files as secondary files.
The secondary file settings seem to be correct, because the tool works well when run on a single file.
However, when I try to run a batch task, it doesn't work, and the error log says "unable to find index file for example.bam" or something similar.
Is this a known bug? Is there a way to work around it?
Thanks a lot!
Posted by filippo_martignano about 5 years ago
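One thing that might be worth ruling out before treating this as a platform bug is whether every BAM in the batch actually has an index with the name the tool expects. The sketch below is only a local check under that assumption; `missing_bam_indexes` is a hypothetical helper name, and it accepts both common naming schemes (`sample.bam.bai` and `sample.bai`).

```shell
#!/bin/sh
# Print every BAM file in a directory that has no index file next to it.
# Accepted index names: sample.bam.bai or sample.bai.
missing_bam_indexes() {
    dir=${1:-.}
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue    # the glob matched nothing
        if [ ! -e "$bam.bai" ] && [ ! -e "${bam%.bam}.bai" ]; then
            printf '%s\n' "$bam"
        fi
    done
}

# Check the current directory
missing_bam_indexes .
```

If this prints nothing for the batch inputs, the indexes exist and the problem is more likely in how the batch task attaches secondary files.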
SomaticSniper failed with exit code 139
Hello, I am trying to run SomaticSniper and have uploaded a Normal BAM, Tumor BAM and Reference Sequence (.fasta). The run failed with exit code 139. Any ideas on how to fix this? Thank you!
Posted by Corey Miles over 5 years ago
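As a starting point for reading that exit code: on Linux, a status above 128 conventionally means the process was terminated by a signal, with the signal number equal to the status minus 128. Under that convention 139 corresponds to signal 11 (SIGSEGV, a segmentation fault). The sketch below just does that arithmetic; it assumes the platform reports the raw shell exit status, and `describe_exit_code` is a hypothetical helper name.

```shell
#!/bin/sh
# Interpret an exit status using the common shell convention:
# status > 128 means "killed by signal (status - 128)".
describe_exit_code() {
    code=$1
    if [ "$code" -gt 128 ]; then
        sig=$((code - 128))
        echo "exit code $code: terminated by signal $sig"
    else
        echo "exit code $code: normal exit status"
    fi
}

describe_exit_code 139
```

A segmentation fault points at the tool crashing on its inputs rather than at a wrong command line, so checking that both BAMs were aligned to the supplied reference is a reasonable next step.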
SVS files are unavailable through the API
SVS files are available through the website:
https://portal.gdc.cancer.gov/cases/9fe336a8-08a7-4fe7-bf45-afd6a8eb9c75
They are also available on the data portal as of May 21, 2018:
(same page, click on Files, which leads to the data portal, where you can see the SVS files for this case)
However, when I issue an API call to:
https://cgc-datasets-api.sbgenomics.com/datasets/v0/tcga/cases/9FE336A8-08A7-4FE7-BF45-AFD6A8EB9C75/files
no SVS files are returned.
Posted by Maor Maor over 5 years ago
Unable to see error messages from Java application
Hello, I'm trying to bring a Java application (MELTv2.1.4) to the CGC platform.
I've built a Docker image that works perfectly locally and created a tool on CGC.
However, there may be something wrong with my command line, because when I run the tool the task fails with exit code 1.
I would like to figure out what is wrong with the command line, but unfortunately job.err.log is completely blank, so I cannot identify the problem.
For example if I run the application without arguments in my docker ("java -jar MELT.jar Single"), the result is:
`
Command Line:
MELT.jar Single
Start time: Jun 1, 2018 11:13:01 AM
Performing MELT analysis...
Missing required options: bamfile, h, w, t, n, c
usage: java -jar MELT.jar Single <options>
MELTv2.1.4 - MELT-Single - Perform transposon analysis on a single sample.
-a Reads have been aligned with bwa-aln. [false]
-ac Remove ac0 sites from final VCF file. [false]
-b <arg> Exclusion list for chromosomes. A '/' seperated list: i.e. to exclude chromosomes 1,2, and 4, put -b 1/2/4. [null]
-bamfile <arg> Bam file for MEI analysis.
-bowtie <arg> Path to the bowtie2 algorithm if not in PATH [null].
-c <arg> Coverage level of supplied bam file.
-cov <arg> Standard deviation cutoff when calling final sites in integer format. [35]
-d <arg> Minumum length of chromosome/contig size for calling elements. [1000000]
-e <arg> Expected insert size between reads. [500]
-h <arg> Path to the reference sequence used to align reads.
-j <arg> Total percentage of sites allowed to be no call (in integer form i.e. 25 percent would be -i 25, not .25). [25]
-k BAM file(s) have already been processed for discordant pairs (suffixes .fq, .disc, and .disc.bai are already present for the bam file in -l). [false]
-n <arg> Path to the genome annotation.
-nocleanup Do not cleanup MELT intermediate files after running. [false]
-q Alignments are pre Illumina 1.3 Quality encoding. [false]
-r <arg> Read length of the supplied bam file(s). [100]
-s <arg> Standard deviation cutoff for excluding sites with improper balance of readpairs in double format. [2.0]
-sr <arg> Filter sites with less than X SRs during breakpoint ascertainment. Default, -1, is to not filter any such sites. [-1]
-t <arg> Path to the transposon ZIP file(s) to be used for this analysis.
-w <arg> Path to the working root directory.
-z <arg> Maximum reads to load into memory when iterating over sequence files. Setting higher increases run time, but may increase sensitivity in large (>60X coverage) bam files. Setting lower may decrease sensitivity in all bam files. [5000]
-help will print this message and exit
`
This is not an actual "error message" (unlike, for example, "Error: Unable to access jarfile MELTe.jar", which appears in job.err.log if I intentionally misspell the name of the app), so maybe that's why I can't see it in job.err.log.
Is there a way to get it printed somewhere, so that I can diagnose any errors in the command line?
I tried a trick:
`
java -jar MELT.jar Single > debuggy.txt
`
This works fine locally in the Docker container. I also tried it on CGC, setting up a dedicated output port for the debug file (glob = *.txt), but there is no downloadable output file, which is very strange.
Even stranger, according to job.tree.log the file has been created:
`
.
. [ 89 Jun 01 10:48.16 UTC] cmd.log
. [2.3K Jun 01 10:48.17 UTC] deboggy.txt
. [ 0 Jun 01 10:48.27 UTC] job.err.log
. [3.1K Jun 01 10:48.16 UTC] job.json
. [ 0 Jun 01 10:48.27 UTC] job.tree.log
9.5K used in 0 directories, 5 files
`
but it is still not possible to download it.
Can someone help me with this?
Thanks a lot!
Filippo
Posted by Filippo Martignano over 5 years ago
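One detail about the redirection trick: a plain `>` captures standard output only, so anything the program writes to standard error is not in the file. Whether MELT prints its usage text to stdout or stderr is not established here, so the sketch below uses a stand-in function (`noisy_tool`, hypothetical) that writes to both streams and shows the `2>&1` form that captures them together. The file name `debug_all.txt` is also just an example.

```shell
#!/bin/sh
# Stand-in for a tool that writes to both streams and fails (not MELT itself).
noisy_tool() {
    echo "progress message on stdout"
    echo "usage or error message on stderr" >&2
    return 1
}

# Redirect stdout to the file first, then point stderr at the same place.
# The order matters: "2>&1 > file" would leave stderr on the terminal.
status=0
noisy_tool > debug_all.txt 2>&1 || status=$?

echo "exit status: $status"
cat debug_all.txt
```

Applied to the command in the post, the equivalent would be `java -jar MELT.jar Single > debuggy.txt 2>&1`. This does not explain why the output port shows no file when the task fails, but it should at least make the debug file complete.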
Unauthorized error when accessing files through API
I am attempting to access project output files through the CGC API, using the sevenbridges Python package. I have run the code provided in the documentation for accessing projects with initialization via environment variables (https://pypi.python.org/pypi/sevenbridges-python). However, an Unauthorized error is returned. I have double-checked that the authentication token I am using is correct. What could be the reason for this?
Posted by Susanna Chen over 5 years ago
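A possible cause worth checking, offered as an assumption rather than a diagnosis: a valid CGC token is rejected if the client talks to a different Seven Bridges endpoint. The sketch below checks the two environment variables that sevenbridges-python is understood to read (`SB_AUTH_TOKEN` and `SB_API_ENDPOINT`) against what is believed to be the CGC API base URL. Both the variable names and the URL should be confirmed against the current documentation, and `check_sb_env` is a hypothetical helper.

```shell
#!/bin/sh
# Sanity-check the environment used for environment-variable initialization.
# Assumed CGC API base URL; verify against the CGC API documentation.
CGC_ENDPOINT="https://cgc-api.sbgenomics.com/v2"

check_sb_env() {
    if [ -z "${SB_AUTH_TOKEN:-}" ]; then
        echo "SB_AUTH_TOKEN is not set"
        return 1
    fi
    if [ "${SB_API_ENDPOINT:-}" != "$CGC_ENDPOINT" ]; then
        echo "SB_API_ENDPOINT is '${SB_API_ENDPOINT:-}', expected $CGC_ENDPOINT"
        return 1
    fi
    echo "environment looks consistent with the CGC endpoint"
}

check_sb_env || true
```

If the endpoint check fails, exporting the CGC URL before starting Python and retrying is a cheap experiment.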
Case Explorer in GRCh38
Hi!
I used to be able to find my gene of interest (C1ORF61) using the Case Explorer, but now I can't find it any more. Has there been any gene renaming in this latest version? Thanks!
Posted by Xavier Bofill De Ros almost 6 years ago
Get diagnosis data
Hi!
I'm trying to download the diagnosis data to see if I can find some correlations with my gene of interest.
In the Data Browser I can see the Diagnosis Details, but despite making subgroups for the data import, I can't get those details into the metadata file. Is there any way to extract them?
Thanks
Posted by Xavier Bofill De Ros almost 6 years ago
Problems using bash script in a CGC tool
Hello!
I have created a tool based on Samtools, using the following repository: images.sbgenomics.com/marouf/samtools:1.3, which I found in the public Samtools app.
My goal is to make a tool that extracts the sample names from .bam files, because I need them in the subsequent tool of my workflow.
As input I'll give Samtools an array of two files (one tumor and one normal tissue sample from the same patient), and I want the tool to determine whether each extracted name belongs to a tumor sample or a normal sample.
So I wrote this bash script:
for i in /sbgenomics/Projects/<myprojectpath>/*.bam; do if [ `/opt/samtools-1.3/samtools view -H $i | grep '^@RG' | sed "s/.*SM:............-\(...\)-.*/\1/g" | uniq` == "01A" ]; then /opt/samtools-1.3/samtools view -H $i | grep '^@RG' | sed "s/.*SM:\([^\t]*\).*/\1/g" | uniq; fi; done > tumor_name.txt
In other words, for every BAM in my folder (one tumor and one normal) it should extract the three-character code that identifies the sample type and compare it to "01A" (which is specific to tumor samples). If they match, it prints the entire sample name and writes it to a file.
It returns the following error log:
2017-10-13T18:10:34.886498415Z sh: 1: [: 11A: unexpected operator
2017-10-13T18:10:34.891268970Z sh: 1: [: 01A: unexpected operator
11A and 01A should be the three-character codes extracted as the first argument of the if statement (01A for the tumor sample, 11A for the normal sample), so apparently the if statement doesn't accept them as arguments, nor the opening square bracket.
In the end, the tool returns the file tumor_name.txt, which is unfortunately empty (presumably because the if statement didn't work).
I thought I should put #!/bin/bash before the for loop, since my script uses bash commands.
However, when I use #!/bin/bash, the output redirection ">" stops working, which doesn't make sense to me.
I tried a simple bash script to test it: #!/bin/bash echo 123 > file.txt
and it doesn't work, while echo 123 > file.txt (without #!/bin/bash) works perfectly.
Any help?
Am I missing something needed to use bash scripts in CGC?
Thank you very much in advance!
Posted by filippo_martignano almost 6 years ago
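Two observations that may explain the symptoms. The log prefix `sh: 1: [:` suggests the command runs under plain `sh`, and in POSIX `sh` the string comparison operator inside `[ ]` is `=`, not the bash-only `==`. Also, if the whole command is passed as a single line, a leading `#!/bin/bash` turns everything after the `#` into a comment, which would explain why `echo 123 > file.txt` does nothing in that form. The sketch below is a POSIX-compatible variant of the idea: it reads header text on stdin, so it can be tried without Samtools. `tumor_sample_name` is a hypothetical helper, the sample name in the demo is invented, and the sketch swaps the original `sed` patterns for `awk` and `cut`.

```shell
#!/bin/sh
# Read SAM header text on stdin and print each distinct SM sample name
# whose fourth hyphen-separated field equals the wanted code (default 01A).
tumor_sample_name() {
    wanted=${1:-01A}
    awk -F'\t' '/^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) print substr($i, 4) }' \
        | sort -u \
        | while IFS= read -r name; do
              code=$(printf '%s\n' "$name" | cut -d- -f4)
              # POSIX comparison: single "=", both sides quoted
              if [ "$code" = "$wanted" ]; then
                  printf '%s\n' "$name"
              fi
          done
}

# Demo with an invented header instead of a real BAM
printf '@HD\tVN:1.4\n@RG\tID:rg1\tSM:TCGA-BC-A110-01A-11D-A12Z-10\tPL:ILLUMINA\n' \
    | tumor_sample_name 01A
```

With a real file, the equivalent would be `/opt/samtools-1.3/samtools view -H sample.bam | tumor_sample_name 01A > tumor_name.txt`. If bash features are really needed, wrapping the command as `bash -c '...'` avoids relying on a shebang line, assuming bash is installed in the image.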
Local file upload fails with cgc_uploader
Initializing upload...
Starting upload of 1 file(s) to MY-PROJECT
The upload stays at 0.00%, then fails.
I'm using the auth token from the Developer Dashboard.
The command line:
~/cgc-uploader/bin/cgc-uploader.sh -t MY-TOKEN -p MY-USERNAME/MY-PROJECT MY-FILE
(Side note: I'm unable to use the GUI uploader, because my IT department blocks it as an unknown app.)
Posted by Jonathan Bingham over 6 years ago