Data upload

How can I upload my private data to the CGC?

What is the paper.iCluster.Group parameter?

The LUAD clinical data has a parameter, paper_iCluster.Group. What does it stand for? I can't find the definition anywhere.

How do I disable email notifications after job completion?

TCGA BAM file size inconsistency with GDC?

Hi, I'm doing somatic mutation calling for TCGA patient TCGA-AR-A1AO, using these BAM files from the Data Browser:

TCGA-AR-A1AO-10A-01D-A12Q-09_IlluminaGA-DNASeq_exome_gdc_realn.bam (UUID: 30f1d9e3-e6a5-44b6-846c-1497806d301c, size: 27.03 GB)
TCGA-AR-A1AO-01A-01D-A12Q-09_IlluminaGA-DNASeq_exome_gdc_realn.bam (UUID: 33eeb804-ca8b-491e-8221-a285743be692, size: 25.53 GB)

However, on the GDC portal the same files are listed as 29.02 GB and 27.41 GB, respectively. Because the sizes differ, I wonder whether these files are really up to date. My somatic mutation calls from VarScan2 are also missing variants compared with the GDC results (with the same parameters and inputs). It's confusing, so I'm troubleshooting right now. Would you please help me with this? Thanks!

Best,
Stella
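Both files show almost exactly the same size ratio between the two portals (about 7.4%), which is what you would expect if one portal reports sizes in GiB (2^30 bytes) and the other in decimal GB (10^9 bytes). The sketch below checks that arithmetic; which portal uses which unit is an assumption here, not something the portals state in the question.

```python
# Hypothetical check: are the two reported sizes the same byte count in
# different units? ASSUMPTION: the CGC Data Browser figure is in GiB and
# the GDC portal figure is in decimal GB.
GIB = 1024 ** 3  # bytes in one gibibyte
GB = 1000 ** 3   # bytes in one gigabyte


def gib_to_gb(size_gib: float) -> float:
    """Convert a size in GiB to decimal GB."""
    return size_gib * GIB / GB


# Sizes copied from the question: (CGC Data Browser, GDC portal).
# Sample type 10 is blood-derived normal, 01 is primary solid tumor.
pairs = {
    "TCGA-AR-A1AO-10A (normal)": (27.03, 29.02),
    "TCGA-AR-A1AO-01A (tumor)": (25.53, 27.41),
}

for name, (cgc_size, gdc_size) in pairs.items():
    converted = gib_to_gb(cgc_size)
    # Both portal figures are rounded to 0.01, so compare at that precision.
    match = round(converted, 2) == gdc_size
    print(f"{name}: {cgc_size} GiB = {converted:.2f} GB "
          f"(GDC: {gdc_size} GB) -> match={match}")
```

If both lines report a match, the size difference alone does not show that the files differ; comparing file checksums, where both portals expose them, would be a more direct test.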

Modifying a read-only file in the terminal

Hi! I am analyzing the TCGA MAF files using the terminal in Data Cruncher. Some MAF files are in .gz format and I have to unzip them, but the file system is read-only. Would you please help me find a solution? Thank you!

Best,
Yiyun
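A compressed file does not have to be unzipped in place. A minimal sketch of two common workarounds: decompress into a directory you can write to, or read the .gz file directly without writing anything. Which directories are writable in Data Cruncher is an assumption to check (a home directory or /tmp are typical candidates); the function names are illustrative.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_to(src: str, out_dir: str) -> Path:
    """Decompress a .gz file into out_dir, leaving the read-only source untouched."""
    src_path = Path(src)
    # "x.maf.gz" -> "x.maf" (drops only the last suffix)
    out_path = Path(out_dir) / src_path.with_suffix("").name
    out_path.parent.mkdir(parents=True, exist_ok=True)  # out_dir must be writable
    with gzip.open(src_path, "rb") as fin, open(out_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams, so large MAFs never sit fully in memory
    return out_path


def iter_maf_records(src: str):
    """Yield MAF rows (column header first) straight from the .gz file, without writing to disk."""
    with gzip.open(src, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip MAF comment/version lines
            yield line.rstrip("\n").split("\t")
```

Reading directly is usually the simpler option when the MAF is only being parsed; from a shell, `gzip -dc file.maf.gz > /writable/path/file.maf` does the same as `gunzip_to`.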

VarScan2 workflow from BAM producing too few somatic mutation calls?

Hi! I have recently been using the VarScan2 workflow from BAM to call somatic mutations on TCGA GRCh38 BAM files. However, the output high-confidence VCF files are only a few KB in size. For one of the patients I was looking at, TCGA-AR-A1AO, around 6000 mutations are called in the MuTect VCF, but my output has only 300. I didn't change any parameters. I wonder whether the problem is the input files, but I was just using the tumor-normal BAMs from TCGA.
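When comparing call counts between two VCFs, it helps to count records the same way in both, since one file may include filtered (non-PASS) records and the other may not. A minimal sketch that counts data lines in a plain or gzipped VCF and tallies them by the FILTER column; whether the MuTect VCF's ~6000 calls include non-PASS records is an assumption to check.

```python
import gzip
from collections import Counter


def count_vcf_records(path: str) -> Counter:
    """Count variant records in a VCF (plain or .gz), tallied by FILTER value."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # "##" meta-information lines and the "#CHROM" header
            fields = line.rstrip("\n").split("\t")
            counts[fields[6]] += 1  # column 7 is FILTER in the VCF specification
    return counts
```

For example, `count_vcf_records("mutect_calls.vcf.gz")` (hypothetical file name) returns a Counter whose "PASS" entry can be compared with the same entry for the VarScan2 output, alongside the overall `sum(counts.values())`.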

Centrifuge custom index

I've used Centrifuge locally to generate an index based on the GenBank database. The output of this operation is four files with the *.cf extension. When I try to use them for a Centrifuge run on the cloud, the app asks for the index in tar.gz format. What does that mean?

I then tried to run the indexing on the cloud, but I've only found a pipeline that uses RefSeq, not GenBank, and in general I would prefer to do it locally to have more freedom. I'm now running "Reference Index Creation" on the cloud to create an index based on RefSeq with the output in tar format, but that is not exactly what I want.

Is there a way to produce the index in tar format with Centrifuge locally, or to use the GenBank database in this cloud app to generate the index?
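A tar.gz index is usually just the individual index files bundled into one gzip-compressed tar archive. A minimal sketch, assuming the local index follows Centrifuge's `<basename>.N.cf` naming; the function name is hypothetical, and whether the cloud app expects the files at the top level of the archive or inside a folder is an assumption to verify, for example against the archive produced by Reference Index Creation.

```python
import tarfile
from pathlib import Path


def pack_centrifuge_index(index_prefix: str, archive_path: str) -> list:
    """Bundle <prefix>.*.cf files into a gzip-compressed tar archive.

    Files are stored at the top level of the archive (no leading directories).
    ASSUMPTION: this flat layout is what the cloud app expects.
    """
    prefix = Path(index_prefix)
    parts = sorted(prefix.parent.glob(prefix.name + ".*.cf"))
    if not parts:
        raise FileNotFoundError(f"no {prefix.name}.*.cf files found in {prefix.parent}")
    with tarfile.open(archive_path, "w:gz") as tar:
        for part in parts:
            tar.add(part, arcname=part.name)  # arcname drops the local directory path
    return [p.name for p in parts]
```

For an index built with basename `genbank`, `pack_centrifuge_index("path/to/genbank", "genbank_index.tar.gz")` (hypothetical paths) would archive `genbank.1.cf` through `genbank.4.cf`; the shell equivalent is `tar -czf genbank_index.tar.gz genbank.*.cf` run from the index directory.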

How to download many files from a project to a volume?

Greetings. I was wondering if you could help me. I have a number of files in a project I'm a member of that I would like to copy to an Amazon S3 bucket. I have mounted the bucket as a volume, but it is not clear how to copy the files to it. While https://docs.cancergenomicscloud.org/docs/aws-cloud-storage-tutorial#move-file-from-project shows how to move a single file, I would like to move many files. Can this be done via the Cancer Genomics Cloud GUI/web interface, or is there documentation showing how it can be done? Many thanks!
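Outside the GUI, moving many files is typically scripted as one export per file. A rough sketch using the sevenbridges-python client, not tested against the CGC: the project and volume IDs are placeholders, the volume presumably needs to be connected with read-write access, and per the linked tutorial an export moves the file out of the project rather than copying it. Folders are skipped for simplicity; `destination` is a hypothetical helper.

```python
def destination(file_name: str, prefix: str = "") -> str:
    """Build the location (object key) a file will get inside the volume."""
    prefix = prefix.strip("/")
    return f"{prefix}/{file_name}" if prefix else file_name


def export_project_files(token: str, project_id: str, volume_id: str, prefix: str = ""):
    """Submit one export job per file at the project root (sketch; IDs are placeholders)."""
    import sevenbridges as sbg  # sevenbridges-python; imported lazily so the helper above works without it

    api = sbg.Api(url="https://cgc-api.sbgenomics.com/v2", token=token)
    volume = api.volumes.get(id=volume_id)
    jobs = []
    for f in api.files.query(project=project_id).all():  # .all() walks every page of results
        if f.is_folder():
            continue  # this sketch handles only files, not folder contents
        jobs.append(
            api.exports.submit_export(file=f, volume=volume, location=destination(f.name, prefix))
        )
    return jobs
```

Each returned export job can then be polled for its state before relying on the files in the bucket.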

STAR genome generate (2.7.0e) error

Hi, I'm using STAR Genome Generate and STAR from the public apps (both 2.7.0e) to align human RNA-seq data (uploaded privately). I'm using GRCh38.primary_assembly.genome.fa and gencode.v32.annotation.gtf as the reference genome and gene annotation file for generating the genome indices. I keep getting this error:

Command mkdir genomeDir && STAR --runMode genomeGenerate --genomeDir ./genomeDir --runThreadN 20 --genomeChrBinNbits 16 --limitGenomeGenerateRAM 60000000000 --genomeFastaFiles /sbgenomics/workspaces/2bc67190-cbb2-43ba-866b-ca9e77ce024a/tasks/a72a496a-3859-480c-8de1-31c0e332b50e/star_genome_generate_2_7_0e/GRCh38.primary_assembly.genome.fa --sjdbGTFfile /sbgenomics/workspaces/2bc67190-cbb2-43ba-866b-ca9e77ce024a/tasks/a72a496a-3859-480c-8de1-31c0e332b50e/star_genome_generate_2_7_0e/gencode.v32.annotation.gtf && tar -vcf GRCh38.primary_assembly.genome.gencode.v32.annotation.star-2.7.0e-index-archive.tar ./genomeDir && mv Log.out Log.out.log failed with exit code 137.

Can someone tell me how to solve this? Thank you!
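By shell convention, an exit status above 128 means the process was terminated by signal (status - 128), so 137 corresponds to signal 9, SIGKILL. On Linux, a SIGKILL during a memory-heavy step such as genome generation is often sent by the out-of-memory killer; that is a likely but unconfirmed cause here, worth checking against the instance's memory and the 60000000000-byte (60 GB) `--limitGenomeGenerateRAM` setting. A small decoding sketch:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell exit status; values above 128 conventionally mean 128 + signal number."""
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            return f"exited with status {code}"  # not a known signal number on this platform
        return f"killed by signal {sig.value} ({sig.name})"
    return f"exited with status {code}"


print(describe_exit_code(137))  # killed by signal 9 (SIGKILL)
```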

Tissue Slides and Gene Expression Data

Hi CGC Team, I am planning to use the tissue slides and the gene expression data from TCGA, so I need the connections between the two data types. Since the TCGA barcodes sometimes don't match (e.g. if the portion number is different), I want to use the metadata to get this information. I created a query for it and tried to download the connections. However, when I export the corresponding file, I can only download up to 3000 lines. That would be enough, but after I drop all duplicates, only 100 lines remain. Is there a way to download the whole table directly from the query? Best regards, Lena
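For the barcode mismatch itself, one option is to compare barcodes only up to the sample/vial field (TCGA-TSS-Participant-SampleVial), ignoring the portion and later fields, and to collapse duplicates while doing so. A minimal sketch; matching at the vial level (01A) rather than the sample-type level (01) is a judgment call, and the slide barcode in the example is made up for illustration. This does not address the 3000-line export limit.

```python
from collections import defaultdict


def sample_key(barcode: str) -> str:
    """Return the sample-level prefix TCGA-TSS-Participant-SampleVial (e.g. TCGA-AR-A1AO-01A).

    Dropping the portion and later fields lets barcodes that differ only
    in portion number map to the same key.
    """
    return "-".join(barcode.split("-")[:4])


def match_by_sample(slide_barcodes, expression_barcodes):
    """Map each slide barcode to the de-duplicated expression barcodes from the same sample/vial."""
    by_sample = defaultdict(set)
    for bc in expression_barcodes:
        by_sample[sample_key(bc)].add(bc)
    return {slide: sorted(by_sample.get(sample_key(slide), set())) for slide in slide_barcodes}


# Illustrative barcodes only (the slide and RNA barcodes are hypothetical).
print(match_by_sample(
    ["TCGA-AR-A1AO-01A-01-TS1"],
    ["TCGA-AR-A1AO-01A-11R-A12P-07", "TCGA-AR-A1AO-10A-01D-A12Q-09"],
))
```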