How to download many files from a project to a volume?

Greetings. I was wondering if you could help me. I have a number of files in a project I am a member of that I would like to copy to an Amazon S3 bucket, which I have mounted as a volume. However, it is not clear to me how to copy the files to the bucket. The tutorial at https://docs.cancergenomicscloud.org/docs/aws-cloud-storage-tutorial#move-file-from-project shows how to move a single file, but I would like to move many files at once. Can this be done through the Cancer Genomics Cloud GUI/web interface, or is there a page that explains how to do it? Many thanks!
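
A minimal sketch of a scripted alternative, assuming the sevenbridges-python client and a volume already attached to your account. `export_all`, `destination`, and the IDs in the usage comment are hypothetical; `api.files.query`, `api.volumes.get`, and `api.exports.submit_export` are the client calls as I understand them, so please check them against the library documentation before relying on this. `copy_only=True` is intended to keep the originals in the project (as I understand it, an export moves the file by default).

```python
import posixpath


def destination(prefix, name):
    """Build the key under which a file will land on the volume (hypothetical layout)."""
    # posixpath keeps forward slashes, as object-store keys expect, on any OS.
    return posixpath.join(prefix, name)


def export_all(api, project_id, volume_id, prefix, copy_only=True):
    """Submit one export per file in the project and return the export objects.

    `api` is assumed to be a sevenbridges.Api instance; folders in the
    project, if any, are not handled by this sketch.
    """
    volume = api.volumes.get(volume_id)
    exports = []
    # .all() walks every page of the query result, not just the first one.
    for f in api.files.query(project=project_id).all():
        exports.append(
            api.exports.submit_export(
                file=f,
                volume=volume,
                location=destination(prefix, f.name),
                copy_only=copy_only,
            )
        )
    return exports


# Usage (IDs and token are placeholders):
# import sevenbridges as sbg
# api = sbg.Api(url="https://cgc-api.sbgenomics.com/v2", token="<AUTH_TOKEN>")
# export_all(api, "my-user/my-project", "my-user/my-volume", "from-cgc")
```

Each export runs as a separate job on the platform, so checking their status afterwards is advisable.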

STAR genome generate (2.7.0e) error

Hi, I'm using STAR Genome Generate and STAR from the public apps (both 2.7.0e) to align human RNA-seq data (uploaded privately). As the reference genome and gene annotation for generating the genome indices, I'm using GRCh38.primary_assembly.genome.fa and gencode.v32.annotation.gtf. I keep getting this error:

Command mkdir genomeDir && STAR --runMode genomeGenerate --genomeDir ./genomeDir --runThreadN 20 --genomeChrBinNbits 16 --limitGenomeGenerateRAM 60000000000 --genomeFastaFiles /sbgenomics/workspaces/2bc67190-cbb2-43ba-866b-ca9e77ce024a/tasks/a72a496a-3859-480c-8de1-31c0e332b50e/star_genome_generate_2_7_0e/GRCh38.primary_assembly.genome.fa --sjdbGTFfile /sbgenomics/workspaces/2bc67190-cbb2-43ba-866b-ca9e77ce024a/tasks/a72a496a-3859-480c-8de1-31c0e332b50e/star_genome_generate_2_7_0e/gencode.v32.annotation.gtf && tar -vcf GRCh38.primary_assembly.genome.gencode.v32.annotation.star-2.7.0e-index-archive.tar ./genomeDir && mv Log.out Log.out.log failed with exit code 137.

Can someone tell me how to solve this? Thank you!
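
Exit code 137 follows the shell convention 128 + N for a process terminated by signal N, so 137 means signal 9 (SIGKILL). On Linux, that is the signal the out-of-memory killer sends, so one likely explanation (the log alone does not confirm it) is that the instance ran out of RAM. The command allows STAR up to 60,000,000,000 bytes (about 55.9 GiB) via --limitGenomeGenerateRAM; if STAR's actual usage exceeds the memory the instance has, the process can be killed. Choosing an instance type with more memory is the usual first thing to try. The small Python sketch below (hypothetical helper names) only decodes the exit code and the limit; it does not inspect the task.

```python
import signal


def killing_signal(exit_code):
    """Name the signal behind a shell-style exit code (128 + N), or None."""
    if exit_code <= 128:
        return None
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        # Not a signal number known on this platform.
        return None


def bytes_to_gib(n_bytes):
    """Convert a byte count (as passed to --limitGenomeGenerateRAM) to GiB."""
    return n_bytes / 2**30


print(killing_signal(137))
print(round(bytes_to_gib(60_000_000_000), 1))
```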

Tissue Slides and Gene Expression Data

Hi CGC Team, I am planning to use the tissue slides and the gene expression data from TCGA, so I need the connections between the two data types. Since the TCGA barcodes sometimes don't match (e.g., when the portion number differs), I want to use the metadata to get this information. I created a query for it and tried to download the connections. However, when I export the corresponding file, I can only download up to 3,000 lines. That would normally be enough, but after dropping all duplicates only 100 lines remain. Is there a way to download the whole table directly from the query? Best regards, Lena
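
Separately from the export limit, the matching itself can be done on barcode prefixes. A minimal Python sketch, assuming the standard TCGA barcode layout (project-TSS-participant-sample/vial-portion/analyte-...): it keys both slide and expression barcodes on the first four fields (e.g., TCGA-AA-A00E-01A), so a differing portion number no longer blocks a match, and it collects pairs in a set so duplicates disappear. `sample_key` and `link_slides_to_expression` are hypothetical helpers, and the slide barcodes in the test are made up for illustration. Whether sample plus vial is the right granularity (rather than patient or portion) depends on the analysis.

```python
def sample_key(barcode):
    """Return TCGA-<TSS>-<participant>-<sample type + vial>, dropping portion and later fields.

    Raises ValueError if the barcode has fewer than four dash-separated fields.
    """
    project, tss, participant, sample_vial = barcode.split("-")[:4]
    return f"{project}-{tss}-{participant}-{sample_vial}"


def link_slides_to_expression(slide_barcodes, expression_barcodes):
    """Pair slides and expression aliquots that share the same sample+vial key.

    Returns sorted unique (slide, expression) pairs, so duplicates are dropped.
    """
    by_key = {}
    for bc in expression_barcodes:
        by_key.setdefault(sample_key(bc), set()).add(bc)
    pairs = set()
    for slide in slide_barcodes:
        for expr in by_key.get(sample_key(slide), ()):
            pairs.add((slide, expr))
    return sorted(pairs)
```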

SGDP reference fasta file for BAM files not provided

Hello, I tried to run my own analysis on BAM files from the SGDP project. All my jobs fail with an error, which I believe is because the BAM files were generated with a different reference FASTA file than the one I am using. Could you please let me know how I can find the reference FASTA file that was actually used for this project? My error is:

A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.
reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT]
features contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]
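
The error compares two sets of names: the reference uses GRCh37/b37-style names (1, 2, ..., MT), while the features input uses UCSC hg19-style names (chr1, ..., chrM, plus chrUn_gl... contigs). In GATK messages, "features" usually means a feature input such as a VCF or interval file, so the mismatch may be with that file rather than with the BAM. The Python sketch below (hypothetical helper names) checks how many contigs would overlap after stripping the chr prefix from primary chromosomes. It deliberately leaves chrM unmapped, because hg19 chrM and the b37 MT contig are different mitochondrial sequences, so renaming alone would not make them compatible.

```python
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def ucsc_to_b37(name):
    """Map a UCSC hg19 primary-chromosome name (chr1..chr22, chrX, chrY) to b37 style.

    Returns None for everything else, including chrM (a different mitochondrial
    sequence than b37's MT) and the *_random / chrUn_* / *_hap* contigs.
    """
    if name.startswith("chr") and name[3:] in PRIMARY:
        return name[3:]
    return None


def contig_overlap(reference, features):
    """Return (shared names as-is, shared names after renaming features to b37 style)."""
    ref = set(reference)
    as_is = ref & set(features)
    renamed = {ucsc_to_b37(c) for c in features} - {None}
    return as_is, ref & renamed
```

Even when the renamed overlap is large, obtaining a features file prepared for the same reference is generally safer than renaming contigs by hand.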

Pull CGC repository images

Hi, I want to know how to pull an image from the CGC image repository. Taking STAR as an example, I tried "docker run -ti cgc-images.sbgenomics.com/admin/sbg-public-data/rna-seq-alignment-star-2-5-4b", but it didn't work. Can anyone tell me how to do this? Thanks in advance!
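
One detail worth checking: when no tag is given, Docker resolves the reference to the `latest` tag, which may not exist in that repository. The Python sketch below (a hypothetical helper with simplified rules and no digest support) splits the reference from the question into registry, repository, and tag to make this visible. As far as I know, the CGC registry also requires authentication first (`docker login cgc-images.sbgenomics.com` with your CGC username and authentication token), followed by `docker pull` with an explicit tag; the exact tag for a public app is not given here and would typically be taken from the app's Docker image field.

```python
def parse_image_ref(ref):
    """Split a Docker image reference into (registry, repository, tag).

    Simplified rules: the first path component is treated as a registry
    host if it contains '.' or ':' or is 'localhost'; a missing tag
    defaults to 'latest'. Digest references (@sha256:...) are not handled.
    """
    first, _, rest = ref.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", ref
    # A tag can only appear after the last '/', so a registry port is not mistaken for it.
    last = remainder.rsplit("/", 1)[-1]
    if ":" in last:
        repository, tag = remainder.rsplit(":", 1)
    else:
        repository, tag = remainder, "latest"
    return registry, repository, tag


print(parse_image_ref(
    "cgc-images.sbgenomics.com/admin/sbg-public-data/rna-seq-alignment-star-2-5-4b"
))
```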

TCGA COAD Expression data

I have downloaded a TCGA COAD expression data file containing data like that shown below. How are these values calculated, and what do they mean?

Hybridization REF        TCGA-AA-A00E-01A-01R-A002-07
Composite Element REF    log2 lowess normalized (cy5/cy3) collapsed by gene symbol
ELMO2                    -0.201
CREB3L1                  2.3005
RPS11                    -0.080375
PNMA1                    -1.23175
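
A short Python illustration of how to read these numbers, assuming the usual two-color Agilent design in which Cy5 labels the tumor sample and Cy3 a common reference RNA (the platform documentation should confirm which channel is which). Each value is log2(Cy5/Cy3) after LOWESS normalization, which removes intensity-dependent dye bias, with multiple probes for the same gene combined into one row ("collapsed by gene symbol"). Raising 2 to the value recovers the linear ratio: positive values mean higher signal in the Cy5 sample than in the reference, negative values mean lower, and 0 means equal. The helper name is hypothetical.

```python
# Values copied from the excerpt in the question (aliquot TCGA-AA-A00E-01A-01R-A002-07).
values = {"ELMO2": -0.201, "CREB3L1": 2.3005, "RPS11": -0.080375, "PNMA1": -1.23175}


def log2_ratio_to_linear(value):
    """Undo the log2 transform: returns the Cy5/Cy3 ratio."""
    return 2.0 ** value


for gene, value in values.items():
    print(f"{gene}: log2 ratio {value:+.3f} -> Cy5/Cy3 ratio {log2_ratio_to_linear(value):.2f}")
```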

HER2 status not consistent

I got the HER2, ER, and PR status of BRCA patients from the supplementary material of the TCGA BRCA publication (https://www.nature.com/articles/nature11412#supplementary-information; file: TCGA_Supplementary Tables 1-4.csv; column: HER2_Final_Status). Since that is from 2012, some months later I downloaded the newer clinical data from cBioPortal (*) (http://www.cbioportal.org/study?id=brca_tcga#clinical; file: data_bcr_clinical_data_patient.txt; columns: IHC-Status, HER2 fish status). However, I noticed that hundreds of patients have a different HER2 status from the one in the original TCGA publication, for example:

Barcode       | IHC-Status (cBioPortal) | HER2 fish status (cBioPortal) | HER2_Final_Status (TCGA publication)
TCGA-A1-A0SH  | Equivocal               | Negative                      | Negative
TCGA-A2-A04U  | Negative                | Positive                      | Negative
TCGA-A2-A0T2  | Negative                | Not Evaluated                 | Negative
TCGA-A8-A06R  | Positive                | Positive                      | Equivocal

At first, I thought there was a method for combining the IHC and FISH statuses into one final status, but I could not find one. For TCGA-A1-A0SH, FISH seems to take precedence; for TCGA-A2-A04U, IHC seems to; and for TCGA-A8-A06R, neither does.

(*) I verified that, for those 4 patients, the HER2 status is consistent between cBioPortal and the current TCGA clinical files (her2_fish_status and her2_status_by_ihc in the clinical files).

Thanks in advance, Maor
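
To make the comparison concrete, here is a Python sketch that applies two simple, hypothetical combination rules to the four example patients and lists where each disagrees with the published HER2_Final_Status. `fish_first` uses FISH whenever it gave a definitive result; `ihc_first` uses a definitive IHC result and falls back to FISH otherwise, loosely modeled on the ASCO/CAP practice of resolving equivocal IHC with in situ hybridization. Neither is claimed to be the method TCGA used.

```python
# (barcode, IHC status, FISH status, published HER2_Final_Status), copied from the post.
ROWS = [
    ("TCGA-A1-A0SH", "Equivocal", "Negative", "Negative"),
    ("TCGA-A2-A04U", "Negative", "Positive", "Negative"),
    ("TCGA-A2-A0T2", "Negative", "Not Evaluated", "Negative"),
    ("TCGA-A8-A06R", "Positive", "Positive", "Equivocal"),
]
DEFINITIVE = ("Positive", "Negative")


def fish_first(ihc, fish):
    """Hypothetical rule: a definitive FISH result wins; otherwise fall back to IHC."""
    return fish if fish in DEFINITIVE else ihc


def ihc_first(ihc, fish):
    """Hypothetical rule: a definitive IHC result wins; otherwise fall back to FISH."""
    return ihc if ihc in DEFINITIVE else fish


def disagreements(rows, rule):
    """Barcodes where the rule's call differs from the published final status."""
    return [bc for bc, ihc, fish, final in rows if rule(ihc, fish) != final]


print("FISH-first:", disagreements(ROWS, fish_first))
print("IHC-first:", disagreements(ROWS, ihc_first))
```

On these four rows, the FISH-first rule disagrees on TCGA-A2-A04U and TCGA-A8-A06R, while the IHC-first rule disagrees only on TCGA-A8-A06R, whose published value (Equivocal) matches neither input.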

Why are there no reference-genome nucleotides after SAMtools mpileup?

Hi! I have been creating a workflow that consists of three main steps (SAMtools View, SAMtools faidx, and SAMtools mpileup). First, with SAMtools View, I filtered the input BAM file using a BED file that contains specific regions of chromosomes 3 and 15. I took the input BAM file, already aligned and sorted, from the database. I uploaded the BED file from my computer; its first line looks like this: 3 193593144 193697811. SAMtools mpileup then took the file containing only the required chromosomes and the indexed reference file (as the reference, I used ucsc.hg19.fasta from the database). At the end of the workflow, I expect a VCF file that contains the reference and alternative nucleotides for chromosomes 3 and 15. I do get this file, but it has N in place of the reference allele. Please help me understand what is wrong with my reference file.
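
One possible cause of N as the reference base (a likely explanation to check, not a confirmed one) is that the contig names in the reads or intervals are not found in the FASTA, so no reference sequence can be looked up; mpileup also reports N when it is run without a reference. Here the BED line uses "3", whereas UCSC-style FASTA files such as ucsc.hg19.fasta name that chromosome "chr3"; if the BAM also uses "3", its contigs would not match the reference at all. The Python sketch below, with hypothetical helper names, reads contig names and lengths from a .fai index (the first two tab-separated columns) and reports BED intervals whose contig is missing or that run past the contig's end. The contig length in the test is illustrative; the real values come from your own ucsc.hg19.fasta.fai.

```python
def read_fai(lines):
    """Parse .fai lines (name, length, offset, linebases, linewidth) into {name: length}."""
    lengths = {}
    for line in lines:
        if line.strip():
            fields = line.rstrip("\n").split("\t")
            lengths[fields[0]] = int(fields[1])
    return lengths


def check_bed(bed_lines, contig_lengths):
    """Return (line, problem) pairs for BED intervals the reference cannot serve."""
    problems = []
    for raw in bed_lines:
        line = raw.rstrip("\n")
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, _start, end = line.split()[:3]
        if chrom not in contig_lengths:
            problems.append((line, f"contig {chrom!r} not in reference"))
        elif int(end) > contig_lengths[chrom]:
            problems.append((line, "interval extends past contig end"))
    return problems
```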

Costs associated with using data on an AWS volume

I've set up a Volume to access files from a bucket I have under my own AWS account and copied a file into a project. Does this copy incur storage charges of its own?

Unable to extract results/outputs of the tool

Hi, I've built a simple tool using R, pushed it to the CGC repository, and ran it. However, I have an issue retrieving the outputs: it seems as if the system cannot find them. For now, the tool produces two files, model.pdf and model.txt. Since their names are static, I set the glob values (Outputs tab in the tool editor) to model.pdf and model.txt. My initial guess was that the working directory is wrong, but I couldn't find more details in the documentation.

- How do I find out the tool's working directory for the current analysis? Can I extract this information from the job or self variable?
- Should I provide information to mount a specific directory?
- What else could be wrong?

For what it's worth, I can successfully retrieve stdout.txt (the captured standard output), and the job.tree.log file lists stdout.txt but neither model.pdf nor model.txt. Thank you for the help.
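
In CWL, output glob patterns are evaluated relative to the designated output directory, which is also the tool's initial working directory (CWL v1.x exposes it as $(runtime.outdir)). So a pattern like model.txt only matches if the R script writes the file there; if the script changes directory (for example with setwd()) or writes to an absolute path, the glob finds nothing, which would be consistent with stdout.txt being collected while the model files are not. That is one possible cause, not a diagnosis. The self-contained Python sketch below imitates the collection step in a temporary directory, using a hypothetical results/ subfolder as the misplaced location; the helper names are also hypothetical.

```python
import glob
import os
import tempfile


def collect(outdir, patterns):
    """Imitate output collection: match each glob pattern relative to outdir."""
    return {
        pattern: sorted(
            os.path.relpath(path, outdir)
            for path in glob.glob(os.path.join(outdir, pattern))
        )
        for pattern in patterns
    }


def touch(path):
    """Create an empty file, making parent folders as needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w"):
        pass


def demo():
    """Write stdout.txt at the top level and the model files one level down."""
    with tempfile.TemporaryDirectory() as outdir:
        touch(os.path.join(outdir, "stdout.txt"))
        # Hypothetical misplacement: the script writes into results/.
        touch(os.path.join(outdir, "results", "model.pdf"))
        touch(os.path.join(outdir, "results", "model.txt"))
        static = collect(outdir, ["stdout.txt", "model.pdf", "model.txt"])
        nested = collect(outdir, ["results/model.*"])
    return static, nested


static, nested = demo()
print(static)
print(nested)
```

If this mirrors what happens in the real task, either writing the files to the working directory from the R script or pointing the glob at the subfolder should let the platform pick them up.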