Problems using bash script in a CGC tool

Posted in General by filippo_martignano, Sun Oct 15 2017 18:45:35 GMT+0000 (UTC)

Hello! I have created a tool based on Samtools, using the image images.sbgenomics.com/marouf/samtools:1.3, which I found in the public Samtools app. My goal is a tool that extracts the sample name from .bam files, because I need it in the next tool of my workflow. As input I give Samtools an array of two files (one tumor and one normal tissue from the same patient), and I want the tool to determine whether the extracted name belongs to the tumor sample or the normal sample. So I wrote this bash script:

for i in /sbgenomics/Projects/<myprojectpath>/*.bam; do if [ `/opt/samtools-1.3/samtools view -H $i | grep '^@RG' | sed "s/.*SM:............-\(...\)-.*/\1/g" | uniq` == "01A" ]; then /opt/samtools-1.3/samtools view -H $i | grep '^@RG' | sed "s/.*SM:\([^\t]*\).*/\1/g" | uniq; fi; done > tumor_name.txt

In other words, for every BAM file in my folder (one tumor and one normal), it should extract the three characters that identify the sample type and compare them to "01A" (which is specific to tumor samples); if they match, it prints the entire sample name and writes it to a file.

It returns the following error log:

2017-10-13T18:10:34.886498415Z sh: 1: [: 11A: unexpected operator
2017-10-13T18:10:34.891268970Z sh: 1: [: 01A: unexpected operator

11A and 01A should be the three-character ID extracted as the first argument of the if statement (01A for the tumor sample, 11A for the normal sample), so it seems that the if statement doesn't accept them as arguments, nor the opening square bracket. In the end, the tool returns the file tumor_name.txt, which is unfortunately empty (presumably because the if statement didn't work).

I thought I should put #!/bin/bash before the for loop, since my script uses bash commands. However, when I use #!/bin/bash, the output redirection ">" stops working, and this doesn't make sense to me. I tried a simple bash script to test it:

#!/bin/bash echo 123 > file.txt

and it doesn't work, while

echo 123 > file.txt

(without #!/bin/bash) works perfectly.

Any help? Am I missing something needed to use bash scripts in CGC? Thank you very much in advance!
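
A note on the error log above: `[: 01A: unexpected operator` is the message a POSIX `sh` such as dash typically prints when `[` is given `==`, which is a bash extension; the portable string comparison is a single `=`. The sketch below shows the same tumor check written so that it runs under plain `sh`. The function names and the sample header are made up for illustration, and it assumes the sample name follows the TCGA barcode layout implied by the original `sed` pattern (sample-type code in the fourth hyphen-separated field).

```shell
#!/bin/sh
# Sketch only: helper names and the sample header below are hypothetical.

# Read a SAM header on stdin and print the unique SM (sample) values of its @RG lines.
sample_name() {
    grep '^@RG' | tr '\t' '\n' | sed -n 's/^SM://p' | sort -u
}

# Succeed if the 4th hyphen-separated field of the name is 01A
# (the code the original script compares against for tumor samples).
is_tumor() {
    code=$(printf '%s\n' "$1" | cut -d- -f4)
    [ "$code" = "01A" ]   # single "=" with quoted operands: valid in sh and bash
}

# Made-up header standing in for the output of: samtools view -H "$bam"
header=$(printf '@HD\tVN:1.4\n@RG\tID:rg1\tSM:TCGA-XX-0000-01A-01D-0000-01\tPL:ILLUMINA\n')

name=$(printf '%s\n' "$header" | sample_name)
if is_tumor "$name"; then
    printf '%s\n' "$name" > tumor_name.txt
fi
```

Separately, if the platform hands the whole command to `sh -c` on a single line (an assumption, based on the `sh: 1:` prefix in the log), a leading `#!/bin/bash` would start a comment that swallows everything after it, which would explain why `#!/bin/bash echo 123 > file.txt` produces no file.
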
Erik Lehnert
Oct 16, 2017

Hi, this will be easier to troubleshoot if I can look directly at the bash script and how you're providing it. Would you be willing to invite me (username: elehnert) to your project?

However, since you're using TCGA data, it might be easier to just use a JavaScript expression to access the file's metadata and determine which one is a tumor sample and which one is a normal sample. You can do this with something like the following code:

if ($job.inputs.EXAMPLE_INPUT.metadata['sample_type'] == "primary tumor") {
    return 'echo ' + $job.inputs.EXAMPLE_INPUT.path + " > tumor_name.txt"
}

This will check the sample type and return a command that writes the file's path to the file you specify. If you include this in the base command, I think it should do what you want.

filippo_martignano
Oct 18, 2017

Hi!
Thank you very much for the response.
Maybe my explanation wasn’t clear: I needed a tool that extracts the full sample name from the BAM file and writes it to a .txt file, because Mutect2 needs the sample names as arguments on the command line (e.g. --tumorSampleName <tumor_name> --normalSampleName <normal_name>).
Unfortunately I can't follow your suggestion, because Mutect2 doesn't only need to know which file is the tumor and which is the normal; it needs the actual full name of the sample (which is provided in the header of the BAM file) as a command-line argument.
Anyway, never mind, because in the end I was able to do that. Now my workflow looks pretty much like this:
(Input: bam files)—>(tool that extracts the name)—>(Mutect2).
Now the problem is that Mutect2 doesn’t accept a .txt file as the argument of the “--tumorSampleName” option.
My command line looks like this: “--tumorSampleName /path/tumor_name.txt”, and in order to pass the actual tumor name instead of the path of the .txt file I wanted to transform it into “ --tumorSampleName cat /path/tumor_name.txt ”, which works fine locally.
So I modified my input port, putting this simple JavaScript expression in the “Value” field (in the “include in command line” section): { return "cat " + $self + "" }.
However, the resulting command line is: --tumorSampleName cat [object Object] , and of course it doesn’t work when I run my workflow.

Any suggestions?

Thank you very much again.

filippo_martignano
Oct 18, 2017

Sorry, I didn't realize that backquotes are treated as metacharacters in this forum.
I'll rewrite what I wrote to make it understandable:

My command line looks like this: “--tumorSampleName /path/tumor_name.txt”, and in order to pass the actual tumor name instead of the path of the .txt file I wanted to transform it into “ --tumorSampleName 'cat /path/tumor_name.txt' ”, which works fine locally.
So I modified my input port, putting this simple JavaScript expression in the “Value” field (in the “include in command line” section): { return " 'cat " + $self + " ' " }.
However, the resulting command line is: --tumorSampleName 'cat [object Object]' , and of course it doesn’t work when I run my workflow.

I replaced the backquotes with single quotes here just so the text displays correctly; imagine that the script has backquotes wherever single quotes appear.

Thanks.
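
A note on the `[object Object]` output: that string is what JavaScript produces when an object is concatenated with a string, so `$self` here is presumably the file object rather than its path. The earlier reply reads the path through a `.path` property, so something like `$self.path` may be what the expression needs (an untested assumption). On the shell side, the backquote idea can be sketched as below with `$(...)`, the equivalent form of command substitution that is easier to quote. The temporary file and sample name are made up, and `echo` stands in for the real Mutect2 call.

```shell
#!/bin/sh
# Sketch only: the file and the sample name are hypothetical stand-ins
# for the .txt file produced by the name-extraction tool.
name_file=$(mktemp)
printf '%s\n' 'TCGA-XX-0000-01A-01D-0000-01' > "$name_file"

# Command substitution runs cat and replaces itself with the file's content
# (the trailing newline is stripped), so the option receives the name, not the path.
tumor_name=$(cat "$name_file")

# echo stands in for the real tool invocation.
echo --tumorSampleName "$tumor_name"

rm -f "$name_file"
```

Quoting `"$tumor_name"` keeps the value as a single argument even if the file content contains spaces.
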

filippo_martignano
Oct 18, 2017

Anyway, I'm setting up the project; I'll add you in the next few minutes!

Durga
Oct 19, 2017

SBG has a tool "SBG Pair FASTQs by Metadata", which I used for a similar task using fastq files. If they have a similar tool for bam files, I guess that will be your solution.
Hope this helps, Durga

  