{"__v":202,"_id":"5637f0befbe1c50d008cb087","category":{"project":"55faf11ba62ba1170021a9a7","version":"55faf11ba62ba1170021a9aa","_id":"56fbb83d8f21c817002af880","__v":0,"sync":{"url":"","isSync":false},"reference":false,"createdAt":"2016-03-30T11:27:57.862Z","from_sync":false,"order":1,"slug":"tutorials","title":"TUTORIALS"},"parentDoc":null,"project":"55faf11ba62ba1170021a9a7","user":"554340dfb7f4540d00fcef1d","version":{"__v":35,"_id":"55faf11ba62ba1170021a9aa","project":"55faf11ba62ba1170021a9a7","createdAt":"2015-09-17T16:58:03.490Z","releaseDate":"2015-09-17T16:58:03.490Z","categories":["55faf11ca62ba1170021a9ab","55faf8f4d0e22017005b8272","55faf91aa62ba1170021a9b5","55faf929a8a7770d00c2c0bd","55faf932a8a7770d00c2c0bf","55faf94b17b9d00d00969f47","55faf958d0e22017005b8274","55faf95fa8a7770d00c2c0c0","55faf96917b9d00d00969f48","55faf970a8a7770d00c2c0c1","55faf98c825d5f19001fa3a6","55faf99aa62ba1170021a9b8","55faf99fa62ba1170021a9b9","55faf9aa17b9d00d00969f49","55faf9b6a8a7770d00c2c0c3","55faf9bda62ba1170021a9ba","5604570090ee490d00440551","5637e8b2fbe1c50d008cb078","5649bb624fa1460d00780add","5671974d1b6b730d008b4823","5671979d60c8e70d006c9760","568e8eef70ca1f0d0035808e","56d0a2081ecc471500f1795e","56d4a0adde40c70b00823ea3","56d96b03dd90610b00270849","56fbb83d8f21c817002af880","573c811bee2b3b2200422be1","576bc92afb62dd20001cda85","5771811e27a5c20e00030dcd","5785191af3a10c0e009b75b0","57bdf84d5d48411900cd8dc0","57ff5c5dc135231700aed806","5804caf792398f0f00e77521","58458b4fba4f1c0f009692bb","586d3c287c6b5b2300c05055"],"is_deprecated":false,"is_hidden":false,"is_beta":true,"is_stable":true,"codename":"","version_clean":"1.0.0","version":"1.0"},"updates":["57e3f6614d659c0e0006e960"],"next":{"pages":[],"description":""},"createdAt":"2015-11-02T23:24:46.472Z","link_external":false,"link_url":"","githubsync":"","sync_unique":"","hidden":false,"api":{"results":{"codes":[]},"settings":"","auth":"required","params":[],"url":""},"isReference":false,"order":2,"body":"##Int
roduction\n\nThe CGC Software Development Kit (Rabix) allows you to add your tools to the CGC and use them to run analyses, as you do with tools that are already publicly available on the CGC. This is done by installing the tools in a Docker container and then describing their behavior on the CGC.\n\nThe first part of deploying your tool to the CGC is to create a [Docker](doc:docker-basics) container and install the tool in it. Once this is done, you need to create a snapshot of the container, called an image, and push it to the CGC image registry (cgc-images.sbgenomics.com) or the official Docker image registry, [Docker Hub](http://hub.docker.com).\n\nThe second part of the procedure is to specify the tool's behavior on the CGC, including its inputs and outputs, runtime requirements, and execution semantics. The specification is entered using the [Tool Editor](doc:the-tool-editor). This allows you to use the tool as an individual application on the CGC or interface it with other tools and create workflows.\n\n##Objective\nThis tutorial demonstrates how to install and describe the sort subcommand of SamTools on the CGC. Specifically, we will install the bioinformatics package SamTools into a Docker container, push the resulting image to the CGC image registry, and describe the sort subcommand in the tool editor. \n\n##Prerequisites\n\nFor this tutorial, you will need:\n1. A <a href=\"https://cgc.sbgenomics.com/login/\" target=\"blank\">CGC account</a>. \n2. One of the following machines:\n  * A Linux computer with Docker installed on it. [Full installation instructions are available here](upload-your-docker-image#section-installing-docker-on-linux).\n * A Mac with Docker Machine installed on it. 
[Full installation instructions for OS X are available here](upload-your-docker-image#section-installing-docker-on-os-x).\n * A Windows computer with Docker for Windows or Docker Toolbox, depending on the Windows version. [Full installation instructions for Windows are available here](upload-your-docker-image#section-installing-docker-on-windows).\n[block:callout]\n{\n  \"type\": \"warning\",\n  \"title\": \"On this page:\",\n  \"body\": \"[1. Create a project](#section-1-create-a-project)\\n[2. Upload SamTools in a Docker image](#section-2-upload-samtools-in-a-docker-image)\\n[3. Describe each subcommand tool in the graphical editor](#section-3-describe-each-subcommand-tool-in-the-graphical-editor)\"\n}\n[/block]\n##1. Create a project\n1. Log in to your CGC account, and click **Create a project** in the main navigation bar. \n2. Name the project, e.g. 'SamTools'. You can always delete this project when you've finished the tutorial.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/00157d9-cgc-create-a-project.jpg\",\n        \"cgc-create-a-project.jpg\",\n        476,\n        384,\n        \"#ebeff1\"\n      ],\n      \"border\": true\n    }\n  ]\n}\n[/block]\n<div align=\"right\"><a href=\"#top\">top</a></div>\n\n##2. Upload SamTools in a Docker image\n\nWe will first use [Docker](doc:upload-your-docker-image) to create an image containing SamTools. We'll start with an Ubuntu base image, install SamTools, then commit the image and push it to the [CGC image registry](the-cgc-image-registry) to upload the tool to the CGC. This is illustrated in the example below. The username used in the example is `rfranklin`, the developer project name is `samtools` and the image tag is `v1`.\n[block:callout]\n{\n  \"type\": \"danger\",\n  \"body\": \"In the example, most of the steps serve to obtain the tools needed to compile SamTools. These details will vary for different tools. 
As such, the example here should not be taken as a template for all command line tools.\"\n}\n[/block]\nTo create a Docker image:\n[block:callout]\n{\n  \"type\": \"success\",\n  \"title\": \"Uploading Docker images\",\n  \"body\": \"If you haven't already seen it, take a look at the [documentation on uploading Docker images](doc:upload-your-docker-image).\"\n}\n[/block]\n1. Open up a terminal to get started.\n[block:callout]\n{\n  \"type\": \"info\",\n  \"body\": \"* **Docker on Mac OS 10.10.3 Yosemite or newer** run Docker for Mac and start a terminal of your choice.\\n* **Docker on Mac OS 10.8 Mountain Lion or newer** run Docker Machine, by opening Docker Quickstart terminal or by using the command docker-machine start default.\\n* **Windows 7** or **8**: run Docker Quickstart Terminal.\\n* **Windows 10**: run Docker for Windows and start a terminal of your choice.\\n* **Linux**: skip this step.\",\n  \"title\": \"Depending on your operating system, first make sure that Docker is started:\"\n}\n[/block]\n2. 
To install SAMtools in a Docker container, we will enter the following commands:\n 2.1 Log in to the CGC image registry (cgc-images.sbgenomics.com) from the terminal:\n[block:callout]\n{\n  \"type\": \"danger\",\n  \"body\": \"Note that you should enter your **[authentication token](get-your-authentication-token)** in response to the password prompt, <span style=\\\"color:red\\\"><b>not your CGC password</b></span>.\",\n  \"title\": \"Docker login\"\n}\n[/block]\n\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"$ docker login cgc-images.sbgenomics.com # You should enter your authentication token in response to the password prompt, not your CGC password.\\nUsername: rfranklin\\nPassword:\\nEmail: rfranklin:::at:::sbgenomics.com\",\n      \"language\": \"shell\",\n      \"name\": \"Installing Samtools in an Ubuntu container\"\n    }\n  ]\n}\n[/block]\n 2.2 Load up a container from the Ubuntu base image, update the packages inside the container and install SAMTools:\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"$ docker run -ti ubuntu # Load up a container with the ubuntu base image and run bash inside\\nCreating container from image ubuntu\\nroot@container$ apt-get update # Update the package index inside the container\\nroot@container$ apt-get install wget build-essential zlib1g-dev libncurses5-dev # Install the tools we need to compile SamTools\\nroot@container$ wget https://github.com/samtools/samtools/releases/download/1.2/samtools-1.2.tar.bz2 # Download the Samtools source code\\nroot@container$ tar jxf samtools-1.2.tar.bz2 # Unpack the archive\\nroot@container$ cd samtools-1.2 # Go into the directory containing the unpacked Samtools source code\\nroot@container$ make # Compile the code\\nroot@container$ make install # Install the resulting binaries\\nroot@container$ samtools --version # Check that SamTools has installed\\nroot@container$ exit \",\n      \"language\": \"shell\"\n    }\n  ]\n}\n[/block]\n\n[block:callout]\n{\n  \"type\": 
\"info\",\n  \"body\": \"You can choose any Docker base image for your tool.\"\n}\n[/block]\nA Docker container is a running instance of a Docker image. Once you have instantiated a container from an image using the docker run command, the initial part of the command line will change to something in the form `root@container`, e.g. `root@afa7af5b5d8b`. The `root` part denotes that you are the root user within the container, while the part after the '@' symbol is the ID of the container. Once you have exited the container, copy the container ID as you will need it to perform the next step.\n\n 2.3 Commit the container to the image:\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"$ docker commit 19d574537671 cgc-images.sbgenomics.com/rfranklin/samtools:v1 # Grab the container ID '19d574537671' from the command prompt inside the container you just exited, to commit its image\\n7f7f2b36bffd5dae5d8e4c699079aa96379f5075ce175fb4abd0197a46ebfcd3\",\n      \"language\": \"shell\"\n    }\n  ]\n}\n[/block]\nThe repository name used in this example is `rfranklin/samtools`, following the `<user_name>/<project_name>` pattern. Please note that the allowed characters for repository names are lowercase and uppercase letters, numbers 0 to 9, dash (`-`) and underscore (`_`). Learn more about [repositories in the CGC image registry](the-cgc-image-registry#section-repositories-in-the-cgc-image-registry).\n\n 2.4 Push the image to the CGC image registry:\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"$ docker push cgc-images.sbgenomics.com/rfranklin/samtools:v1\\nThe push refers to a repository [cgc-images.sbgenomics.com/hodesdon/new] (len: 1)\\n...\\nlatest: digest: sha256:d2304a53961b9e8215805448d0738a0174b3b18ee6ea6145bf1d0062d615ae1a size: 8039\",\n      \"language\": \"shell\"\n    }\n  ]\n}\n[/block]\n <div align=\"right\"><a href=\"#top\">top</a></div>\n\n##3. 
Describe each subcommand tool in the graphical editor\n\nWe have created a Docker image with SamTools inside, and pushed it to the CGC image repository. To use SamTools on the CGC, we still need to capture its interface with the tool editor, so that it can be integrated with other CGC tools.\n\nThe tool editor treats each subcommand of a command line tool as a distinct tool. So, in this example, we will just describe the SamTools subcommand `sort`.\n \n1. The tool editor can be accessed from inside the project that you created on the CGC. Go to the dashboard for the project, and click '**Create**' on the panel marked '**Apps**'. This brings up a drop-down box. Select Command line tool to describe a new tool.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/ykW9wpXRT7OWCDETX1T5_Screenshot%202015-11-11%2014.20.55.png\",\n        \"Screenshot 2015-11-11 14.20.55.png\",\n        \"1104\",\n        \"108\",\n        \"#4f71af\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\n\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/U8QRKCcKSGyomU5TcsYT_Screenshot%202015-11-11%2014.21.35.png\",\n        \"Screenshot 2015-11-11 14.21.35.png\",\n        \"844\",\n        \"608\",\n        \"#4a775e\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\n2. To add the tool, first give it a name. Let's name the SamTools `sort` subcommand 'SamTools-sort': \n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/9UDS2vzdSk68ZgmHLg0J_Screen%20Shot%202016-03-18%20at%2011.29.53.png\",\n        \"Screen Shot 2016-03-18 at 11.29.53.png\",\n        \"784\",\n        \"616\",\n        \"#39674d\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\n3. Once you have given your tool a name, and clicked **Create**, [the graphical tool editor interface](doc:the-tool-editor) will open. 
This contains fields with which **you can characterize the format of the subcommand as it is executed on the command line**.\n\nSince we're going to characterize the `sort` subcommand, let's take a look at its usage. We can use `sort` by entering `samtools sort` on the command line, inside the Docker container where we installed SamTools.\n\nSince we exited the container in which we installed SamTools, we need to open it up again so that we can query the usage of SamTools sort. Do this using the Rabix CLI with the command `cgc docker-run <image>`, where `<image>` is either the `image ID` or `<repository>:<tag>`.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"$ cgc docker-run cgc-images.sbgenomics.com/rfranklin/samtools:v1 # Open the container with SamTools in it\\nCreating container from image cgc-images.sbgenomics.com/rfranklin/samtools:v1\\n$ samtools sort # Now let's check the usage of the sort subcommand\\nUsage: samtools sort [options...] [in.bam]\\nOptions:\\n  -l INT     Set compression level, from 0 (uncompressed) to 9 (best)\\n  -m INT     Set maximum memory per thread; suffix K/M/G recognized [768M]\\n  -n         Sort by read name\\n  -o FILE    Write final output to FILE rather than standard output\\n  -O FORMAT  Write output as FORMAT ('sam'/'bam'/'cram')   (either -O or\\n  -T PREFIX  Write temporary files to PREFIX.nnnn.bam      -T is required)\\n  -@ INT     Set number of sorting and compression threads [1]\",\n      \"language\": \"shell\"\n    }\n  ]\n}\n[/block]\nWe can see that the sort subcommand takes the options listed above, followed by an input file, `in.bam`. \n\nLet's suppose we want sort to output a file named 'output.bam'. In this case, given the usage of `sort` we need to use the following options:\n  * The default behavior of the subcommand is to write the sorted output file to standard output. 
However, this default behavior can be overridden to instead produce output as a file named 'output.bam', using the option `-o output.bam`.\n  * Since we want to output a BAM file, we need to specify the file format as well as the filename. We do this with the option `-O bam`.\n  * The usage message flags the `-T` option as required. It sets the prefix of the temporary files. Let's prefix our temporary files with `tmp_`. We do this with the option `-T tmp_`.\n\nTo achieve this behavior for an input file named 'unsorted.bam', we would need to execute the following command: `samtools sort -O bam -T tmp_ -o output.bam unsorted.bam`\n\nWe will describe our required behavior in the tool editor by entering the information specified below:\n[block:callout]\n{\n  \"type\": \"info\",\n  \"body\": \"See the [documentation on the Tool Editor](doc:the-tool-editor) for more information on the fields below.\"\n}\n[/block]\nWithin an application, there are five tabs: **General**, **Inputs**, **Outputs**, **Additional Information**, and **Test**. We will walk through the steps required to describe a tool in each of these below.\n\n**General Tab**\n\n**Docker Repository[:Tag]:** This is the [location of the image](the-cgc-image-registry#section-repositories-in-the-cgc-image-registry) containing the command. Its format is `cgc-images.sbgenomics.com/<repository>[:tag]`. In our example we would enter: `cgc-images.sbgenomics.com/rfranklin/samtools:v1`\n\n**CPU:** We'll leave this with the default of `1`.\n\n**Memory:** Set the value to **5000 MB** of RAM. This amount of memory is needed to process the BAM file that will be used as the input. \n\n**Base Command:** The base command is the part of the command that precedes any arguments; in other words, it is the command and subcommand, if there is one. In our example, samtools-sort, we enter `samtools sort` into this field. 
The editor splits base commands on spaces, so this entry will split into a field containing `samtools` and a field containing `sort`.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/P3BVYSSK3dvtEZ45IjA1_Screenshot%202015-11-11%2015.12.29.png\",\n        \"Screenshot 2015-11-11 15.12.29.png\",\n        \"1250\",\n        \"342\",\n        \"#d0d0d0\",\n        \"\"\n      ],\n      \"caption\": \"Under Command, enter 'samtools sort' and the GUI will break this into the base command 'samtools' and the subcommand 'sort'.\"\n    }\n  ]\n}\n[/block]\n**Stdin, Stdout:** We can leave these blank. We chose to write output to a file with `-o` rather than to standard output.\n**Success code and Temporary fail code**: Set these to `0` and `1` respectively. This is standard behavior.\n**Arguments:** We want SamTools sort to output a BAM file named 'output.bam'. As described above, we can make it do this using the following command: `samtools sort -O bam -T tmp_ -o output.bam unsorted.bam`\n\nWe can enter the arguments for this command in the Arguments field of the tool editor, as follows:\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/iVxCEJSSDqFKi3eoUJul_Screenshot%202015-11-11%2015.10.30.png\",\n        \"Screenshot 2015-11-11 15.10.30.png\",\n        \"1246\",\n        \"272\",\n        \"#406b7d\",\n        \"\"\n      ],\n      \"caption\": \"Click on the '+' to begin entering arguments for the command line.\"\n    }\n  ]\n}\n[/block]\n\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/w6QbFEABTgmgTavtudYM_Screenshot%202015-11-11%2015.10.40.png\",\n        \"Screenshot 2015-11-11 15.10.40.png\",\n        \"1788\",\n        \"584\",\n        \"#54708a\",\n        \"\"\n      ],\n      \"caption\": \"A dialog window will pop up with default fields.\"\n    }\n  ]\n}\n[/block]\n**Argument 1:**\n1. **Value:** `bam`\n2. **Prefix:** `-O`\n3. 
**Separate prefix with:** `space`\n4. **Position:** `1`\n[block:image]\n{\n  \"images\": [\n    {\n      \"caption\": \"Enter the appropriate values and Save.\",\n      \"image\": [\n        \"https://files.readme.io/i5vfZDCGQDqoNFjh5xvX_Screenshot%202015-11-11%2015.17.05.png\",\n        \"Screenshot 2015-11-11 15.17.05.png\",\n        \"1782\",\n        \"572\",\n        \"#536f88\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\n**Argument 2:**\n1. **Value:** `tmp_`\n2. **Prefix:** `-T`\n3. **Separate prefix with:** `space`\n4. **Position:** `2`\n\n**Argument 3:**\n1. **Value:** `output.bam`\n2. **Prefix:** `-o`\n3. **Separate prefix with:** `space`\n4. **Position:** `3`\n\nThese resulting Arguments settings are shown in the following screenshot:\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/3TTKmbxTCm74IkkYzwOd_Screen%20Shot%202016-03-18%20at%2011.52.00.png\",\n        \"Screen Shot 2016-03-18 at 11.52.00.png\",\n        \"1316\",\n        \"594\",\n        \"#4e5e6d\",\n        \"\"\n      ],\n      \"caption\": \"Once done, this is what you should see in the Arguments section.\"\n    }\n  ]\n}\n[/block]\nWhen you have finished, the General Information tab should look like this:\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/7f7f3a8-samtools-tutorial-generla-tab.png\",\n        \"samtools-tutorial-generla-tab.png\",\n        1589,\n        826,\n        \"#e9ecee\"\n      ],\n      \"caption\": \"\"\n    }\n  ]\n}\n[/block]\n**The Inputs Tab**\n\n1. Click the **+** button to add an input port.\n2. Set the **ID** of the input to 'BAM' to label it as the port where BAM files are inputted to the tool. Set its **Type** to 'File'.\n3. Enter a **Label** to be displayed on graphical interfaces: we went with 'Bam files input'. Enter a description as well, if you like.\n4. There are no **Secondary files** so we can leave this blank.\n5. 
Check the box marked **Include in command line** to enable command line binding. This indicates that when the tool is executed, input files are passed to it on the command line.\n6. Under the checkbox to **Include in command line**, we enter the details of how inputs are passed on the command line. \na. Leave the **Value** field empty: we'll enter files directly on the command line. \nb. Leave the **Prefix** field empty as well. This indicates that files are entered to the terminal with no preceding option to indicate the input (although there may be preceding options to control other aspects of the command).\nc. Set the **Position** to 4, to indicate that the input file comes after the `-O bam` option, whose position we set to 1, the `-T tmp_` option, whose position we set to 2, and the `-o output.bam` option, whose position we set to 3.\nd. This setting indicates that for an input file named input.bam passed to the command line tool, the full command will have the form: `samtools sort -O bam -T tmp_ -o output.bam input.bam`\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/BFTKdd6QWKtK37VDYcKV_Screen%20Shot%202016-06-24%20at%201.34.23%20PM.png\",\n        \"Screen Shot 2016-06-24 at 1.34.23 PM.png\",\n        \"1342\",\n        \"1326\",\n        \"#d3554f\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\n**The Outputs Tab**\n\n1. Click the **+** button to add an output port.\n2. Set the **ID** to name your output port. We'll name this one 'sorted'.\n3. Set the **Type** to 'File' for your sorted files.\n4. Set the **glob field** to '*.bam'. This use of globbing will pattern-match any file that ends with '.bam' and report it as the output of the tool.\n5. Enter a **label** for the port, which will be used on any visual interface the tool appears in, such as the workflow editor.\n6. Specify the **File Types** that the port produces, in this case BAM files.\n7. 
In this example, we haven't annotated the output files with metadata or included secondary files (like index files) so we can leave the rest of the fields blank.\n\n**The Test Tab**\n\n1. Fill in some dummy input values of the kind you would enter as command line arguments to the tool. Then you can inspect the resulting command line, at the bottom of the tab. \n2. The tool has a single input port for files, so we can enter a BAM file name, 'unsorted.bam' as a dummy input for this port. Notice that the command line output at the bottom of the screen changes to show the command that would be executed on the command line if a BAM file named 'unsorted.bam' were stipulated as the input file for the SamTools sort subcommand. The resulting command in this case is: `samtools sort -O bam -T tmp_ -o output.bam  unsorted.bam`. This is what we'd want to see in order to sort 'unsorted.bam'. So, our tool description looks like it was successful!\n\n\n**The Additional information Tab**\n\nHere you may enter some details about SamTools sort to give more information about the tool's developers and uses. All the fields on this tab are optional.\n\nWhen you've finished, click **Save**.\n \nThat's it! SamTools has been Dockerized, and its sort subcommand can now be executed on the CGC at the touch of a button, either on its own or as part of a workflow. To run SamTools sort we just need to [add TCGA data to a project](doc:add-tcga-data-to-a-project). 
Then, we can click Run on SamTools sort, input the file, and obtain the results.\n\n<div align=\"right\"><a href=\"#top\">top</a></div>\n\n##Video tutorial\n[block:embed]\n{\n  \"html\": \"<iframe class=\\\"embedly-embed\\\" src=\\\"//cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FyHaGUeFN1LM%3Ffeature%3Doembed&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DyHaGUeFN1LM&image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FyHaGUeFN1LM%2Fhqdefault.jpg&key=02466f963b9b4bb8845a05b53d3235d7&type=text%2Fhtml&schema=youtube\\\" width=\\\"640\\\" height=\\\"480\\\" scrolling=\\\"no\\\" frameborder=\\\"0\\\" allowfullscreen></iframe>\",\n  \"url\": \"https://www.youtube.com/watch?v=yHaGUeFN1LM&feature=youtu.be\",\n  \"title\": \"Worked example of SAMtools Sort\",\n  \"favicon\": \"https://www.youtube.com/favicon.ico\",\n  \"image\": \"https://i.ytimg.com/vi/yHaGUeFN1LM/hqdefault.jpg\"\n}\n[/block]\n<div align=\"right\"><a href=\"#top\">top</a></div>","excerpt":"<a name=\"top\"/></a> See a [video](#section-video-tutorial) of this tutorial below.","slug":"install-and-run-samtools-sort","type":"basic","title":"Worked example of uploading SamTools Sort"}

Worked example of uploading SamTools Sort

<a name="top"/></a> See a [video](#section-video-tutorial) of this tutorial below.

##Introduction The CGC Software Development Kit (Rabix) allows you to add your tools to the CGC and use them to run analyses, as you do with tools that are already publicly available on the CGC. This is done by installing the tools in a Docker container and then describing their behavior on the CGC. The first part of deploying your tool to the CGC is to create a [Docker](doc:docker-basics) container and install the tool in it. Once this is done, you need to create a snapshot of the container, called an image, and push it to the CGC image registry (cgc-images.sbgenomics.com) or the official Docker image registry, [Docker Hub](http://hub.docker.com). The second part of the procedure is to specify the tool's behavior on the CGC, including its inputs and outputs, runtime requirements, and execution semantics. The specification is entered using the [Tool Editor](doc:the-tool-editor). This allows you to use the tool as an individual application on the CGC or interface it with other tools and create workflows. ##Objective This tutorial demonstrates how to install and describe the sort subcommand of SamTools on the CGC. Specifically, we will install the bioinformatics package SamTools into a Docker container, push the resulting image to the CGC image registry, and describe the sort subcommand in the tool editor. ##Prerequisites For this tutorial, you will need: 1. A <a href="https://cgc.sbgenomics.com/login/" target="blank">CGC account</a>. 2. One of the following machines: * A Linux computer with Docker installed on it. [Full installation instructions are available here](upload-your-docker-image#section-installing-docker-on-linux). * A Mac with Docker Machine installed on it. [Full installation instructions for OS X are available here](upload-your-docker-image#section-installing-docker-on-os-x). 
* A Windows computer with Docker for Windows or Docker Toolbox, depending on the Windows version. [Full installation instructions for Windows are available here](upload-your-docker-image#section-installing-docker-on-windows). [block:callout] { "type": "warning", "title": "On this page:", "body": "[1. Create a project](#section-1-create-a-project)\n[2. Upload SamTools in a Docker image](#section-2-upload-samtools-in-a-docker-image)\n[3. Describe each subcommand tool in the graphical editor](#section-3-describe-each-subcommand-tool-in-the-graphical-editor)" } [/block] ##1. Create a project 1. Log in to your CGC account, and click **Create a project** in the main navigation bar. 2. Name the project, e.g. 'SamTools'. You can always delete this project when you've finished the tutorial. [block:image] { "images": [ { "image": [ "https://files.readme.io/00157d9-cgc-create-a-project.jpg", "cgc-create-a-project.jpg", 476, 384, "#ebeff1" ], "border": true } ] } [/block] <div align="right"><a href="#top">top</a></div> ##2. Upload SamTools in a Docker image We will first use [Docker](doc:upload-your-docker-image) to create an image containing SamTools. We'll start with an Ubuntu base image, install SamTools, then commit the image and push it to the [CGC image registry](the-cgc-image-registry) to upload the tool to the CGC. This is illustrated in the example below. The username used in the example is `rfranklin`, the developer project name is `samtools` and the image tag is `v1`. [block:callout] { "type": "danger", "body": "In the example, most of the steps serve to obtain the tools needed to compile SamTools. These details will vary for different tools. As such, the example here should not be taken as a template for all command line tools." 
} [/block] To create a Docker image: [block:callout] { "type": "success", "title": "Uploading Docker images", "body": "If you haven't already seen it, take a look at the [documentation on uploading Docker images](doc:upload-your-docker-image)." } [/block] 1. Open up a terminal to get started. [block:callout] { "type": "info", "body": "* **Docker on Mac OS 10.10.3 Yosemite or newer** run Docker for Mac and start a terminal of your choice.\n* **Docker on Mac OS 10.8 Mountain Lion or newer** run Docker Machine, by opening Docker Quickstart terminal or by using the command docker-machine start default.\n* **Windows 7** or **8**: run Docker Quickstart Terminal.\n* **Windows 10**: run Docker for Windows and start a terminal of your choice.\n* **Linux**: skip this step.", "title": "Depending on your operating system, first make sure that Docker is started:" } [/block] 2. To install SAMtools in a Docker container, we will enter the following commands: 2.1 Log in to the CGC image registry (cgc-images.sbgenomics.com) from the terminal: [block:callout] { "type": "danger", "body": "Note that you should enter your **[authentication token](get-your-authentication-token)** in response to the password prompt, <span style=\"color:red\"><b>not your CGC password</b></span>.", "title": "Docker login" } [/block] [block:code] { "codes": [ { "code": "$ docker login cgc-images.sbgenomics.com # You should enter your authentication token in response to the password prompt, not your CGC password.\nUsername: rfranklin\nPassword:\nEmail: rfranklin@sbgenomics.com", "language": "shell", "name": "Installing Samtools in an Ubuntu container" } ] } [/block] 2.2 Load up a container from the Ubuntu base image, update the packages inside the container and install SAMTools: [block:code] { "codes": [ { "code": "$ docker run -ti ubuntu # Load up a container with the ubuntu base image and run bash inside\nCreating container from image ubuntu\nroot@container$ apt-get update # Update the package index inside 
the container\nroot@container$ apt-get install wget build-essential zlib1g-dev libncurses5-dev # Install the tools we need to compile SamTools\nroot@container$ wget https://github.com/samtools/samtools/releases/download/1.2/samtools-1.2.tar.bz2 # Download the Samtools source code\nroot@container$ tar jxf samtools-1.2.tar.bz2 # Unpack the archive\nroot@container$ cd samtools-1.2 # Go into the directory containing the unpacked Samtools source code\nroot@container$ make # Compile the code\nroot@container$ make install # Install the resulting binaries\nroot@container$ samtools --version # Check that SamTools has installed\nroot@container$ exit ", "language": "shell" } ] } [/block] [block:callout] { "type": "info", "body": "You can choose any Docker base image for your tool." } [/block] A Docker container is a running instance of a Docker image. Once you have instantiated a container from an image using the docker run command, the initial part of the command line will change to something in the form `root@container`, e.g. `root@afa7af5b5d8b`. The `root` part denotes that you are the root user within the container, while the part after the '@' symbol is the ID of the container. Once you have exited the container, copy the container ID as you will need it to perform the next step. 2.3 Commit the container to the image: [block:code] { "codes": [ { "code": "$ docker commit 19d574537671 cgc-images.sbgenomics.com/rfranklin/samtools:v1 # Grab the container ID '19d574537671' from the command prompt inside the container you just exited, to commit its image\n7f7f2b36bffd5dae5d8e4c699079aa96379f5075ce175fb4abd0197a46ebfcd3", "language": "shell" } ] } [/block] The repository name used in this example is `rfranklin/samtools`, following the `<user_name>/<project_name>` pattern. Please note that the allowed characters for repository names are lowercase and uppercase letters, numbers 0 to 9, dash (`-`) and underscore (`_`). 
Learn more about [repositories in the CGC image registry](the-cgc-image-registry#section-repositories-in-the-cgc-image-registry). 2.4 Push the image to the CGC image registry: [block:code] { "codes": [ { "code": "$ docker push cgc-images.sbgenomics.com/rfranklin/samtools:v1\nThe push refers to a repository [cgc-images.sbgenomics.com/rfranklin/samtools] (len: 1)\n...\nv1: digest: sha256:d2304a53961b9e8215805448d0738a0174b3b18ee6ea6145bf1d0062d615ae1a size: 8039", "language": "shell" } ] } [/block] <div align="right"><a href="#top">top</a></div> ##3. Describe each subcommand tool in the graphical editor We have created a Docker image with SamTools inside, and pushed it to the CGC image registry. To use SamTools on the CGC, we still need to capture its interface with the tool editor, so that it can be integrated with other CGC tools. The tool editor treats each subcommand of a command line tool as a distinct tool. So, in this example, we will describe only the SamTools subcommand `sort`. 1. The tool editor can be accessed from inside the project that you created on the CGC. Go to the dashboard for the project, and click '**Create**' on the panel marked '**Apps**'. This brings up a drop-down box. Select **Command line tool** to describe a new tool. [block:image] { "images": [ { "image": [ "https://files.readme.io/ykW9wpXRT7OWCDETX1T5_Screenshot%202015-11-11%2014.20.55.png", "Screenshot 2015-11-11 14.20.55.png", "1104", "108", "#4f71af", "" ] } ] } [/block] [block:image] { "images": [ { "image": [ "https://files.readme.io/U8QRKCcKSGyomU5TcsYT_Screenshot%202015-11-11%2014.21.35.png", "Screenshot 2015-11-11 14.21.35.png", "844", "608", "#4a775e", "" ] } ] } [/block] 2. To add the tool, first give it a name. 
Let's name the SamTools `sort` subcommand 'SamTools-sort': [block:image] { "images": [ { "image": [ "https://files.readme.io/9UDS2vzdSk68ZgmHLg0J_Screen%20Shot%202016-03-18%20at%2011.29.53.png", "Screen Shot 2016-03-18 at 11.29.53.png", "784", "616", "#39674d", "" ] } ] } [/block] 3. Once you have given your tool a name and clicked **Create**, [the graphical tool editor interface](doc:the-tool-editor) will open. This contains fields with which **you can characterize the format of the subcommand as it is executed on the command line**. Since we're going to characterize the `sort` subcommand, let's take another look at its usage. We can use `sort` by entering `samtools sort` on the command line, inside a Docker container that has SamTools installed. Since we exited the container in which we installed SamTools, we need to start a container from the image we committed so that we can query the usage of SamTools sort. Do this using the Rabix CLI with the command `cgc docker-run <image>`, where `<image>` is either the image ID or `<repository>:<tag>`. [block:code] { "codes": [ { "code": "$ cgc docker-run cgc-images.sbgenomics.com/rfranklin/samtools:v1 # Open the container with SamTools in it\nCreating container from image cgc-images.sbgenomics.com/rfranklin/samtools:v1\n$ samtools sort # Now let's check the usage of the sort subcommand\nUsage: samtools sort [options...] [in.bam]\nOptions:\n -l INT Set compression level, from 0 (uncompressed) to 9 (best)\n -m INT Set maximum memory per thread; suffix K/M/G recognized [768M]\n -n Sort by read name\n -o FILE Write final output to FILE rather than standard output\n -O FORMAT Write output as FORMAT ('sam'/'bam'/'cram') (either -O or\n -T PREFIX Write temporary files to PREFIX.nnnn.bam -T is required)\n -@ INT Set number of sorting and compression threads [1]", "language": "shell" } ] } [/block] We can see that the `sort` subcommand takes the options listed above, followed by an input file, `in.bam`. 
Let's suppose we want `sort` to output a file named 'output.bam'. In this case, given the usage of `sort`, we need to use the following options: * The default behavior of the subcommand is to write the sorted output to standard output. This default can be overridden to instead write a file named 'output.bam', using the option `-o output.bam`. * Since we want to output a BAM file, we need to specify the file format as well as the filename. We do this with the option `-O bam`. * The usage text flags `-T` as required. This option sets the prefix of the temporary files. Let's prefix our temporary files with `tmp_`. We do this with the option `-T tmp_`. To achieve this behavior for an input file named 'unsorted.bam', we would need to execute the following command: `samtools sort -O bam -T tmp_ -o output.bam unsorted.bam` We will describe our required behavior in the tool editor by entering the information specified below: [block:callout] { "type": "info", "body": "See the [documentation on the Tool Editor](doc:the-tool-editor) for more information on the fields below." } [/block] Within an application, there are five tabs: **General**, **Inputs**, **Outputs**, **Additional Information**, and **Test**. We will walk through the steps required to describe a tool in each of these below. **General Tab** **Docker Repository[:Tag]:** This is the [location of the image](the-cgc-image-registry#section-repositories-in-the-cgc-image-registry) containing the command. Its format is `cgc-images.sbgenomics.com/<repository>[:tag]`. In our example we would enter: `cgc-images.sbgenomics.com/rfranklin/samtools:v1` **CPU:** We'll leave this with the default of `1`. **Memory:** Set the value to **5000 MB** of RAM. This amount of memory is needed to process the BAM file that will be used as the input. **Base Command:** The base command is the part of the command that precedes any arguments; in other words, it is the command and subcommand, if there is one. 
In our example, SamTools-sort, we enter `samtools sort` into this field. The editor splits base commands on spaces, so this entry will split into a field containing `samtools` and a field containing `sort`. [block:image] { "images": [ { "image": [ "https://files.readme.io/P3BVYSSK3dvtEZ45IjA1_Screenshot%202015-11-11%2015.12.29.png", "Screenshot 2015-11-11 15.12.29.png", "1250", "342", "#d0d0d0", "" ], "caption": "Under Command, enter 'samtools sort' and the GUI will break this into the base command 'samtools' and the subcommand 'sort'." } ] } [/block] **Stdin, Stdout:** We can leave these blank, because we chose to write the output to a file with `-o` rather than to standard output. **Success code and Temporary fail code**: Set these to `0` and `1` respectively. This is standard behavior. **Arguments:** We want SamTools sort to output a BAM file named 'output.bam'. As described above, we can make it do this using the following command: `samtools sort -O bam -T tmp_ -o output.bam unsorted.bam` We can enter the arguments for this command in the Arguments field of the tool editor, as follows: [block:image] { "images": [ { "image": [ "https://files.readme.io/iVxCEJSSDqFKi3eoUJul_Screenshot%202015-11-11%2015.10.30.png", "Screenshot 2015-11-11 15.10.30.png", "1246", "272", "#406b7d", "" ], "caption": "Click on the '+' to begin entering arguments for the command line." } ] } [/block] [block:image] { "images": [ { "image": [ "https://files.readme.io/w6QbFEABTgmgTavtudYM_Screenshot%202015-11-11%2015.10.40.png", "Screenshot 2015-11-11 15.10.40.png", "1788", "584", "#54708a", "" ], "caption": "A dialog window will pop up with default fields." } ] } [/block] **Argument 1:** 1. **Value:** `bam` 2. **Prefix:** `-O` 3. **Separate prefix with:** `space` 4. 
**Position:** `1` [block:image] { "images": [ { "caption": "Enter the appropriate values and Save.", "image": [ "https://files.readme.io/i5vfZDCGQDqoNFjh5xvX_Screenshot%202015-11-11%2015.17.05.png", "Screenshot 2015-11-11 15.17.05.png", "1782", "572", "#536f88", "" ] } ] } [/block] **Argument 2:** 1. **Value:** `tmp_` 2. **Prefix:** `-T` 3. **Separate prefix with:** `space` 4. **Position:** `2` **Argument 3:** 1. **Value:** `output.bam` 2. **Prefix:** `-o` 3. **Separate prefix with:** `space` 4. **Position:** `3` The resulting Arguments settings are shown in the following screenshot: [block:image] { "images": [ { "image": [ "https://files.readme.io/3TTKmbxTCm74IkkYzwOd_Screen%20Shot%202016-03-18%20at%2011.52.00.png", "Screen Shot 2016-03-18 at 11.52.00.png", "1316", "594", "#4e5e6d", "" ], "caption": "Once done, this is what you should see in the Arguments section." } ] } [/block] When you have finished, the General tab should look like this: [block:image] { "images": [ { "image": [ "https://files.readme.io/7f7f3a8-samtools-tutorial-generla-tab.png", "samtools-tutorial-generla-tab.png", 1589, 826, "#e9ecee" ], "caption": "" } ] } [/block] **The Inputs Tab** 1. Click the **+** button to add an input port. 2. Set the **ID** of the input to 'BAM' to label it as the port through which BAM files are passed to the tool. Set its **Type** to 'File'. 3. Enter a **Label** to be displayed on graphical interfaces: we went with 'Bam files input'. Enter a description as well, if you like. 4. There are no **Secondary files**, so we can leave this blank. 5. Check the box marked **Include in command line** to enable command line binding. This indicates that when the tool is executed, the input file is passed to it on the command line. 6. Under the **Include in command line** checkbox, we enter the details of how inputs are passed on the command line. a. Leave the **Value** field empty: we'll enter files directly on the command line. b. 
Leave the **Prefix** field empty as well. This indicates that the file is passed on the command line with no preceding option to mark it as the input (although there may be preceding options that control other aspects of the command). c. Set the **Position** to 4, to indicate that the input file comes after the `-O bam` option, whose position we set to 1, the `-T tmp_` option, whose position we set to 2, and the `-o output.bam` option, whose position we set to 3. d. This setting indicates that for an input file named input.bam passed to the command line tool, the full command will have the form: `samtools sort -O bam -T tmp_ -o output.bam input.bam` [block:image] { "images": [ { "image": [ "https://files.readme.io/BFTKdd6QWKtK37VDYcKV_Screen%20Shot%202016-06-24%20at%201.34.23%20PM.png", "Screen Shot 2016-06-24 at 1.34.23 PM.png", "1342", "1326", "#d3554f", "" ] } ] } [/block] **The Outputs Tab** 1. Click the **+** button to add an output port. 2. Set the **ID** to name your output port. We'll name this one 'sorted'. 3. Set the **Type** to 'File' for your sorted files. 4. Set the **glob field** to `*.bam`. This use of globbing will pattern-match any file whose name ends with '.bam' and report it as the output of the tool. 5. Enter a **label** for the port, which will be used on any visual interface the tool appears in, such as the workflow editor. 6. Specify the **File Types** that the port produces, in this case BAM files. 7. In this example, we haven't annotated the output files with metadata or included secondary files (like index files), so we can leave the rest of the fields blank. **The Test Tab** 1. Fill in some dummy input values of the kind you would enter as command line arguments to the tool. Then you can inspect the resulting command line at the bottom of the tab. 2. The tool has a single input port for files, so we can enter a BAM file name, 'unsorted.bam', as a dummy input for this port. 
Notice that the command line preview at the bottom of the screen changes to show the command that would be executed if a BAM file named 'unsorted.bam' were specified as the input file for the SamTools sort subcommand. The resulting command in this case is: `samtools sort -O bam -T tmp_ -o output.bam unsorted.bam`. This is the command we'd want in order to sort 'unsorted.bam', so our tool description looks correct. **The Additional Information Tab** Here you may enter some details about SamTools sort to give more information about the tool's developers and uses. All the fields on this tab are optional. When you've finished, click **Save**. That's it! SamTools has been Dockerized, and its sort subcommand can now be executed on the CGC at the touch of a button, either on its own or as part of a workflow. To run SamTools sort we just need to [add TCGA data to a project](doc:add-tcga-data-to-a-project). Then, we can click **Run** on SamTools sort, select the input file, and obtain the results. <div align="right"><a href="#top">top</a></div> ##Video tutorial [block:embed] { "html": "<iframe class=\"embedly-embed\" src=\"//cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FyHaGUeFN1LM%3Ffeature%3Doembed&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DyHaGUeFN1LM&image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FyHaGUeFN1LM%2Fhqdefault.jpg&key=02466f963b9b4bb8845a05b53d3235d7&type=text%2Fhtml&schema=youtube\" width=\"640\" height=\"480\" scrolling=\"no\" frameborder=\"0\" allowfullscreen></iframe>", "url": "https://www.youtube.com/watch?v=yHaGUeFN1LM&feature=youtu.be", "title": "Worked example of SAMtools Sort", "favicon": "https://www.youtube.com/favicon.ico", "image": "https://i.ytimg.com/vi/yHaGUeFN1LM/hqdefault.jpg" } [/block] <div align="right"><a href="#top">top</a></div>
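As a recap of section 3, the **Position** values entered on the General and Inputs tabs determine the order of the command shown on the Test tab. The sketch below only illustrates that ordering rule; `build_command` and the `position|prefix|value` record format are hypothetical and are not how the tool editor is implemented.

```shell
# Hypothetical illustration: assemble a command line from a base command
# and a list of "position|prefix|value" records, ordered by position.
# An empty prefix means the value is placed on the command line as-is.
build_command() {
  base="$1"
  shift
  args=$(printf '%s\n' "$@" | sort -t'|' -k1,1n |
    while IFS='|' read -r pos prefix value; do
      if [ -n "$prefix" ]; then
        printf ' %s %s' "$prefix" "$value"
      else
        printf ' %s' "$value"
      fi
    done)
  printf '%s%s\n' "$base" "$args"
}

# The three arguments from the General tab and the input file from the
# Inputs tab, deliberately listed out of order.
build_command "samtools sort" \
  "4||unsorted.bam" \
  "1|-O|bam" \
  "3|-o|output.bam" \
  "2|-T|tmp_"
```

Listing the records out of order shows that only the position numbers, not the order of entry, decide where each argument lands.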