{"_id":"5637f0befbe1c50d008cb087","user":"554340dfb7f4540d00fcef1d","version":{"_id":"55faf11ba62ba1170021a9aa","project":"55faf11ba62ba1170021a9a7","__v":45,"createdAt":"2015-09-17T16:58:03.490Z","releaseDate":"2015-09-17T16:58:03.490Z","categories":["55faf11ca62ba1170021a9ab","55faf8f4d0e22017005b8272","55faf91aa62ba1170021a9b5","55faf929a8a7770d00c2c0bd","55faf932a8a7770d00c2c0bf","55faf94b17b9d00d00969f47","55faf958d0e22017005b8274","55faf95fa8a7770d00c2c0c0","55faf96917b9d00d00969f48","55faf970a8a7770d00c2c0c1","55faf98c825d5f19001fa3a6","55faf99aa62ba1170021a9b8","55faf99fa62ba1170021a9b9","55faf9aa17b9d00d00969f49","55faf9b6a8a7770d00c2c0c3","55faf9bda62ba1170021a9ba","5604570090ee490d00440551","5637e8b2fbe1c50d008cb078","5649bb624fa1460d00780add","5671974d1b6b730d008b4823","5671979d60c8e70d006c9760","568e8eef70ca1f0d0035808e","56d0a2081ecc471500f1795e","56d4a0adde40c70b00823ea3","56d96b03dd90610b00270849","56fbb83d8f21c817002af880","573c811bee2b3b2200422be1","576bc92afb62dd20001cda85","5771811e27a5c20e00030dcd","5785191af3a10c0e009b75b0","57bdf84d5d48411900cd8dc0","57ff5c5dc135231700aed806","5804caf792398f0f00e77521","58458b4fba4f1c0f009692bb","586d3c287c6b5b2300c05055","58ef66d88646742f009a0216","58f5d52d7891630f00fe4e77","59a555bccdbd85001bfb1442","5a2a81f688574d001e9934f5","5b080c8d7833b20003ddbb6f","5c222bed4bc358002f21459a","5c22412594a2a5005cc9e919","5c41ae1c33592700190a291e","5c8a525e2ba7b2003f9b153c","5cbf14d58c79c700ef2b502e"],"is_deprecated":false,"is_hidden":false,"is_beta":true,"is_stable":true,"codename":"","version_clean":"1.0.0","version":"1.0"},"category":{"_id":"56fbb83d8f21c817002af880","version":"55faf11ba62ba1170021a9aa","__v":0,"project":"55faf11ba62ba1170021a9a7","sync":{"url":"","isSync":false},"reference":false,"createdAt":"2016-03-30T11:27:57.862Z","from_sync":false,"order":1,"slug":"tutorials","title":"TUTORIALS"},"parentDoc":null,"__v":202,"githubsync":"","project":"55faf11ba62ba1170021a9a7","metadata":{"title":"","description":"",
"image":[]},"updates":["57e3f6614d659c0e0006e960"],"next":{"pages":[],"description":""},"createdAt":"2015-11-02T23:24:46.472Z","link_external":false,"link_url":"","sync_unique":"","hidden":false,"api":{"results":{"codes":[]},"settings":"","auth":"required","params":[],"url":""},"isReference":false,"order":2,"body":"## Introduction\n\nThe Software Development Kit (SDK) allows you to add your tools to the CGC and use them to run analyses, as you do with tools that are already publicly available on the CGC. This is done by installing the tools in a Docker container and then describing their behavior on the CGC.\n\nThe first part of the procedure of deploying your tool to the CGC is to create a [Docker](doc:docker-basics) container and install the tool in it. Once this is done, you need create a snapshot of the container, called an image, and push it to the CGC image registry or the official Docker image registry, [Docker Hub](http://hub.docker.com/).\n\nThe second part of the procedure is to specify the tool's behavior on the CGC, including its inputs and outputs, runtime requirements, and execution semantics. The specification is entered using the Tool Editor. This allows you to use the tool as an individual application on the CGC or interface it with other tools and create workflows.\n\n## Objective\n\nThis tutorial demonstrates how to wrap and run the `sort` subcommand of SAMtools on the CGC. Specifically, we shall install the bioinformatics package SAMtools into a Docker container, push it to the CGC image registry, describe the `sort` subcommand in the tool editor, and then use the imported `sort` tool in a workflow on the CGC.\n\n## Prerequisites\n\nFor this tutorial, you will need:\n1. An account on the CGC.\n2. A Linux, Mac or Windows computer with [Docker installed](https://docs.sevenbridges.com/docs/install-docker).\n\n## Create a project\n1. Click **Projects** in the top navigation bar and choose **Create a project**.\n2. 
Name the project **SAMtools** (you can always delete this project when you've finished the tutorial).\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/9f7982a-cgc-worked-example-of-uploading-samtools-sort-1.png\",\n        \"cgc-worked-example-of-uploading-samtools-sort-1.png\",\n        477,\n        481,\n        \"#e2e9eb\"\n      ]\n    }\n  ]\n}\n[/block]\n3. Click **Create**.\nYou have now created a project on the CGC.\n\n## Describe each subcommand in the Tool Editor\n\nWe have created a Docker image with SAMtools inside, and pushed it to the CGC image registry. To use SAMtools on the CGC, we still need to capture its interface with the tool editor, so that it can be integrated with other CGC tools.\n\nThe tool editor treats each subcommand of a command line tool as a distinct tool. In this tutorial, we’ll wrap **samtools sort**, one of the tools in the [samtools suite](http://www.htslib.org/doc/samtools.html). **samtools sort** takes an input BAM-format file containing short DNA sequence reads and sorts it.\n\nTo describe `sort`:\n1. On the CGC, navigate to **Projects** > **SAMtools**.\n2. Click the **Apps** tab.\n3. Click **Add App**.\n4. Open the **Create New App** tab.\n5. Click **Create a Tool**.\n6. Enter the name for your tool as shown below, e.g. **SAMtools-sort**.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/964bfff-cgc-worked-example-of-uploading-samtools-sort-2.png\",\n        \"cgc-worked-example-of-uploading-samtools-sort-2.png\",\n        1029,\n        615,\n        \"#f0f1f1\"\n      ]\n    }\n  ]\n}\n[/block]\n7. Click **Create**.\nThe Tool Editor opens. 
\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/da46196-sbpla-worked-example-of-uploading-samtools-sort-3.png\",\n        \"sbpla-worked-example-of-uploading-samtools-sort-3.png\",\n        1076,\n        683,\n        \"#d0d1d1\"\n      ]\n    }\n  ]\n}\n[/block]\nLeave it like this as we will need it later to complete the tutorial.\n\n[block:callout]\n{\n  \"type\": \"info\",\n  \"title\": \"\",\n  \"body\": \"If your editor is significantly different than the one you see above, you're using the legacy editor. To complete the tutorial successfully, please click **Switch to the new version** in the banner at the top of the page to change your preferred editor.\"\n}\n[/block]\n\n### Identify the required options for the command line\n\nTo keep the example simple, we will use the default values for most parameters and options, and aim to build a command line that looks like:\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"samtools sort -O bam -T tmp_ -o <sorted-bam-file.bam> <input-bam-file.bam>\",\n      \"language\": \"shell\"\n    }\n  ]\n}\n[/block]\nThe command breaks down as follows:\n\n* **samtools**: the command.\n* **sort**: the subcommand. Together, the command and subcommand form the _base command_. In order for the tool to run properly, all parts of the base command that are separated by a space need to be entered in separate fields within the **Base command** section in the tool editor.\n* **-O bam**: the format of the output file. We are hard-coding this rather than allowing the user to specify it when the tool is run, so this is an _argument_.\n* **-T tmp_**: the prefix to use for the temporary files. Again, we are hard-coding this rather than allowing the user to specify it when the tool is run, so this is an _argument_.\n* **-o <sorted-bam-file.bam>**: the name of the output file to generate. 
We want to allow the user to specify the name of the output file as an input to the command, so this is an input port. Note that the file that is generated will be an _output port_.\n* **<input-bam-file.bam>**: the BAM file to be sorted. We want to allow the user to specify this file as an input to the command, so this is an input port.\n\nAlternatively, we could use a [dynamic expression](doc:dynamic-expressions-in-tool-descriptions-1) to derive the output file name from the input file name. For example, we could set the output file name to `<input-bam-file>_sorted.bam` where `<input-bam-file>` is the first part of the input file name. In this case, when the tool is run, the user will not need to specify a value for the output filename. So the output filename parameter is no longer an input port (specified when the tool is run) but an argument containing a dynamic expression (either fixed, or derived automatically from other information). We won’t use dynamic expressions in this tutorial, but in the next tutorial, we’ll see how to modify this example to use a [dynamic expression](https://docs.sevenbridges.com/docs/dynamic-expressions-in-tool-descriptions-1).\n\n### Create the Docker image\n\n**Note:** A Docker image in the image repository can be accessed by anyone who knows the path and name. So you should avoid including any sensitive data in the image or [set the repository privacy settings](doc:manage-docker-repositories#section-adjust-privacy-settings) to **Private**.\n[block:callout]\n{\n  \"type\": \"info\",\n  \"title\": \"Make sure Docker is running\",\n  \"body\": \"**Mac OS 10.10.3 Yosemite or newer**: run Docker Desktop for Mac and start a terminal of your choice. **Mac OS 10.8 Mountain Lion or newer**: use Docker Toolbox. **Windows 10**: run Docker for Windows and start a terminal of your choice. **Windows 7** or **8**: run Docker Quickstart Terminal to start Docker Machine. 
**Linux**: no action required.\"\n}\n[/block]\nOpen a terminal window on your computer and enter the following command:\n\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"docker run -ti ubuntu:16.04\",\n      \"language\": \"shell\"\n    }\n  ]\n}\n[/block]\n\nThis creates a Docker container from the **ubuntu** base image. Here, we are using a minimal Ubuntu base image as that is suitable for **samtools**, but you can start with any image that is suitable for the tools you want to use.\n\nThe terminal prompt changes to **root@<containerid>** where **<containerid>** is a set of 12 alphanumeric characters that represent the unique id for the Docker container you are creating. Make a note of **<containerid>** as you will need it shortly.\n\nLoad the container with the tools you need. In this case, you need to enter the following commands to download and build **samtools**.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"# Update the package index inside the container\\napt-get update\\n# Install the tools we need to download and compile SAMtools\\napt-get install wget build-essential zlib1g-dev libncurses5-dev liblzma-dev libbz2-dev\\n# Download the SAMtools source code (version 1.6 or a later version if you prefer)\\nwget https://github.com/samtools/samtools/releases/download/1.6/samtools-1.6.tar.bz2\\n# Unpack the archive\\ntar jxf samtools-1.6.tar.bz2\\n# Go into the directory containing the unpacked SAMtools source code\\ncd samtools-1.6\\n# Compile the code\\nmake\\n# Install the resulting binaries\\nmake install\",\n      \"language\": \"shell\"\n    }\n  ]\n}\n[/block]\nTest that the **samtools** executable has been built and installed successfully. 
Enter the following command to verify that the version information for **samtools** is displayed.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"samtools --version\",\n      \"language\": \"bash\"\n    }\n  ]\n}\n[/block]\nIf everything is installed properly, the output should look like this:\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"samtools 1.6\\nUsing htslib 1.6\\nCopyright (C) 2017 Genome Research Ltd.\",\n      \"language\": \"shell\"\n    }\n  ]\n}\n[/block]\nExit the container (remember to make a note of the container id).\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"exit\",\n      \"language\": \"bash\"\n    }\n  ]\n}\n[/block]\n### Save and upload the Docker image\n\nTo save the container as an image in the CGC image registry, first log in to the registry. In the terminal window, enter:\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"docker login cgc-images.sbgenomics.com\",\n      \"language\": \"shell\"\n    }\n  ]\n}\n[/block]\nWhen prompted for a username, enter your CGC username. When prompted for a password, enter your CGC [authentication token](doc:get-your-authentication-token) not your CGC login password.\n\nYou will see a message saying the login has succeeded, then you will be returned to the terminal prompt. Note that this login times out after a while, so if you don’t access the CGC image registry promptly, you may need to log in again in order to do so.\n\nCommit the image to the repository as follows:\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"docker commit <containerid> cgc-images.sbgenomics.com/<username>/samtools:v1\",\n      \"language\": \"bash\"\n    }\n  ]\n}\n[/block]\nIn this command, `<containerid>` is the ID of the container you made a note of above, `<username>` is your CGC username, modified if necessary to be all in lowercase, and with any hyphens or full stops replaced by underscores. 
In this example, we have called the image `samtools`, and have tagged it as `v1`. If the commit is successful, you will see a message similar to this:\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"sha256:4dcd3c6911776ba0417e322dd40d0d4881e1806f9b3027516888798b21b8203f\",\n      \"language\": \"bash\"\n    }\n  ]\n}\n[/block]\nPush the image to the image registry:\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"docker push cgc-images.sbgenomics.com/<username>/samtools:v1\",\n      \"language\": \"bash\"\n    }\n  ]\n}\n[/block]\nwhere `<username>` is your CGC username, modified if necessary, as above. Please wait for the procedure to complete.\n\nIf the push is successful, you will see several messages, ending with a message similar to this:\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"v1: digest: sha256:64a47b5dcdb95a4b6184e880365694b40e1cd85e4151074a11ba1f37c8b56f1f size: 1570\",\n      \"language\": \"bash\"\n    }\n  ]\n}\n[/block]\nIf you want to know more about Docker commands, you will find a list of common Docker commands [here](https://docs.sevenbridges.com/docs/core-docker-commands).\n\n### Specify the Docker image containing the tool\n\nGo back to the **SAMtools-sort** tool in your **SAMtools** project on the CGC. In the **Docker Image** section of the tool editor, set **Docker repository** to `cgc-images.sbgenomics.com/<username>/samtools:v1` where `<username>` is your CGC username, all in lowercase, and with any hyphens or full stops replaced by underscores.\n\n### Specify the base command\n1. In the **Base Command** section of the tool editor, click **Add Base Command**.\n2. Enter **samtools**.\n3. Under the text field click **+ Add Base Command**.\n4. Enter **sort** in the blank field.\n\nClick **Command Line** at the bottom right to open a preview pane showing a preview of the command we are building up. 
You should see **samtools sort** in the preview pane.\n\n### Specify the arguments\n\nWe need to specify the output file format as a fixed argument (the string is **-O bam**).\n\nIn the **Arguments** section of the tool editor, click **Add an Argument**. An argument is added and the object inspector opens on the right-hand side showing the properties of the argument.\n\nIn the object inspector:\n1. Leave **Use command line binding** selected.\n2. Set **Prefix** to **-O**.\n3. Set **Value** to **bam**.\n4. Leave **Separate value and prefix** selected (the syntax requires a space between the prefix and the expression).\n5. Leave **Position** set to **0** (as long as this argument is after the base command at the beginning of the command line and before the input file at the end of the command line, the actual position relative to the other items on the command line doesn’t matter).\n\nWe also need to specify the temporary file prefix as a fixed argument (the string is **-T tmp_**).\n\nIn the bottom-left part of the **Arguments** section, click **+ Add an Argument**. Then, in the object inspector on the right:\n1. Leave **Use command line binding** selected.\n2. Set **Prefix** to **-T**.\n3. Set **Value** to **tmp_**.\n4. Leave **Separate Value and Prefix** selected (the syntax requires a space between the prefix and the expression).\n5. Leave **Position** set to **0** (as long as this argument is after the base command at the beginning of the command line and before the input file at the end of the command line, the actual position relative to the other items on the command line doesn’t matter).\n\nIn the preview pane you should see **samtools sort -O bam -T tmp_**.\n\n### Specify the input ports\n\nWe need to specify the name of the output file as an input port (the string is **-o <sorted-bam-file.bam>**).\n\nIn the **Input ports** section of the tool editor, click **Add an Input**. 
An input port is added, with a default name of **input**, and the object inspector opens on the right-hand side showing the properties of the input.\n\nIn the object inspector on the right:\n1. Select **Required**. This will be a mandatory input.\n2. Set **ID** to **sorted_file_name**.\n3. Set **Type** to **string**.\n4. Leave **Allow array as well as single item** unselected.\n5. Select **Include in the command line**.\n6. Leave **Value Transform** blank. This is where we could insert a dynamic expression to derive the name as a function of the input file name if we wanted to. Because we have left this blank, the user of the tool will be asked to specify a value when the tool executes.\n7. Set **Prefix** to **-o**.\n8. Leave **Separate value and prefix** selected.\n9. Set **Position** to **0** (as long as this part of the command is after the base command at the beginning of the command line and before the input file at the end of the command line, the actual position relative to the other items on the command line doesn’t matter).\n10. Expand the **Description** drop-down, and set **Label** to **Sorted file name**. Optionally, add a more detailed description of the input port in the **Description** text box. When the tool is placed in a workflow, the value from the **Label** field is displayed against the input port (if not supplied, the ID is used instead).\n\nIn the preview pane, you should see **samtools sort -O bam -T tmp_ -o sorted_file_name-string-value**.\n\nWe also need to specify the input file as an input port. In the bottom-right part of the **Input ports** section, click **+ Add an Input**.\n\nIn the object inspector on the right:\n1. Select **Required**.\n2. Set **ID** to **input_bam_file**.\n3. Set **Type** to **File**.\n4. Leave **Value Transform** blank.\n5. Leave **Prefix** blank.\n6. Leave **Separate Value and Prefix** selected.\n7. Set **Position** to **1** (this must appear in the command line after the other arguments and inputs).\n8. 
Scroll down to the **Description** section, expand the drop-down, and set **Label** to **Input BAM file**. Optionally, add a more detailed description of the input port in the **Description** text box.\n9. Set **File type(s)** to **BAM** (only valid for CWL v1.0 workflows). When the tool is placed in a workflow, the value from the **Label** field is displayed against the input port (if not supplied, the ID is used instead). For CWL v1.0 tools only, **File type(s)** allows the workflow editor to check that output nodes are connected to input nodes of the correct type.\n\nIn the preview pane, you should see **samtools sort -O bam -T tmp_ -o sorted_file_name-string-value /path/to/input.ext**.\n\n### Specify the output port\n\nNow we need to specify the output file as an output port. Note that we have already set the name of the output file as an input. But we also need to specify an output port for the file in order to retrieve the output. In the **Output ports** section of the tool editor, click **Add an Output**. An output port is added and the object inspector opens on the right-hand side showing the properties of the output.\n\nIn the object inspector:\n1. Select **Required**.\n2. Set **ID** to **sorted_bam_file**.\n3. Set **Type** to **File**.\n4. Set **Glob** to ***.bam**. This means that any file that matches this filter will be reported as an output of the tool. We could use a dynamic expression instead to specify only files that match the specified output file name, but this simpler option will be enough for now.\n5. Scroll down to the **Description** drop-down, expand it, and set **Label** to **Output BAM file**. Optionally, add a more detailed description of the output port in the **Description** text box.\n6. Set **File type(s)** to **BAM** (only valid for CWL v1.0 workflows). 
When the tool is placed in a workflow, the value from the **Label** field is displayed against the output port (if not supplied, the ID is used instead) and, for CWL V1.0 tools only, **File type(s)** allows the workflow editor to check that output nodes are connected to input nodes of the correct type.\n\n### Save the tool\n1. Click the **Save** icon in the top-right corner to save the tool description.\n2. (Optional) Add a short revision note describing the changes you have made.\n3. Click **Save**.\n\n### Test the tool\n\nWe are going to use a typical BAM file from the 1000 Genomes project to test the tool on the CGC, so first you need to copy it to your project.\n1. On the CGC, select **Data** > **Public Test files**.\n2. Enter **NA12878.ga2.exome.maq.raw.bam** in the search box. Note that the task will take around 75 minutes to execute with this input file. If you want to do only a quick test of the tool, you could use a smaller input file, as this will return a result in a few minutes. If so, search for **G26234.HCC1187_1M.aligned.bam** instead.\n3. Select the file then click **Copy**, and specify the project where your newly-created tool is located.\n4. Navigate back to the project where your **samtools-sort** tool is located.\n5. Click the **Apps** tab.\n6. Click **Run** next to **samtools-sort**.\n7. Next to the **Input BAM file** input port click **Select File(s)**, and select **NA12878.ga2.exome.maq.raw.bam**. If you opted for using the smaller file in step 2 above, select **G26234.HCC1187_1M.aligned.bam** instead.\n8. Click **Save selection**.\n9. In the **Sorted file name** field enter **sorted_bam_file.bam**.\n10. Click **Run**.\n\nThis analysis will take around 75 minutes to run (or less if you are using the smaller input file), and you will receive an email when it is completed.\n\n### View the results\n\nWhen you receive the notification email, click the link in the email to view the results. 
You should see that the task was successful and that a single output file named **sorted_bam_file.bam** was created,","excerpt":"<a name=\"top\"/></a> See a [video](#section-video-tutorial) of this tutorial below.","slug":"install-and-run-samtools-sort","type":"basic","title":"Worked example of uploading SamTools Sort"}

Worked example of uploading SamTools Sort

<a name="top"/></a> See a [video](#section-video-tutorial) of this tutorial below.

## Introduction

The Software Development Kit (SDK) allows you to add your tools to the CGC and use them to run analyses, as you do with tools that are already publicly available on the CGC. This is done by installing the tools in a Docker container and then describing their behavior on the CGC.

The first part of the procedure of deploying your tool to the CGC is to create a [Docker](doc:docker-basics) container and install the tool in it. Once this is done, you need to create a snapshot of the container, called an image, and push it to the CGC image registry or the official Docker image registry, [Docker Hub](http://hub.docker.com/).

The second part of the procedure is to specify the tool's behavior on the CGC, including its inputs and outputs, runtime requirements, and execution semantics. The specification is entered using the Tool Editor. This allows you to use the tool as an individual application on the CGC or interface it with other tools and create workflows.

## Objective

This tutorial demonstrates how to wrap and run the `sort` subcommand of SAMtools on the CGC. Specifically, we will install the bioinformatics package SAMtools into a Docker container, push it to the CGC image registry, describe the `sort` subcommand in the tool editor, and then use the imported `sort` tool in a workflow on the CGC.

## Prerequisites

For this tutorial, you will need:
1. An account on the CGC.
2. A Linux, Mac or Windows computer with [Docker installed](https://docs.sevenbridges.com/docs/install-docker).

## Create a project
1. Click **Projects** in the top navigation bar and choose **Create a project**.
2. Name the project **SAMtools** (you can always delete this project when you've finished the tutorial).
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/9f7982a-cgc-worked-example-of-uploading-samtools-sort-1.png",
        "cgc-worked-example-of-uploading-samtools-sort-1.png",
        477,
        481,
        "#e2e9eb"
      ]
    }
  ]
}
[/block]
3. Click **Create**.
You have now created a project on the CGC.

## Describe each subcommand in the Tool Editor

We have created a Docker image with SAMtools inside, and pushed it to the CGC image registry. To use SAMtools on the CGC, we still need to capture its interface with the tool editor, so that it can be integrated with other CGC tools.

The tool editor treats each subcommand of a command line tool as a distinct tool. In this tutorial, we’ll wrap **samtools sort**, one of the tools in the [samtools suite](http://www.htslib.org/doc/samtools.html). **samtools sort** takes an input BAM-format file containing short DNA sequence reads and sorts it.

To describe `sort`:
1. On the CGC, navigate to **Projects** > **SAMtools**.
2. Click the **Apps** tab.
3. Click **Add App**.
4. Open the **Create New App** tab.
5. Click **Create a Tool**.
6. Enter the name for your tool as shown below, e.g. **SAMtools-sort**.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/964bfff-cgc-worked-example-of-uploading-samtools-sort-2.png",
        "cgc-worked-example-of-uploading-samtools-sort-2.png",
        1029,
        615,
        "#f0f1f1"
      ]
    }
  ]
}
[/block]
7. Click **Create**.
The Tool Editor opens.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/da46196-sbpla-worked-example-of-uploading-samtools-sort-3.png",
        "sbpla-worked-example-of-uploading-samtools-sort-3.png",
        1076,
        683,
        "#d0d1d1"
      ]
    }
  ]
}
[/block]
Leave it like this as we will need it later to complete the tutorial.

[block:callout]
{
  "type": "info",
  "title": "",
  "body": "If your editor is significantly different than the one you see above, you're using the legacy editor. To complete the tutorial successfully, please click **Switch to the new version** in the banner at the top of the page to change your preferred editor."
}
[/block]

### Identify the required options for the command line

To keep the example simple, we will use the default values for most parameters and options, and aim to build a command line that looks like:
[block:code]
{
  "codes": [
    {
      "code": "samtools sort -O bam -T tmp_ -o <sorted-bam-file.bam> <input-bam-file.bam>",
      "language": "shell"
    }
  ]
}
[/block]
The command breaks down as follows:

* **samtools**: the command.
* **sort**: the subcommand. Together, the command and subcommand form the _base command_. In order for the tool to run properly, all parts of the base command that are separated by a space need to be entered in separate fields within the **Base command** section in the tool editor.
* **-O bam**: the format of the output file. We are hard-coding this rather than allowing the user to specify it when the tool is run, so this is an _argument_.
* **-T tmp_**: the prefix to use for the temporary files. Again, we are hard-coding this rather than allowing the user to specify it when the tool is run, so this is an _argument_.
* **-o <sorted-bam-file.bam>**: the name of the output file to generate. We want to allow the user to specify the name of the output file as an input to the command, so this is an input port. Note that the file that is generated will be an _output port_.
* **<input-bam-file.bam>**: the BAM file to be sorted. We want to allow the user to specify this file as an input to the command, so this is an input port.

Alternatively, we could use a [dynamic expression](doc:dynamic-expressions-in-tool-descriptions-1) to derive the output file name from the input file name. For example, we could set the output file name to `<input-bam-file>_sorted.bam` where `<input-bam-file>` is the first part of the input file name. In this case, when the tool is run, the user will not need to specify a value for the output filename.
So the output filename parameter is no longer an input port (specified when the tool is run) but an argument containing a dynamic expression (either fixed, or derived automatically from other information). We won’t use dynamic expressions in this tutorial, but in the next tutorial, we’ll see how to modify this example to use a [dynamic expression](https://docs.sevenbridges.com/docs/dynamic-expressions-in-tool-descriptions-1).

### Create the Docker image

**Note:** A Docker image in the image repository can be accessed by anyone who knows the path and name. So you should avoid including any sensitive data in the image or [set the repository privacy settings](doc:manage-docker-repositories#section-adjust-privacy-settings) to **Private**.
[block:callout]
{
  "type": "info",
  "title": "Make sure Docker is running",
  "body": "**Mac OS 10.10.3 Yosemite or newer**: run Docker Desktop for Mac and start a terminal of your choice. **Mac OS 10.8 Mountain Lion or newer**: use Docker Toolbox. **Windows 10**: run Docker for Windows and start a terminal of your choice. **Windows 7** or **8**: run Docker Quickstart Terminal to start Docker Machine. **Linux**: no action required."
}
[/block]
Open a terminal window on your computer and enter the following command:

[block:code]
{
  "codes": [
    {
      "code": "docker run -ti ubuntu:16.04",
      "language": "shell"
    }
  ]
}
[/block]

This creates a Docker container from the **ubuntu** base image. Here, we are using a minimal Ubuntu base image as that is suitable for **samtools**, but you can start with any image that is suitable for the tools you want to use.

The terminal prompt changes to **root@<containerid>** where **<containerid>** is a set of 12 alphanumeric characters that represent the unique id for the Docker container you are creating. Make a note of **<containerid>** as you will need it shortly.

Load the container with the tools you need. In this case, you need to enter the following commands to download and build **samtools**.
[block:code]
{
  "codes": [
    {
      "code": "# Update the package index inside the container\napt-get update\n# Install the tools we need to download and compile SAMtools\napt-get install wget build-essential zlib1g-dev libncurses5-dev liblzma-dev libbz2-dev\n# Download the SAMtools source code (version 1.6 or a later version if you prefer)\nwget https://github.com/samtools/samtools/releases/download/1.6/samtools-1.6.tar.bz2\n# Unpack the archive\ntar jxf samtools-1.6.tar.bz2\n# Go into the directory containing the unpacked SAMtools source code\ncd samtools-1.6\n# Compile the code\nmake\n# Install the resulting binaries\nmake install",
      "language": "shell"
    }
  ]
}
[/block]
Test that the **samtools** executable has been built and installed successfully. Enter the following command to verify that the version information for **samtools** is displayed.
[block:code]
{
  "codes": [
    {
      "code": "samtools --version",
      "language": "bash"
    }
  ]
}
[/block]
If everything is installed properly, the output should look like this:
[block:code]
{
  "codes": [
    {
      "code": "samtools 1.6\nUsing htslib 1.6\nCopyright (C) 2017 Genome Research Ltd.",
      "language": "shell"
    }
  ]
}
[/block]
Exit the container (remember to make a note of the container id).
[block:code]
{
  "codes": [
    {
      "code": "exit",
      "language": "bash"
    }
  ]
}
[/block]
### Save and upload the Docker image

To save the container as an image in the CGC image registry, first log in to the registry. In the terminal window, enter:
[block:code]
{
  "codes": [
    {
      "code": "docker login cgc-images.sbgenomics.com",
      "language": "shell"
    }
  ]
}
[/block]
When prompted for a username, enter your CGC username. When prompted for a password, enter your CGC [authentication token](doc:get-your-authentication-token), not your CGC login password.

You will see a message saying the login has succeeded, then you will be returned to the terminal prompt.
Note that this login times out after a while, so if you don’t access the CGC image registry promptly, you may need to log in again.

Commit the image to the repository as follows:

[block:code]
{
  "codes": [
    {
      "code": "docker commit <containerid> cgc-images.sbgenomics.com/<username>/samtools:v1",
      "language": "bash"
    }
  ]
}
[/block]

In this command, `<containerid>` is the ID of the container you made a note of above, `<username>` is your CGC username, modified if necessary to be all in lowercase, and with any hyphens or full stops replaced by underscores. In this example, we have called the image `samtools`, and have tagged it as `v1`.

If the commit is successful, you will see a message similar to this:

[block:code]
{
  "codes": [
    {
      "code": "sha256:4dcd3c6911776ba0417e322dd40d0d4881e1806f9b3027516888798b21b8203f",
      "language": "bash"
    }
  ]
}
[/block]

Push the image to the image registry:

[block:code]
{
  "codes": [
    {
      "code": "docker push cgc-images.sbgenomics.com/<username>/samtools:v1",
      "language": "bash"
    }
  ]
}
[/block]

where `<username>` is your CGC username, modified if necessary, as above. Please wait for the procedure to complete. If the push is successful, you will see several messages, ending with a message similar to this:

[block:code]
{
  "codes": [
    {
      "code": "v1: digest: sha256:64a47b5dcdb95a4b6184e880365694b40e1cd85e4151074a11ba1f37c8b56f1f size: 1570",
      "language": "bash"
    }
  ]
}
[/block]

If you want to know more about Docker commands, you will find a list of common Docker commands [here](https://docs.sevenbridges.com/docs/core-docker-commands).

### Specify the Docker image containing the tool

Go back to your **SAMtools-sort** project on the CGC. In the **Docker Image** section of the tool editor, set **Docker repository** to `cgc-images.sbgenomics.com/<username>/samtools:v1` where `<username>` is your CGC username, all in lowercase, and with any hyphens or full stops replaced by underscores.

### Specify the base command

1. In the **Base Command** section of the tool editor, click **Add Base Command**.
2. Enter **samtools**.
3. Under the text field click **+ Add Base Command**.
4. Enter **sort** in the blank field.

Click **Command Line** at the bottom right to open a pane showing a preview of the command we are building up. You should see **samtools sort** in the preview pane.

### Specify the arguments

We need to specify the output file format as a fixed argument (the string is **-O bam**). In the **Arguments** section of the tool editor, click **Add an Argument**. An argument is added and the object inspector opens on the right-hand side showing the properties of the argument. In the object inspector:

1. Leave **Use command line binding** selected.
2. Set **Prefix** to **-O**.
3. Set **Value** to **bam**.
4. Leave **Separate value and prefix** selected (the syntax requires a space between the prefix and the expression).
5. Leave **Position** set to **0** (as long as this argument is after the base command at the beginning of the command line and before the input file at the end of the command line, the actual position relative to the other items on the command line doesn’t matter).

We also need to specify the temporary file prefix as a fixed argument (the string is **-T tmp_**). In the bottom-left part of the **Arguments** section, click **+ Add an Argument**. Then, in the object inspector on the right:

1. Leave **Use command line binding** selected.
2. Set **Prefix** to **-T**.
3. Set **Value** to **tmp_**.
4. Leave **Separate Value and Prefix** selected (the syntax requires a space between the prefix and the expression).
5. Leave **Position** set to **0** (as long as this argument is after the base command at the beginning of the command line and before the input file at the end of the command line, the actual position relative to the other items on the command line doesn’t matter).

In the preview pane you should see **samtools sort -O bam -T tmp_**.
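
To make the binding options more concrete, the sketch below mimics in plain shell how a prefix and a value are joined and appended to the base command. The `bind_arg` helper is hypothetical and exists only for illustration; it is not how the tool editor is implemented.

```shell
# Hypothetical helper (not part of the CGC or samtools): joins a prefix and a value.
# $1 = prefix, $2 = value, $3 = "true" to put a space between them (the default).
bind_arg() {
  if [ "${3:-true}" = "true" ]; then
    printf '%s %s' "$1" "$2"
  else
    printf '%s%s' "$1" "$2"
  fi
}

# Base command followed by the two fixed arguments from this section
base_command="samtools sort"
cmd="$base_command $(bind_arg -O bam) $(bind_arg -T tmp_)"

# Print the assembled command line without running it
echo "$cmd"
```

Passing `false` as the third argument models a deselected **Separate value and prefix** option, which would produce `-Obam` rather than `-O bam`.
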
### Specify the input ports

We need to specify the name of the output file as an input port (the string is **-o .bam**). In the **Input ports** section of the tool editor, click **Add an Input**. An input port is added, with a default name of **input**, and the object inspector opens on the right-hand side showing the properties of the input. In the object inspector on the right:

1. Select **Required**. This will be a mandatory input.
2. Set **ID** to **sorted_file_name**.
3. Set **Type** to **string**.
4. Leave **Allow array as well as single item** unselected.
5. Select **Include in the command line**.
6. Leave **Value Transform** blank. This is where we could insert a dynamic expression to derive the name as a function of the input file name if we wanted to. Because we have left this blank, the user of the tool will be asked to specify a value when the tool executes.
7. Set **Prefix** to **-o**.
8. Leave **Separate value and prefix** selected.
9. Set **Position** to **0** (as long as this part of the command is after the base command at the beginning of the command line and before the input file at the end of the command line, the actual position relative to the other items on the command line doesn’t matter).
10. Expand the **Description** drop-down, and set **Label** to **Sorted file name**. Optionally, add a more detailed description of the input port in the **Description** text box.

When the tool is placed in a workflow, the value from the **Label** field is displayed against the input port (if not supplied, the ID is used instead).

In the preview pane, you should see **samtools sort -O bam -T tmp_ -o sorted_file_name-string-value**.

We also need to specify the input file as an input port. In the bottom-right part of the **Input ports** section, click **+ Add an Input**. In the object inspector on the right:

1. Select **Required**.
2. Set **ID** to **input_bam_file**.
3. Set **Type** to **File**.
4. Leave **Value Transform** blank.
5. Leave **Prefix** blank.
6. Leave **Separate Value and Prefix** selected.
7. Set **Position** to **1** (this must appear in the command line after the other arguments and inputs).
8. Scroll down to the **Description** section, expand the drop-down, and set **Label** to **Input BAM file**. Optionally, add a more detailed description of the input port in the **Description** text box.
9. Set **File type(s)** to **BAM** (only valid for CWL v1.0 workflows).

When the tool is placed in a workflow, the value from the **Label** field is displayed against the input port (if not supplied, the ID is used instead). For CWL v1.0 tools only, **File type(s)** allows the workflow editor to check that output nodes are connected to input nodes of the correct type.

In the preview pane, you should see **samtools sort -O bam -T tmp_ -o sorted_file_name-string-value /path/to/input.ext**.

### Specify the output port

Now we need to specify the output file as an output port. Note that we have already set the name of the output file as an input. But we also need to specify an output port for the file in order to retrieve the output.

In the **Output ports** section of the tool editor, click **Add an Output**. An output port is added and the object inspector opens on the right-hand side showing the properties of the output. In the object inspector:

1. Select **Required**.
2. Set **ID** to **sorted_bam_file**.
3. Set **Type** to **File**.
4. Set **Glob** to `*.bam`. This means that any file that matches this filter will be reported as an output of the tool. We could use a dynamic expression instead to specify only files that match the specified output file name, but this simpler option will be enough for now.
5. Scroll down to the **Description** drop-down, expand it, and set **Label** to **Output BAM file**. Optionally, add a more detailed description of the output port in the **Description** text box.
6. Set **File type(s)** to **BAM** (only valid for CWL v1.0 workflows).
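
As a rough illustration of what the `*.bam` pattern selects, the following shell sketch creates a scratch directory with made-up file names and collects the ones the pattern matches. The directory and the file names are assumptions for this example only; they do not reflect the actual contents of a task's working directory.

```shell
# Scratch directory standing in for a working directory (hypothetical file names)
workdir=$(mktemp -d)
touch "$workdir/sorted_bam_file.bam" "$workdir/another_file.bam" "$workdir/job.log"

# Collect the name of every file that the *.bam pattern matches
matched=""
for f in "$workdir"/*.bam; do
  matched="${matched:+$matched }$(basename "$f")"
done

echo "Matched by *.bam: $matched"

# Clean up the scratch directory
rm -rf "$workdir"
```

Both `.bam` files match while `job.log` does not, which is why a glob tied to the chosen output file name would be the stricter option.
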
When the tool is placed in a workflow, the value from the **Label** field is displayed against the output port (if not supplied, the ID is used instead) and, for CWL v1.0 tools only, **File type(s)** allows the workflow editor to check that output nodes are connected to input nodes of the correct type.

### Save the tool

1. Click the **Save** icon in the top-right corner to save the tool description.
2. (Optional) Add a short revision note describing the changes you have made.
3. Click **Save**.

### Test the tool

We are going to use a typical BAM file from the 1000 Genomes project to test the tool on the CGC, so first you need to copy it to your project.

1. On the CGC, select **Data** > **Public Test files**.
2. Enter **NA12878.ga2.exome.maq.raw.bam** in the search box. Note that the task will take around 75 minutes to execute with this input file. If you want to do only a quick test of the tool, you could use a smaller input file, as this will return a result in a few minutes. If so, search for **G26234.HCC1187_1M.aligned.bam** instead.
3. Select the file then click **Copy**, and specify the project where your newly-created tool is located.
4. Navigate back to the project where your **samtools-sort** tool is located.
5. Click the **Apps** tab.
6. Click **Run** next to **samtools-sort**.
7. Next to the **Input BAM file** input port click **Select File(s)**, and select **NA12878.ga2.exome.maq.raw.bam**. If you opted for using the smaller file in step 2 above, select **G26234.HCC1187_1M.aligned.bam** instead.
8. Click **Save selection**.
9. In the **Sorted file name** field enter **sorted_bam_file.bam**.
10. Click **Run**.

This analysis will take around 75 minutes to run (or less if you are using the smaller input file), and you will receive an email when it is completed.

### View the results

When you receive the notification email, click the link in the email to view the results.
You should see that the task was successful and that a single output file named **sorted_bam_file.bam** was created,