{"_id":"578e5c9b2be36c0e00f1fac9","project":"55faf11ba62ba1170021a9a7","__v":0,"user":{"_id":"5613e4f8fdd08f2b00437620","username":"","name":"Emile Young"},"initVersion":{"_id":"55faf11ba62ba1170021a9aa","version":"1.0"},"createdAt":"2016-07-19T17:00:11.122Z","changelog":[],"body":"[block:callout]\n{\n  \"type\": \"warning\",\n  \"body\": \"* [Hints format](#section-hints-format)\\n* [CPU and memory requirements](#section-cpu-and-memory-requirements)\\n* [Instance type and the number of parallel instances](#section-instance-type-and-the-number-of-parallel-instances)\\n* [Configure log files for your tool](#section-configure-log-files-for-your-tool)\",\n  \"title\": \"On this page:\"\n}\n[/block]\nThis blog post will provide an overview of how hints can be used to customize tasks on the CGC. In the context of the CGC, \"hints\" are parameters that are entered in the visual interface and allow you to configure or override some of the predefined ways in which apps and tasks work. Specifically, hints can be used to:\n* Set up CPU and memory requirements for a tool's execution.\n* Explicitly state the instance on which you want a tool to run and set the maximum number of parallel instances.\n* Define which files produced by a tool will be caught as log files.\n\n##Hints format\nHints are entered as key - value pairs. 
Available keys for **task-related hints** are:\n[block:parameters]\n{\n  \"data\": {\n    \"h-0\": \"Hint\",\n    \"h-1\": \"Datatype of value\",\n    \"h-2\": \"Description\",\n    \"0-0\": \"`sbg:AWSInstanceType`\",\n    \"0-1\": \"string\",\n    \"0-2\": \"Defines which specific AWS instance is needed to run the task.\\n[Read more](#section-instance-type-and-the-number-of-parallel-instances).\",\n    \"1-0\": \"`sbg:maxNumberOfParallelInstances`\",\n    \"1-1\": \"integer\",\n    \"1-2\": \"Defines how many instances can be run in parallel.\\n[Read more](#section-instance-type-and-the-number-of-parallel-instances).\"\n  },\n  \"cols\": 3,\n  \"rows\": 2\n}\n[/block]\nThe available keys for **tool-related hints** are:\n[block:parameters]\n{\n  \"data\": {\n    \"h-0\": \"Hint\",\n    \"h-1\": \"Datatype of value\",\n    \"h-2\": \"Description\",\n    \"0-0\": \"`sbg:CPURequirement`\",\n    \"0-1\": \"integer\",\n    \"0-2\": \"Defines the number of CPUs required to run a tool.\\n[Read more](#section-cpu-and-memory-requirements).\",\n    \"1-0\": \"`sbg:MemRequirement`\",\n    \"1-1\": \"integer\",\n    \"1-2\": \"Specifies the memory needed for execution of a tool.\\n[Read more](#section-cpu-and-memory-requirements).\",\n    \"2-0\": \"`sbg:SaveLogs`\",\n    \"2-1\": \"string (regular expression)\",\n    \"2-2\": \"Uses a regular expression for pattern matching to set which files will be caught as log files.\\n[Read more](#section-configure-log-files-for-your-tool).\"\n  },\n  \"cols\": 3,\n  \"rows\": 3\n}\n[/block]\nFor example, if you want to define the maximum number of instances that can run in parallel, the hint would be:\n[block:parameters]\n{\n  \"data\": {\n    \"h-0\": \"Key\",\n    \"h-1\": \"Value\",\n    \"0-0\": \"`sbg:maxNumberOfParallelInstances`\",\n    \"0-1\": \"4\"\n  },\n  \"cols\": 2,\n  \"rows\": 1\n}\n[/block]\nIf the hint depends on another factor, it can be entered as a dynamic expression, which is evaluated at runtime. 
Dynamic expressions can be entered in any field where you see the symbol </>. For more information, please refer to documentation about [dynamic expressions in tool descriptions](doc:dynamic-expressions-in-tool-descriptions).\n\n##CPU and memory requirements\n\nYou can set the required CPU and memory for any tool, which is done by one of CGC's bioinformaticians in case of a publicly available tool or by yourself if this is a tool that you wrapped for use on the Platform. When a tool is first added to the Platform, the default values in the **CPU** and **Memory (MB)** fields are **1 CPU** and **1000 MB** of memory, but they should be checked and adjusted to match the tool's needs.\n\nCPU and memory requirements are automatically reflected as the following hints in the tool's settings:\n* `sbg:CPURequirement`\n* `sbg:MemRequirement`\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://www.filepicker.io/api/file/65R67tWrRXK815CDhteR\",\n        \"Hints-Add-CPU-Mem-3.jpg\",\n        \"651\",\n        \"421\",\n        \"#d9912c\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\n\n[block:callout]\n{\n  \"type\": \"info\",\n  \"body\": \"`sbg:CPURequirement` and `sbg:MemRequirement` hints are read-only. CPU and memory parameters can only be modified in the **CPU** and **Memory (MB)** fields on the **General** tab in [the tool editor](doc:the-tool-editor).\"\n}\n[/block]\n##Instance type and the number of parallel instances\n\nEach tool that is run in a task is executed on a computation instance in the cloud. Instances are virtual computers; different instance types have different allocations of CPU and memory, so are suited for workloads with different computational requirements.\n\nThe CGC uses a [scheduling algorithm](http://docs.cancergenomicscloud.org/v1.0/page/multi-instance-scheduling-algorithm) to select an appropriate computation instance for each tool that is run in a task. 
The scheduling algorithm assigns an instance that has sufficient resources to run the tool and is also optimized to efficiently pack tools onto instances when running workflows made of multiple tools. Even though the scheduling algorithm will select a default instance that is suitable for your task, in some cases you might want to override the algorithm to select a specific instance type to run the task on. This is done using the instance-related hints listed below:\n* `sbg:AWSInstanceType` - Allows you to define the specific instance you would like to use. When you start typing the instance name in the **Requirement value** field, you will see automatically generated suggestions in the drop-down box.\n* `sbg:maxNumberOfParallelInstances` - Takes an integer value that defines how many instances can be run in parallel for a workflow.\n\nThe following image shows how instance type and number of parallel instances are set up for a workflow using hints.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://www.filepicker.io/api/file/4XBSJCkBQnGdtg1q3aq0\",\n        \"Hints-Add-CPU-Mem-4.jpg\",\n        \"648\",\n        \"378\",\n        \"#e49324\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\nYou can override the scheduling algorithm in a number of ways:\n* You can [set the instance type for an entire workflow](doc:set-computation-instances#section-set-the-instance-type-for-a-workflow). This will override any setting that you have made for any given tool in the workflow.\n* You can [set the instance type for any tool](doc:set-computation-instances#section-set-the-instance-type-for-a-tool) (either one you have added to the CGC yourself, using the SDK, or a public tool) using the tool editor. This will override the instance type selected by the scheduler.\n* You can [set the instance type for any tool(s) in a workflow](doc:set-computation-instances#section-set-the-instance-type-for-a-tool-in-a-workflow). 
This will override any setting you have made in the tool editor.\n\nRead more about [setting up computation instances](doc:set-computation-instances) and see the list of available instance types.\n\nIf you override the instance type that the scheduling algorithm selects, you may inadvertently select one that doesn't have enough resources to run the app successfully. To make sure you pick a suitable instance, check the required resources of the tool you want to use by opening the tool's properties in [the tool editor](doc:the-tool-editor).\n\n##Configure log files for your tool\nHints can also define which files produced by a tool are treated as log files and presented as the logs of a task on the [view task logs](doc:view-task-logs) page and via the API request to [get task execution details](doc:get-task-execution-details). The default filter for catching the log files finds all files in the working directory of the job that match **\\*.log** (including err.log and cmd.log). The [view task logs](doc:view-task-logs) page in the visual interface additionally shows job.json and cwl.output.json. You can add to the files presented as logs by defining a custom file name or file extension; all matching files are then presented as logs. The following hint is used for this purpose:\n* `sbg:SaveLogs` - Allows you to define a file name or file extension for the log files.\n\nThe image below shows an example of `sbg:SaveLogs` defined for a tool. 
The configuration shown will save **reports.txt** and all files with the **.err** extension as log files, in addition to the default logs.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://www.filepicker.io/api/file/bPBAu5aFQzi9MbocO1MO\",\n        \"Hints-Add-CPU-Mem-5.jpg\",\n        \"649\",\n        \"507\",\n        \"#d79133\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\nYou are able to add as many `sbg:SaveLogs` hints as you like in order to define different log file names and/or extensions.\nSee the steps on how to [configure log files for your tool](doc:advanced-features-of-the-tool-editor#section-configure-the-log-files-for-your-tool).","slug":"customizing-your-tasks-on-the-cgc","title":"Customizing your tasks on the CGC"}

Customizing your tasks on the CGC


[block:callout]
{
  "type": "warning",
  "body": "* [Hints format](#section-hints-format)\n* [CPU and memory requirements](#section-cpu-and-memory-requirements)\n* [Instance type and the number of parallel instances](#section-instance-type-and-the-number-of-parallel-instances)\n* [Configure log files for your tool](#section-configure-log-files-for-your-tool)",
  "title": "On this page:"
}
[/block]
This blog post provides an overview of how hints can be used to customize tasks on the CGC. In the context of the CGC, "hints" are parameters that are entered in the visual interface and allow you to configure or override some of the predefined ways in which apps and tasks work. Specifically, hints can be used to:
* Set up CPU and memory requirements for a tool's execution.
* Explicitly state the instance on which you want a tool to run and set the maximum number of parallel instances.
* Define which files produced by a tool will be caught as log files.

##Hints format
Hints are entered as key-value pairs. Available keys for **task-related hints** are:
[block:parameters]
{
  "data": {
    "h-0": "Hint",
    "h-1": "Datatype of value",
    "h-2": "Description",
    "0-0": "`sbg:AWSInstanceType`",
    "0-1": "string",
    "0-2": "Defines which specific AWS instance is needed to run the task.\n[Read more](#section-instance-type-and-the-number-of-parallel-instances).",
    "1-0": "`sbg:maxNumberOfParallelInstances`",
    "1-1": "integer",
    "1-2": "Defines how many instances can be run in parallel.\n[Read more](#section-instance-type-and-the-number-of-parallel-instances)."
  },
  "cols": 3,
  "rows": 2
}
[/block]
The available keys for **tool-related hints** are:
[block:parameters]
{
  "data": {
    "h-0": "Hint",
    "h-1": "Datatype of value",
    "h-2": "Description",
    "0-0": "`sbg:CPURequirement`",
    "0-1": "integer",
    "0-2": "Defines the number of CPUs required to run a tool.\n[Read more](#section-cpu-and-memory-requirements).",
    "1-0": "`sbg:MemRequirement`",
    "1-1": "integer",
    "1-2": "Specifies the memory needed for execution of a tool.\n[Read more](#section-cpu-and-memory-requirements).",
    "2-0": "`sbg:SaveLogs`",
    "2-1": "string (regular expression)",
    "2-2": "Uses a regular expression for pattern matching to set which files will be caught as log files.\n[Read more](#section-configure-log-files-for-your-tool)."
  },
  "cols": 3,
  "rows": 3
}
[/block]
For example, if you want to define the maximum number of instances that can run in parallel, the hint would be:
[block:parameters]
{
  "data": {
    "h-0": "Key",
    "h-1": "Value",
    "0-0": "`sbg:maxNumberOfParallelInstances`",
    "0-1": "4"
  },
  "cols": 2,
  "rows": 1
}
[/block]
If the hint depends on another factor, it can be entered as a dynamic expression, which is evaluated at runtime. Dynamic expressions can be entered in any field where you see the symbol </>. For more information, please refer to the documentation about [dynamic expressions in tool descriptions](doc:dynamic-expressions-in-tool-descriptions).

##CPU and memory requirements

You can set the required CPU and memory for any tool. For a publicly available tool, this is done by one of the CGC's bioinformaticians; for a tool that you wrapped for use on the Platform, you set these values yourself. When a tool is first added to the Platform, the default values in the **CPU** and **Memory (MB)** fields are **1 CPU** and **1000 MB** of memory, but they should be checked and adjusted to match the tool's needs.
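
The hint tables above pair each key with a value datatype. As a rough illustration, the sketch below checks a set of key-value pairs against those datatypes and starts from the default CPU and memory values mentioned above. The `validate_hint` helper and the "positive integer" rule are assumptions made for this example; they are not part of the CGC interface or API.

```python
# Datatypes copied from the hint tables; the validator itself is hypothetical.
HINT_TYPES = {
    "sbg:AWSInstanceType": str,
    "sbg:maxNumberOfParallelInstances": int,
    "sbg:CPURequirement": int,
    "sbg:MemRequirement": int,
    "sbg:SaveLogs": str,
}

# Defaults the text gives for a newly added tool (1 CPU, 1000 MB).
DEFAULT_TOOL_HINTS = {"sbg:CPURequirement": 1, "sbg:MemRequirement": 1000}


def validate_hint(key, value):
    """Return an error message for a bad key-value pair, or None if it looks fine."""
    expected = HINT_TYPES.get(key)
    if expected is None:
        return f"unknown hint key: {key}"
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, expected):
        return f"{key} expects {expected.__name__}, got {type(value).__name__}"
    # Assumption for this sketch: integer hints must be at least 1.
    if expected is int and value < 1:
        return f"{key} must be a positive integer"
    return None


hints = {**DEFAULT_TOOL_HINTS, "sbg:maxNumberOfParallelInstances": 4}
errors = [msg for msg in (validate_hint(k, v) for k, v in hints.items()) if msg]
print(errors)
```

A check like this only catches type mistakes, such as entering `"1000"` as a string for `sbg:MemRequirement`; it says nothing about whether the values suit the tool.
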
CPU and memory requirements are automatically reflected as the following hints in the tool's settings:
* `sbg:CPURequirement`
* `sbg:MemRequirement`
[block:image]
{
  "images": [
    {
      "image": [
        "https://www.filepicker.io/api/file/65R67tWrRXK815CDhteR",
        "Hints-Add-CPU-Mem-3.jpg",
        "651",
        "421",
        "#d9912c",
        ""
      ]
    }
  ]
}
[/block]

[block:callout]
{
  "type": "info",
  "body": "`sbg:CPURequirement` and `sbg:MemRequirement` hints are read-only. CPU and memory parameters can only be modified in the **CPU** and **Memory (MB)** fields on the **General** tab in [the tool editor](doc:the-tool-editor)."
}
[/block]
##Instance type and the number of parallel instances

Each tool that is run in a task is executed on a computation instance in the cloud. Instances are virtual computers; different instance types have different allocations of CPU and memory, so they are suited for workloads with different computational requirements.

The CGC uses a [scheduling algorithm](http://docs.cancergenomicscloud.org/v1.0/page/multi-instance-scheduling-algorithm) to select an appropriate computation instance for each tool that is run in a task. The scheduling algorithm assigns an instance that has sufficient resources to run the tool and is also optimized to efficiently pack tools onto instances when running workflows made of multiple tools. Even though the scheduling algorithm will select a default instance that is suitable for your task, in some cases you might want to override the algorithm to select a specific instance type to run the task on. This is done using the instance-related hints listed below:
* `sbg:AWSInstanceType` - Allows you to define the specific instance you would like to use. When you start typing the instance name in the **Requirement value** field, you will see automatically generated suggestions in the drop-down box.
* `sbg:maxNumberOfParallelInstances` - Takes an integer value that defines how many instances can be run in parallel for a workflow.
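
When you set `sbg:AWSInstanceType` yourself, you take over the check that the scheduling algorithm would otherwise make: that the instance has sufficient resources for the tool. The sketch below shows one way to reason about that check before choosing a value. The three-entry catalogue uses nominal AWS figures purely for illustration, and the `fits` and `suitable_instances` helpers are hypothetical; consult the platform's list of available instance types for real values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    cpus: int
    memory_mb: int


# Illustrative catalogue with nominal figures; not the platform's actual list.
INSTANCES = {
    "m4.xlarge": InstanceSpec(cpus=4, memory_mb=16 * 1024),
    "c4.2xlarge": InstanceSpec(cpus=8, memory_mb=15 * 1024),
    "c4.8xlarge": InstanceSpec(cpus=36, memory_mb=60 * 1024),
}


def fits(instance_type, cpu_requirement, mem_requirement_mb):
    """True if the instance covers the tool's CPU and memory requirements."""
    spec = INSTANCES[instance_type]
    return spec.cpus >= cpu_requirement and spec.memory_mb >= mem_requirement_mb


def suitable_instances(cpu_requirement, mem_requirement_mb):
    """Names of all catalogue entries that can run the tool, sorted by name."""
    return sorted(
        name for name in INSTANCES
        if fits(name, cpu_requirement, mem_requirement_mb)
    )


# A tool whose sbg:CPURequirement is 8 and sbg:MemRequirement is 8000 (MB).
print(fits("m4.xlarge", 8, 8000))
print(suitable_instances(8, 8000))
```

The comparison uses the same two quantities that the `sbg:CPURequirement` and `sbg:MemRequirement` hints expose, so the tool editor is the place to read the inputs for it.
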

The following image shows how instance type and number of parallel instances are set up for a workflow using hints.
[block:image]
{
  "images": [
    {
      "image": [
        "https://www.filepicker.io/api/file/4XBSJCkBQnGdtg1q3aq0",
        "Hints-Add-CPU-Mem-4.jpg",
        "648",
        "378",
        "#e49324",
        ""
      ]
    }
  ]
}
[/block]
You can override the scheduling algorithm in a number of ways:
* You can [set the instance type for an entire workflow](doc:set-computation-instances#section-set-the-instance-type-for-a-workflow). This will override any setting that you have made for any given tool in the workflow.
* You can [set the instance type for any tool](doc:set-computation-instances#section-set-the-instance-type-for-a-tool) (either one you have added to the CGC yourself, using the SDK, or a public tool) using the tool editor. This will override the instance type selected by the scheduler.
* You can [set the instance type for any tool(s) in a workflow](doc:set-computation-instances#section-set-the-instance-type-for-a-tool-in-a-workflow). This will override any setting you have made in the tool editor.

Read more about [setting up computation instances](doc:set-computation-instances) and see the list of available instance types.

If you override the instance type that the scheduling algorithm selects, you may inadvertently select one that doesn't have enough resources to run the app successfully. To make sure you pick a suitable instance, check the required resources of the tool you want to use by opening the tool's properties in [the tool editor](doc:the-tool-editor).

##Configure log files for your tool
Hints can also define which files produced by a tool are treated as log files and presented as the logs of a task on the [view task logs](doc:view-task-logs) page and via the API request to [get task execution details](doc:get-task-execution-details).
The default filter for catching the log files finds all files in the working directory of the job that match **\*.log** (including err.log and cmd.log). The [view task logs](doc:view-task-logs) page in the visual interface additionally shows job.json and cwl.output.json. You can add to the files presented as logs by defining a custom file name or file extension; all matching files are then presented as logs. The following hint is used for this purpose:
* `sbg:SaveLogs` - Allows you to define a file name or file extension for the log files.

The image below shows an example of `sbg:SaveLogs` defined for a tool. The configuration shown will save **reports.txt** and all files with the **.err** extension as log files, in addition to the default logs.
[block:image]
{
  "images": [
    {
      "image": [
        "https://www.filepicker.io/api/file/bPBAu5aFQzi9MbocO1MO",
        "Hints-Add-CPU-Mem-5.jpg",
        "649",
        "507",
        "#d79133",
        ""
      ]
    }
  ]
}
[/block]
You can add as many `sbg:SaveLogs` hints as you like in order to define different log file names and/or extensions.
See the steps on how to [configure log files for your tool](doc:advanced-features-of-the-tool-editor#section-configure-the-log-files-for-your-tool).
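
To make the effect of `sbg:SaveLogs` concrete, the sketch below models which files in a job's working directory would be presented as logs, first with only the default **\*.log** filter and then with the two extra patterns from the example above. It assumes shell-style wildcard matching, which is a guess: the hint table describes the value as a regular expression, while the example uses a file name and an extension. The `log_files` helper and the file names are hypothetical.

```python
from fnmatch import fnmatchcase

# Default filter described in the text.
DEFAULT_LOG_PATTERNS = ["*.log"]


def log_files(filenames, extra_patterns=()):
    """Return the files matched by the default filter or any extra pattern."""
    patterns = [*DEFAULT_LOG_PATTERNS, *extra_patterns]
    return [
        name for name in filenames
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Hypothetical contents of a job's working directory.
working_dir = ["err.log", "cmd.log", "reports.txt", "align.err", "output.bam"]

default_logs = log_files(working_dir)
custom_logs = log_files(working_dir, ["reports.txt", "*.err"])
print(default_logs)
print(custom_logs)
```

Each extra pattern corresponds to one `sbg:SaveLogs` hint, which mirrors the statement that several such hints can be added to a tool.
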