{"__v":0,"_id":"586d3d68d7f6a12f00c62444","category":{"project":"55faf11ba62ba1170021a9a7","version":"55faf11ba62ba1170021a9aa","_id":"586d3c287c6b5b2300c05055","__v":0,"sync":{"url":"","isSync":false},"reference":false,"createdAt":"2017-01-04T18:17:12.683Z","from_sync":false,"order":18,"slug":"task-execution","title":"Task Execution"},"parentDoc":null,"project":"55faf11ba62ba1170021a9a7","user":"575e85ac41c8ba0e00259a44","version":{"__v":35,"_id":"55faf11ba62ba1170021a9aa","project":"55faf11ba62ba1170021a9a7","createdAt":"2015-09-17T16:58:03.490Z","releaseDate":"2015-09-17T16:58:03.490Z","categories":["55faf11ca62ba1170021a9ab","55faf8f4d0e22017005b8272","55faf91aa62ba1170021a9b5","55faf929a8a7770d00c2c0bd","55faf932a8a7770d00c2c0bf","55faf94b17b9d00d00969f47","55faf958d0e22017005b8274","55faf95fa8a7770d00c2c0c0","55faf96917b9d00d00969f48","55faf970a8a7770d00c2c0c1","55faf98c825d5f19001fa3a6","55faf99aa62ba1170021a9b8","55faf99fa62ba1170021a9b9","55faf9aa17b9d00d00969f49","55faf9b6a8a7770d00c2c0c3","55faf9bda62ba1170021a9ba","5604570090ee490d00440551","5637e8b2fbe1c50d008cb078","5649bb624fa1460d00780add","5671974d1b6b730d008b4823","5671979d60c8e70d006c9760","568e8eef70ca1f0d0035808e","56d0a2081ecc471500f1795e","56d4a0adde40c70b00823ea3","56d96b03dd90610b00270849","56fbb83d8f21c817002af880","573c811bee2b3b2200422be1","576bc92afb62dd20001cda85","5771811e27a5c20e00030dcd","5785191af3a10c0e009b75b0","57bdf84d5d48411900cd8dc0","57ff5c5dc135231700aed806","5804caf792398f0f00e77521","58458b4fba4f1c0f009692bb","586d3c287c6b5b2300c05055"],"is_deprecated":false,"is_hidden":false,"is_beta":true,"is_stable":true,"codename":"","version_clean":"1.0.0","version":"1.0"},"updates":[],"next":{"pages":[],"description":""},"createdAt":"2017-01-04T18:22:32.979Z","link_external":false,"link_url":"","githubsync":"","sync_unique":"","hidden":false,"api":{"results":{"codes":[]},"settings":"","auth":"required","params":[],"url":""},"isReference":false,"order":2,"body":"To achieve the 
parallelization of several executions of the same tool, the Seven Bridges Platform implements a Common Workflow Language (CWL) feature called *scattering*.\n\nScattering is a mechanism that applies to a particular tool and one of its input ports. If a tool is passed a list of inputs on one port and that port is marked as \"scattered\", then one job will be created for each input in the list.\n\nThe scheduling algorithm will have these jobs be run in parallel, as far a the available compute resources allow it. If all jobs cannot be executed in parallel, they will be queued for execution as soon as more resources become available.\n\nScattering on a critical tool in your workflow may shorten the workflow's run time significantly. For an example of how this can be achieved, see this [blog post](blog:making-efficient-use-of-compute-resources#section-when-being-scattered-is-a-very-good-thing-optimising-a-whole-genome-analysis) explaining how a whole genome analysis workflow uses scattering.\n\nNote that scattering is different from [performing batch analyses](doc:perform-batch-analysis). Batching launches multiple tasks, whereas scattering happens within a single task.\n\n##Keeping scattering under control\n\nThe power of scattering to reduce analysis time lies in making full use of the available compute resources. You can control the resources available for the execution of an app by specifying instance type and the number of instances to be used in parallel. 
\n\nWhile scattering is a powerful tool to shorten your analysis run time, it may well increase the overall cost of your analysis if used in combination with certain other settings.\n\nThere are two ways in which you can fine-tune how the scattering works on a tool:\n  * Configuring computational instances on the tool.\n  * Setting the maximum number of parallel instances;\n\n\n###Controlling via instance type\n\nBased on the scattered [tool's resource requirements](doc:about-tool-resource-requirements), you may want to pick an instance that leaves the least CPU and memory to waste for a given number of scattered jobs and maximum number of parallel instances. [This blog post](blog:making-efficient-use-of-compute-resources#section-choosing-the-best-compute-instance-for-your-analysis) explains how to choose an instance suitable for your analysis.\n\nTo set the instance type, set the [`sbg:AWSInstanceType`](doc:list-of-execution-hints) or [`sbg:GoogleInstanceType`](doc:list-of-execution-hints) hint at [workflow level](doc:set-execution-hints-at-workflow-level).\n\n###Controlling via maximum number of parallel instances\n\nIf you anticipate that the execution of the tool which you are scattering is time-critical for the entire workflow, you can configure the maximum number of instances that the Scheduling Algorithm is allowed to have running at any one time.\n\nIf the jobs that would be started as a result of scattering cannot fit onto the provisioned instances according to their tool's resource requirements, those jobs will be queued for execution. As soon as enough resources become available following the completion of other jobs, queued jobs will be executed. This ensures there will be less idle time across the entire task. 
\n\nThe CGC bioinformaticians exploit this technique when tuning workflows in Public Apps\n\nTo set the maximum number of instances, set the [`sbg:maxNumberOfParallelInstances`](doc:list-of-execution-hints) hint at [workflow level](doc:set-execution-hints-at-workflow-level).","excerpt":"","slug":"about-parallelizing-tool-executions","type":"basic","title":"About parallelizing tool executions"}

# About parallelizing tool executions


To parallelize several executions of the same tool, the Seven Bridges Platform implements a Common Workflow Language (CWL) feature called *scattering*.

Scattering is a mechanism that applies to a particular tool and one of its input ports. If a tool is passed a list of inputs on one port and that port is marked as "scattered", then one job is created for each input in the list.

The scheduling algorithm runs these jobs in parallel, as far as the available compute resources allow. If all jobs cannot be executed in parallel, the remainder are queued and executed as soon as more resources become available.

Scattering on a critical tool in your workflow may shorten the workflow's run time significantly. For an example of how this can be achieved, see this [blog post](blog:making-efficient-use-of-compute-resources#section-when-being-scattered-is-a-very-good-thing-optimising-a-whole-genome-analysis) explaining how a whole genome analysis workflow uses scattering.

Note that scattering is different from [performing batch analyses](doc:perform-batch-analysis). Batching launches multiple tasks, whereas scattering happens within a single task.

## Keeping scattering under control

The power of scattering to reduce analysis time lies in making full use of the available compute resources. You can control the resources available for the execution of an app by specifying the instance type and the number of instances to be used in parallel.

While scattering is a powerful way to shorten your analysis run time, it may well increase the overall cost of your analysis if used in combination with certain other settings.

There are two ways in which you can fine-tune how scattering works on a tool:
  * Configuring computational instances on the tool.
  * Setting the maximum number of parallel instances.

### Controlling via instance type

Based on the scattered [tool's resource requirements](doc:about-tool-resource-requirements), you may want to pick an instance that wastes the least CPU and memory for a given number of scattered jobs and maximum number of parallel instances. [This blog post](blog:making-efficient-use-of-compute-resources#section-choosing-the-best-compute-instance-for-your-analysis) explains how to choose an instance suitable for your analysis.

To set the instance type, set the [`sbg:AWSInstanceType`](doc:list-of-execution-hints) or [`sbg:GoogleInstanceType`](doc:list-of-execution-hints) hint at [workflow level](doc:set-execution-hints-at-workflow-level).

### Controlling via maximum number of parallel instances

If you anticipate that the execution of the tool you are scattering is time-critical for the entire workflow, you can configure the maximum number of instances that the scheduling algorithm is allowed to have running at any one time.

If the jobs started as a result of scattering cannot fit onto the provisioned instances according to the tool's resource requirements, those jobs are queued for execution. As soon as enough resources become available following the completion of other jobs, queued jobs are executed. This reduces idle time across the entire task. The CGC bioinformaticians exploit this technique when tuning workflows in Public Apps.

To set the maximum number of instances, set the [`sbg:maxNumberOfParallelInstances`](doc:list-of-execution-hints) hint at [workflow level](doc:set-execution-hints-at-workflow-level).
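
The scattering mechanism described above can be sketched as a CWL workflow step. This is a minimal illustration, not a Platform-ready app description: the tool file, port names, and types here are hypothetical.

```yaml
# Sketch of scattering in CWL v1.0 (tool and port names are hypothetical).
class: Workflow
cwlVersion: v1.0
requirements:
  ScatterFeatureRequirement: {}    # required to use "scatter" on a step
inputs:
  input_bams: File[]               # a list of files on one input port
outputs:
  sorted_bams:
    type: File[]
    outputSource: sort_step/sorted_bam
steps:
  sort_step:
    run: sort_tool.cwl             # hypothetical tool description
    scatter: input_bam             # this port is "scattered":
    in:                            # one job per element of input_bams
      input_bam: input_bams
    out: [sorted_bam]
```

Because `sort_step` scatters over `input_bam`, passing a list of N files creates N jobs, which the scheduling algorithm can then run in parallel across the available instances.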
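
As a sketch, the two controls discussed above might be set together as workflow-level hints like this; the instance type and instance count are illustrative values, and the exact serialization may differ depending on how your app is stored.

```yaml
# Hypothetical workflow-level hints (values are illustrative).
hints:
  - class: sbg:AWSInstanceType            # instance type for AWS executions
    value: c4.2xlarge
  - class: sbg:maxNumberOfParallelInstances
    value: "4"                            # at most 4 instances at any one time
```

With a hint configuration along these lines, scattered jobs are packed onto up to four `c4.2xlarge` instances according to the tool's resource requirements, and any jobs that do not fit are queued until resources free up.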