
# About task execution


Upon execution, your task is broken down into modular units called [jobs](doc:about-task-execution#section-jobs). Each job is executed when its inputs become available. Jobs are carried out on compute [instances](doc:about-task-execution#section-instances), a [process](doc:about-task-execution#section-scheduling) orchestrated by the Scheduling Algorithm. You can [control task execution](doc:about-task-execution#section-controlling-execution) by setting the parameters that the Scheduling Algorithm bases its decisions on.

## Jobs

*Jobs* are the individual steps of your task's execution and include tool executions, file uploads/downloads, and retrieval of Docker images.

A job is created for every tool execution. Most of the time a tool executes only once per workflow, giving a single job per task.

In some cases, however, a tool may be run multiple times to process individual parts of a large input more efficiently, which yields multiple jobs per tool. A tool may also be executed multiple times if one of its inputs is a list whose elements are to be processed in parallel.

## Instances

When you run an analysis, it is executed on a computational instance from the cloud infrastructure provider (Amazon Web Services or Google Cloud Platform).

These computational instances appear as remote computers capable of executing generic software and are referred to as *instances*.

As with any computer, an instance has a certain number of CPU cores, an amount of memory, hard disk space, and network resources.

The CGC provides the infrastructure that controls instances throughout the execution of your analysis.

## Queueing

There are several cases in which a task can be temporarily queued:
1. The task has just been submitted and is awaiting execution.
2. The maximum number of parallel instances for your account has been reached.
3. Some of the cloud infrastructure resources required for task execution are not available.
In cases 2 and 3 above, the task status will change from **QUEUED** back to **RUNNING** when the required parallel instances or cloud infrastructure resources become available. This change of task status can happen several times during execution.

When a task is queued because the maximum allowed number of parallel instances per user account has been reached, the time it takes for the task to change its state from **QUEUED** back to **RUNNING** can depend on several factors, such as:
* the size of the input files,
* the time it takes for the tool or workflow to execute,
* the availability of instances, e.g. whether the required instance type is available immediately.

To ensure that all users can run their tasks on the CGC, each individual user has a limit of 80 parallel instances. This limit is in place because the total number of parallel instances used by the CGC is capped by Amazon Web Services (AWS), the CGC’s underlying cloud service provider. Even though tasks requiring more than 80 instances might therefore take longer to complete, the limit ensures that instances are available for all CGC users to run their tasks.

The limit is applied as the cumulative maximum number of parallel instances per user, across all tasks in all projects created by that user. To understand how the limit works, consider the following example:
1. User **rfranklin** has two projects on the CGC, named **WGS** and **WES**.
2. In **WES**, **rfranklin** is currently running a batch task that is using 56 parallel instances.
3. In **WGS**, **rfranklin** starts another batch task that requires 42 parallel instances. As the limit of 80 parallel instances is applied per user, the task in **WGS** will initially be able to use only 24 instances (80 minus the 56 used by the task in **WES**), while the remaining instances are allocated as either of the two running tasks releases them.

Users who are added to a project also run their tasks within the project creator’s parallel instance limit.
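The per-user accounting in the example above can be sketched as a small calculation. This is a minimal illustration, not CGC platform code; only the 80-instance limit and the figures from the example are taken from the text.

```python
# Minimal sketch of the per-user parallel-instance accounting described
# above. Illustration only -- not CGC platform code.
PER_USER_LIMIT = 80  # cumulative limit across all of a user's tasks


def instances_available(running_by_task: dict) -> int:
    """Instances a new task owned by the same user can use right now,
    given the instance counts of that user's currently running tasks."""
    in_use = sum(running_by_task.values())
    return max(0, PER_USER_LIMIT - in_use)


# rfranklin's WES batch task is already using 56 parallel instances,
# so a new 42-instance batch task in WGS can start with only 24 of them.
print(instances_available({"WES-batch": 56}))  # 24
```

The same function also covers the **jsmith** case: since the limit is per project creator, a collaborator's task sees 0 available instances while the creator's tasks hold all 80.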
In the example above, if **rfranklin** then adds user **jsmith** to one of the projects, **WES** or **WGS**, and **jsmith** tries to run a task, that task will be queued because **rfranklin** is already using the maximum allowed number of parallel instances.

For more general information about different task states, please refer to the [list of task statuses](doc:list-of-task-statuses).

## Scheduling

The process of assigning and launching instances that fit the tool executions in a task is called *scheduling*.

The orchestration of job execution, as well as instance provisioning, is carried out by our bespoke Scheduling Algorithm.

When a tool is about to be executed, the Scheduling Algorithm picks the best instance based on the [tool's resource requirements](doc:about-tool-resource-requirements). The Scheduling Algorithm may launch that job on an already active instance, side by side with an earlier job from the same task, or it may decide to launch a new instance to host the new job.

## Controlling execution

You can control the execution of your task by customizing the parameters the Scheduling Algorithm works with.

The parameters you can tune are the type(s) of instance used for your analysis, as well as how many instances of each type can be used in parallel.

These parameters, called [execution hints](doc:list-of-execution-hints), can be attached to tools and workflows, and the Scheduling Algorithm respects them when deciding which tool runs where.

The CGC understands and implements tools, workflows, jobs, and execution as prescribed by the Common Workflow Language (CWL).
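The resource-based instance selection described in the Scheduling section above can be sketched roughly as follows. This is a hypothetical illustration: the instance catalogue, the prices, and the "cheapest instance that fits" policy are assumptions for the sketch, not the CGC's actual Scheduling Algorithm.

```python
# Hypothetical sketch of resource-based instance selection, in the spirit
# of the Scheduling section above. The catalogue, prices, and selection
# policy are illustrative assumptions, not the CGC's actual algorithm.
from dataclasses import dataclass


@dataclass
class InstanceType:
    name: str
    cpus: int
    mem_gb: int
    hourly_price: float


# Example catalogue (names resemble AWS types; prices are made up).
CATALOGUE = [
    InstanceType("c4.2xlarge", cpus=8, mem_gb=15, hourly_price=0.40),
    InstanceType("c4.8xlarge", cpus=36, mem_gb=60, hourly_price=1.59),
    InstanceType("r4.4xlarge", cpus=16, mem_gb=122, hourly_price=1.06),
]


def pick_instance(required_cpus: int, required_mem_gb: int) -> InstanceType:
    """Pick the cheapest instance type that satisfies a tool's
    CPU and memory requirements."""
    fitting = [i for i in CATALOGUE
               if i.cpus >= required_cpus and i.mem_gb >= required_mem_gb]
    if not fitting:
        raise ValueError("no instance type fits the requirements")
    return min(fitting, key=lambda i: i.hourly_price)


print(pick_instance(4, 8).name)    # c4.2xlarge
print(pick_instance(8, 100).name)  # r4.4xlarge (only type with enough memory)
```

In practice you would steer these decisions through execution hints on a tool or workflow rather than writing code like this; the sketch only shows why a memory-hungry tool can end up on a larger, more expensive instance.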