
About batch analyses


##Overview

Batching runs the same workflow or tool multiple times in parallel, each time with a different set of inputs. Inputs can be grouped by input file or by specified metadata criteria.

This page explains how batching works. To set one up, see [setting up a batch task](doc:run-a-task#section--2-optional-designate-your-task-as-a-batch-task).

##What are batch tasks?

**Batch analysis** separates files into batches, or groups, when running your analysis. Files are grouped according to the metadata criteria you specify for your input files or according to a file list you provide.

When your batch task is run, it produces multiple child tasks: one for each group of files. The child tasks perform the same analysis independently. Running a batch task has several advantages:

  * The batch task automatically generates identical tasks to be run on your groups of files, saving you the effort of creating them manually.
  * Child tasks are executed in parallel and are independent of each other.
  * If a child task fails, the other child tasks are not affected, so you only need to troubleshoot and rerun the failed task.
  * You can rerun any child task separately.

##What is metadata?

**Metadata** is information about the nature of your data and how it was obtained. Apps on the CGC use this information to make sure the right data gets analyzed together. Learn more about supplying [metadata for your files](doc:metadata-for-your-files).

Batch tasks group the files you are analyzing either by their values for a metadata field or by the file names in a given file list.

Some common scenarios include batching by:

  * File name
  * Case ID
  * Sample ID
  * Library ID
  * Platform ID
  * File segment

##How do I run a batch task?

You can set up a batch analysis on the CGC either through the [visual interface](doc:run-a-task#section--2-optional-designate-your-task-as-a-batch-task) or [via the API](doc:api-batch-tutorial).

##Example of batching by metadata

Imagine you need to run the **RNA-seq Alignment - STAR** workflow on many files.

The figure below shows the metadata fields you can batch by in the visual interface.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/7a8cd60-batching_nested_hierarchy01.png",
        "batching_nested_hierarchy01.png",
        1564,
        622,
        "#cce3f3"
      ]
    }
  ]
}
[/block]
##Grouping input files into batches

You can group input files by file or by metadata, or run the workflow as a single task, which processes all inputs together. The optimal grouping depends on your experimental design.

For example, suppose you want to run the public Whole Genome Analysis workflow on FASTQ files from many samples, with paired-end reads stored as two files per sample. In this case, you might want to batch by sample, so that the files from each sample are analyzed together.

###Batching by File

To batch by file, select **File** from the Batch by drop-down menu. Batching by File runs the workflow or tool once for each individual file, initiating a new child task for each input file.

###Batching by file metadata

To batch by file metadata, select **File metadata** from the Batch by drop-down menu, then specify the metadata field to batch by. Files are grouped by their value for that metadata field, and a separate child task is created for each resulting group.
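
To make the two grouping modes concrete, here is a minimal Python sketch of the grouping logic only. It does not call the CGC or its API. The file records, the `sample_id` field name, and the helper functions `group_by_file` and `group_by_metadata` are all hypothetical names chosen for illustration. Placing files that lack the chosen field under `None` is a choice made for this sketch, not a statement about how the platform treats such files.

```python
from collections import defaultdict

# Hypothetical input files: two paired-end FASTQ files for each of two samples.
files = [
    {"name": "s1_R1.fastq", "metadata": {"sample_id": "S1"}},
    {"name": "s1_R2.fastq", "metadata": {"sample_id": "S1"}},
    {"name": "s2_R1.fastq", "metadata": {"sample_id": "S2"}},
    {"name": "s2_R2.fastq", "metadata": {"sample_id": "S2"}},
]


def group_by_file(files):
    """Batching by File: every input file forms its own group."""
    return [[f["name"]] for f in files]


def group_by_metadata(files, field):
    """Batching by file metadata: files sharing a value for `field` form a group.

    Files without the field end up under the key None (an assumption of
    this sketch, not documented platform behavior).
    """
    groups = defaultdict(list)
    for f in files:
        groups[f["metadata"].get(field)].append(f["name"])
    return dict(groups)


per_file = group_by_file(files)
per_sample = group_by_metadata(files, "sample_id")

# Each group would correspond to one child task.
print(len(per_file), "groups when batching by file")
print(len(per_sample), "groups when batching by sample_id")
```

In this sketch, batching by file yields one group per FASTQ file, while batching by `sample_id` keeps both paired-end files of a sample in the same group, which matches the per-sample scenario described above.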