Set metadata using the command line uploader


You can use the Command Line Uploader to set some or all of the metadata for your files during upload. Alternatively, you can [set metadata manually](set-metadata-using-the-visual-interface) later.

## `.meta` files
For each file queued for upload, the Uploader looks for a supplementary file containing metadata to set for that file. The supplementary file should be in the same directory as the file being uploaded and have the same name as that file, with `.meta` appended. For example, if you are uploading `sample1.fastq`, the supplementary file should be named `sample1.fastq.meta`.

The supplementary file should contain a valid JSON object, as shown in the example below. The key-value pairs in this object are set on the server as metadata describing the uploaded file. If a `.meta` file contains invalid JSON, or metadata values outside their acceptable range, a warning is printed to standard output, but the file upload continues. Note that if you set invalid metadata values, the workflows you use with your files may not function correctly.
[block:callout]
{
  "type": "info",
  "body": "You do not need to queue supplementary files for upload in order for their metadata to be applied. The Uploader parses and assigns metadata from supplementary files automatically, as long as each one is matched to its principal file by the naming convention described above."
}
[/block]
The following JSON object is an example of the metadata that could be contained in the metadata file `sample1.fastq.meta`:
[block:code]
{
  "codes": [
    {
      "code": "{\n  \"sample_id\": \"sample1\",\n  \"library_id\": \"library1\",\n  \"paired_end\": \"1\",\n  \"platform\": \"illumina HiSeq\",\n  \"quality_scale\": \"illumina13\"\n}",
      "language": "json",
      "name": "JSON"
    }
  ]
}
[/block]

[block:callout]
{
  "type": "success",
  "body": "Learn more about [metadata fields on the CGC](metadata-for-private-data)."
}
[/block]
In addition to the standard metadata fields shown in the visual interface, you can add custom metadata to your files. Custom metadata fields are user-defined key-value pairs that let you attach additional information to files on the CGC. Custom metadata can be added with the Command Line Uploader or via the API, but *not* through the visual interface.
[block:callout]
{
  "type": "info",
  "body": "Custom metadata fields are not visible in the visual interface, but their values can be retrieved by [getting file details via the API](doc:get-file-details)."
}
[/block]
When adding custom metadata fields, observe the following rules:
  * Keys and values are case-sensitive, unless a tool or a part of the CGC explicitly treats them differently.
  * A file can have at most 1000 key-value pairs, including keys with null values.
  * Keys and values are UTF-8 encoded strings.
  * The maximum length of a key is 100 bytes (UTF-8 encoded).
  * The maximum length of a value is 300 bytes (UTF-8 encoded).

<div align="right"><a href="#top">top</a></div>

## Set metadata for multiple files using a manifest file

You can set metadata for multiple files during upload by supplying a manifest file that contains the metadata for a group of files.

### Supported file formats

The supported file formats for the manifest file are:
  * **CSV** - comma-separated values
  * **TSV** - tab-separated values

The following rules apply to the manifest file:

  * Rows are separated by line breaks, and columns are separated by a comma (CSV) or a tab (TSV).
  * The **first row** must contain the **column names**, which are parsed as metadata field names (e.g. `sample`, `library`).
  * The **first column** must contain the **names of the files** to be uploaded. If the files are not in the same directory as the manifest file, also include the path to each file (e.g. `../filename.fastq`).
  * All subsequent columns contain the metadata values to assign to the corresponding files.
  * Quotation marks are allowed.

The following example shows a manifest for three files with three metadata fields.
[block:parameters]
{
  "data": {
    "h-0": "File name",
    "h-1": "sample",
    "h-2": "library",
    "h-3": "paired_end",
    "0-0": "file1.fastq",
    "0-1": "sample1",
    "0-2": "examplelibrary1",
    "0-3": "1",
    "1-0": "file2.fastq",
    "1-1": "sample1",
    "1-2": "examplelibrary1",
    "1-3": "2",
    "2-0": "file3.fastq",
    "2-1": "sample2",
    "2-2": "examplelibrary2",
    "2-3": "1"
  },
  "cols": 4,
  "rows": 3
}
[/block]
Below is the same example in CSV format:
[block:code]
{
  "codes": [
    {
      "code": "File name,sample,library,paired_end\nfile1.fastq,sample1,examplelibrary1,1\nfile2.fastq,sample1,examplelibrary1,2\nfile3.fastq,sample2,examplelibrary2,1",
      "language": "text"
    }
  ]
}
[/block]

### Upload files and set metadata

To upload multiple files and set their metadata from the manifest, run the following command:
[block:code]
{
  "codes": [
    {
      "code": "cgc-uploader.sh --manifest-file filename.csv --manifest-metadata",
      "language": "shell"
    }
  ]
}
[/block]
This uploads all files listed in the manifest (here, `filename.csv`) and applies the corresponding metadata to each file.

The `--manifest-file` option specifies the name (and path) of the manifest file, while the `--manifest-metadata` option instructs the Command Line Uploader to also parse metadata values from the manifest.
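The manifest rules above can be illustrated with a short Python sketch that turns a manifest into a mapping from file name to metadata, optionally keeping only some columns (similar in spirit to restricting which fields are set). It is a hypothetical helper (`parse_manifest` is an invented name), assumes the whole manifest is available as a string, and relies on Python's standard `csv` module for quoting; it does not describe how the Command Line Uploader itself parses manifests.

```python
import csv
import io
from typing import Dict, Iterable, Optional


def parse_manifest(
    text: str,
    delimiter: str = ",",
    fields: Optional[Iterable[str]] = None,
) -> Dict[str, Dict[str, str]]:
    """Map each file name (first column) to its metadata (remaining columns).

    Hypothetical helper that illustrates the manifest rules; it is not the
    Command Line Uploader's own code. Use delimiter="\\t" for TSV manifests.
    If `fields` is given, only those columns are kept.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)  # quoted values are unquoted
    header = next(reader)            # first row: column names
    field_names = header[1:]         # first column holds file names (optionally with a path)
    wanted = set(field_names) if fields is None else set(fields)

    metadata: Dict[str, Dict[str, str]] = {}
    for row in reader:
        if not row:                  # skip blank lines
            continue
        file_name, values = row[0], row[1:]
        metadata[file_name] = {
            name: value
            for name, value in zip(field_names, values)
            if name in wanted
        }
    return metadata


if __name__ == "__main__":
    manifest = (
        "File name,sample,library,paired_end\n"
        "file1.fastq,sample1,examplelibrary1,1\n"
        "file2.fastq,sample1,examplelibrary1,2\n"
        "file3.fastq,sample2,examplelibrary2,1\n"
    )
    print(parse_manifest(manifest)["file2.fastq"])
    # {'sample': 'sample1', 'library': 'examplelibrary1', 'paired_end': '2'}
    print(parse_manifest(manifest, fields=["sample", "paired_end"])["file3.fastq"])
    # {'sample': 'sample2', 'paired_end': '1'}
```

Using the standard `csv` reader rather than splitting on the delimiter keeps quoted values (which the manifest rules allow) intact.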
### Upload files and set individual metadata fields

To upload multiple files and set only selected metadata fields, run the following command:
[block:code]
{
  "codes": [
    {
      "code": "cgc-uploader.sh --manifest-file filename.csv --manifest-metadata sample paired_end",
      "language": "shell"
    }
  ]
}
[/block]
In the example above, `sample` and `paired_end` are the only metadata fields that will be set on the uploaded files. The fields to set are listed after the `--manifest-metadata` option.
[block:callout]
{
  "type": "info",
  "body": "You can specify any number of metadata fields by listing them after the `--manifest-metadata` option."
}
[/block]
### Upload files without setting metadata

If you are dealing with large volumes of data, or want to automate the upload of a fixed list of files, you can use the manifest to upload multiple files without setting any metadata.

Run the following command:
[block:code]
{
  "codes": [
    {
      "code": "cgc-uploader.sh --manifest-file filename.csv",
      "language": "shell"
    }
  ]
}
[/block]
### Perform a dry run

Before performing an actual upload, you can do a dry run. A dry run only prints information to the terminal, allowing you to check that all settings are correct without uploading anything. To perform a dry run, use the following command:
[block:code]
{
  "codes": [
    {
      "code": "cgc-uploader.sh --manifest-file manifest.csv --manifest-metadata --dry-run",
      "language": "shell"
    }
  ]
}
[/block]
To print information about specific metadata fields only, use the following command:
[block:code]
{
  "codes": [
    {
      "code": "cgc-uploader.sh --manifest-file manifest.csv --manifest-metadata --dry-run sample library",
      "language": "shell"
    }
  ]
}
[/block]
Only the `sample` and `library` metadata fields will be printed to the terminal.
[block:callout]
{
  "type": "info",
  "body": "You can specify any number of individual metadata fields by listing them after the `--dry-run` option."
}
[/block]
### General notes

The Command Line Uploader assumes that the files being uploaded and the accompanying manifest file are in the same directory. If that is not the case, you can specify the path:
  * **within the manifest**, by prepending the file path to each file name.
  * **on the command line**, by specifying the full path to the manifest file.

If a file listed in the manifest also has an accompanying `.meta` file, the contents of that `.meta` file are applied in addition to the metadata parsed from the manifest: they add new key-value pairs and override any pairs with the same key.

<div align="right"><a href="#top">top</a></div>
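The `.meta` naming convention, the custom metadata limits, and the rule that `.meta` contents are applied on top of manifest values can be combined into one Python sketch. Everything here is an assumption for illustration: the function names (`meta_path_for`, `check_custom_metadata`, `merged_metadata`) are invented, the limits are applied to every key-value pair rather than only to custom fields, and the CGC server, not this code, decides what is ultimately accepted.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional

# Custom metadata limits as listed earlier on this page.
MAX_PAIRS = 1000
MAX_KEY_BYTES = 100
MAX_VALUE_BYTES = 300


def meta_path_for(data_file: Path) -> Path:
    """Supplementary file name: sample1.fastq -> sample1.fastq.meta (same directory)."""
    return data_file.with_name(data_file.name + ".meta")


def check_custom_metadata(meta: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of limit violations; an empty list means none were found."""
    problems: List[str] = []
    if len(meta) > MAX_PAIRS:  # null-value keys count too
        problems.append(f"{len(meta)} key-value pairs (max {MAX_PAIRS})")
    for key, value in meta.items():
        if len(key.encode("utf-8")) > MAX_KEY_BYTES:
            problems.append(f"key {key!r} is longer than {MAX_KEY_BYTES} bytes")
        if value is not None and len(str(value).encode("utf-8")) > MAX_VALUE_BYTES:
            problems.append(f"value of {key!r} is longer than {MAX_VALUE_BYTES} bytes")
    return problems


def merged_metadata(data_file: Path, from_manifest: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Start from the manifest values, then apply the .meta file on top of them."""
    merged: Dict[str, Optional[str]] = dict(from_manifest)
    meta_file = meta_path_for(data_file)
    if not meta_file.exists():
        return merged
    try:
        meta = json.loads(meta_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as err:
        # As described for the Uploader: warn on standard output, but carry on.
        print(f"warning: {meta_file.name} is not valid JSON ({err})")
        return merged
    if not isinstance(meta, dict):
        print(f"warning: {meta_file.name} does not contain a JSON object")
        return merged
    merged.update(meta)  # adds new keys, overrides keys already set by the manifest
    return merged


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        fastq = Path(tmp, "sample1.fastq")
        fastq.write_text("")
        meta_path_for(fastq).write_text(json.dumps({"paired_end": "2", "platform": "illumina HiSeq"}))
        result = merged_metadata(fastq, {"sample": "sample1", "paired_end": "1"})
        print(result)  # {'sample': 'sample1', 'paired_end': '2', 'platform': 'illumina HiSeq'}
        print(check_custom_metadata(result))  # []
```

In the usage example, `paired_end` ends up as `2` because the `.meta` value overrides the manifest value, while `platform` is added only by the `.meta` file.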