Edit metadata using the command line uploader
You can use the Command Line Uploader to set some or all of the metadata during upload. Alternatively, you can set metadata manually later.
.meta files
For each file queued for upload, the Uploader looks for a supplementary file containing metadata to set for the file. This supplementary file should be in the same directory as the file being uploaded and have the same name as that file with .meta appended. For example, if you are uploading sample1.fastq, the supplementary file should be named sample1.fastq.meta.
The supplementary file should contain a valid JSON object, as shown in the example below. Key-value pairs from this JSON object will be set on the server as metadata describing the uploaded file. If the supplementary .meta file contains invalid JSON or metadata values that fall outside of their acceptable range, a warning will be issued on the standard output, but the file upload will continue. Note that if you set invalid metadata values, the workflows you use with your files may not function correctly.
Supplementary files do not need to be included for upload in order for their metadata to be applied to the files being uploaded. Parsing and assigning metadata from supplementary files happens automatically as long as they are properly matched to their principal files via the naming convention described above.
The following JSON object is an example of the metadata that could be contained in the metadata file sample1.fastq.meta:
{
"sample_id": "sample1",
"library_id": "library1",
"paired_end": "1",
"platform": "illumina HiSeq",
"quality_scale": "illumina13"
}
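If you prepare many uploads, the .meta files can be generated with a script rather than written by hand. The following is a minimal sketch in Python, not part of the Command Line Uploader; the helper names write_meta_file and is_valid_json_object are hypothetical. It writes a JSON object next to the data file using the naming convention described above and checks that the result parses as a JSON object.

```python
import json
from pathlib import Path


def write_meta_file(data_file, metadata):
    """Write `metadata` as JSON to '<data_file>.meta' next to the data file."""
    data_path = Path(data_file)
    # Append '.meta' to the full file name: sample1.fastq -> sample1.fastq.meta
    meta_path = data_path.with_name(data_path.name + ".meta")
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return meta_path


def is_valid_json_object(meta_path):
    """Return True if the file parses as JSON and its top level is an object."""
    try:
        parsed = json.loads(Path(meta_path).read_text(encoding="utf-8"))
    except (OSError, ValueError):  # unreadable file, bad encoding, or invalid JSON
        return False
    return isinstance(parsed, dict)
```

For example, write_meta_file("sample1.fastq", {"sample_id": "sample1", "paired_end": "1"}) creates sample1.fastq.meta in the same directory as sample1.fastq. This check only covers JSON syntax; it does not know which values the CGC accepts for each field.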
Learn more about metadata fields on the CGC.
Apart from the standard set of metadata fields that can be seen through the visual interface, you can also add custom metadata to your files. Custom metadata fields are user-defined key-value pairs that let you attach additional metadata to files on the CGC. Custom metadata can be added via the Command Line Uploader or via the API, but not through the visual interface.
Custom metadata fields are not visible in the visual interface, but their values can be retrieved by getting file details via the API.
When adding custom metadata fields, follow these rules:
- Keys and values are case sensitive unless explicitly treated differently by a tool or a part of the CGC.
- Maximum number of key-value pairs per file is 1000, including null-value keys.
- Keys and values are UTF-8 encoded strings.
- Maximum length of a key is 100 bytes (UTF-8 encoding).
- Maximum length of a value is 300 bytes (UTF-8 encoding).
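The size limits above are expressed in bytes, not characters, so a key made of non-ASCII characters can exceed 100 bytes well before it reaches 100 characters. The following Python sketch checks a metadata dictionary against the limits listed above before you upload. The function name check_custom_metadata is hypothetical, and the check is a local convenience, not the validation the CGC itself performs.

```python
MAX_PAIRS = 1000        # key-value pairs per file, null-value keys included
MAX_KEY_BYTES = 100     # UTF-8 encoded length of a key
MAX_VALUE_BYTES = 300   # UTF-8 encoded length of a value


def check_custom_metadata(metadata):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(metadata) > MAX_PAIRS:
        problems.append(f"{len(metadata)} pairs exceeds the limit of {MAX_PAIRS}")
    for key, value in metadata.items():
        key_bytes = len(key.encode("utf-8"))
        if key_bytes > MAX_KEY_BYTES:
            problems.append(f"key {key!r} is {key_bytes} bytes (max {MAX_KEY_BYTES})")
        if value is None:
            continue  # null-value keys count toward the pair limit but have no length
        if not isinstance(value, str):
            problems.append(f"value for {key!r} is not a string")
            continue
        value_bytes = len(value.encode("utf-8"))
        if value_bytes > MAX_VALUE_BYTES:
            problems.append(
                f"value for {key!r} is {value_bytes} bytes (max {MAX_VALUE_BYTES})"
            )
    return problems
```

Calling check_custom_metadata on the dictionary you intend to write to a .meta file returns one message per violated rule, which you can print or use to stop a batch script early.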
Set metadata for multiple files using a manifest file
Metadata can be set for multiple files during the upload by supplying a manifest file that contains the metadata for a group of accompanying files.
Supported file formats
The supported file formats for the manifest file are:
- CSV - comma separated values
- TSV - tab separated values
The following rules apply to the manifest file:
- Rows are separated by line breaks, and columns are separated by either a comma (CSV) or a tab (TSV).
- The first row has to contain column names which are parsed as metadata fields (e.g. “sample”, “library”).
- The first column has to contain the names of the files which will be uploaded. If the files are not in the same directory as the manifest file, you should also include a path to the files (e.g. ../filename.fastq).
- All subsequent columns should contain metadata fields which will be assigned to the specified files.
- Quotation marks are allowed.
The following example shows the content of the manifest for three files with three metadata fields.
| File name | sample | library | paired_end |
|---|---|---|---|
| file1.fastq | sample1 | examplelibrary1 | 1 |
| file2.fastq | sample1 | examplelibrary1 | 2 |
| file3.fastq | sample2 | examplelibrary2 | 1 |
Below is the same example in comma-separated format.
File name,sample,library,paired_end
file1.fastq,sample1,examplelibrary1,1
file2.fastq,sample1,examplelibrary1,2
file3.fastq,sample2,examplelibrary2,1
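A manifest like the one above can also be produced programmatically, which helps keep quoting and delimiters consistent. The following Python sketch uses the standard csv module; the function name write_manifest and its parameters are hypothetical, and the "File name" header text is simply copied from the example above.

```python
import csv


def write_manifest(manifest_path, rows, fields, delimiter=","):
    """Write a manifest: first column is the file name, the rest are metadata fields.

    `rows` maps each file name (or relative path) to a dict of metadata values.
    Pass a tab character as `delimiter` to produce a TSV manifest instead of CSV.
    """
    with open(manifest_path, "w", newline="", encoding="utf-8") as handle:
        # lineterminator="\n" writes plain line breaks instead of the csv default "\r\n"
        writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
        writer.writerow(["File name", *fields])  # header row names the metadata fields
        for file_name, metadata in rows.items():
            # Fields missing for a file are written as empty cells
            writer.writerow([file_name, *(metadata.get(f, "") for f in fields)])
```

For example, write_manifest("filename.csv", {"file1.fastq": {"sample": "sample1", "library": "examplelibrary1", "paired_end": "1"}}, ["sample", "library", "paired_end"]) writes the header row followed by the first data row of the example above.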
Upload files and set metadata
To upload multiple files and set their metadata using the manifest, issue the following command:
cgc-uploader.sh --manifest-file filename.csv --manifest-metadata
This will upload all files which are specified in the manifest (e.g. filename.csv) and apply relevant metadata for each of the files.
The --manifest-file option is used for specifying the name (and path) of the manifest file, while the --manifest-metadata option instructs the Command Line Uploader to also parse metadata values from the manifest.
Upload files and set individual metadata fields
To upload multiple files and set individual metadata fields, issue the following command:
cgc-uploader.sh --manifest-file filename.csv --manifest-metadata sample paired_end
In the example above, the only two metadata fields that will be set for the uploaded files are sample and paired_end. The metadata fields are specified after the --manifest-metadata option.
You can specify any number of metadata fields by listing them after the --manifest-metadata option.
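To see in advance which values a given selection of fields covers, you can read the manifest yourself. The following Python sketch is an offline approximation, not the uploader's own logic: it assumes the manifest layout described above (header row, file names in the first column) and keeps only the columns you name. The function name preview_manifest_metadata is hypothetical.

```python
import csv


def preview_manifest_metadata(manifest_path, selected_fields=None, delimiter=","):
    """Map each file name in the manifest to the metadata columns of interest.

    With selected_fields=None every metadata column is kept; otherwise only the
    listed columns are kept, mirroring fields named after --manifest-metadata.
    """
    with open(manifest_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)
        field_names = header[1:]  # the first column holds file names
        unknown = [f for f in (selected_fields or []) if f not in field_names]
        if unknown:
            raise ValueError(f"fields not in manifest header: {unknown}")
        keep = field_names if selected_fields is None else list(selected_fields)
        preview = {}
        for row in reader:
            if not row:
                continue  # skip blank lines
            values = dict(zip(field_names, row[1:]))
            preview[row[0]] = {f: values.get(f, "") for f in keep}
        return preview
```

Calling preview_manifest_metadata("filename.csv", ["sample", "paired_end"]) on the example manifest returns, for each file, only its sample and paired_end values.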
Upload files without setting metadata
If you are dealing with larger volumes of data, or if you want to automate the upload of a fixed list of files, you can use the manifest to upload multiple files without setting any metadata.
Issue the following command:
cgc-uploader.sh --manifest-file filename.csv
Perform a dry run
Before performing an actual upload, you can do a dry run. A dry run only prints information to the terminal, letting you check that all settings are correct without uploading anything. To perform a dry run, issue the following command:
cgc-uploader.sh --manifest-file manifest.csv --manifest-metadata --dry-run
To only output information about specific metadata fields, issue the following command:
cgc-uploader.sh --manifest-file manifest.csv --manifest-metadata --dry-run sample library
Only the sample and library metadata fields will be output in the terminal.
You can specify any number of individual metadata fields by listing them after the --dry-run option.
General notes
The Command Line Uploader assumes that both the files which are being uploaded and the accompanying manifest file reside in the same directory. If that is not the case, you can specify the path:
- within the manifest, by prepending the file path to the file name.
- in the command line by specifying the full path to the manifest file.
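Because a wrong path in the manifest is easy to miss, it can help to confirm that every listed file exists before starting an upload. The following Python sketch does this locally. It assumes that relative entries in the first column are resolved against the directory containing the manifest, which matches the ../filename.fastq convention described earlier but is not confirmed uploader behavior. The function name missing_manifest_files is hypothetical.

```python
import csv
from pathlib import Path


def missing_manifest_files(manifest_path, delimiter=","):
    """Return the manifest entries whose files cannot be found on disk."""
    manifest = Path(manifest_path)
    missing = []
    with manifest.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # skip blank lines
            # Assumption: relative entries are resolved against the manifest's directory
            candidate = manifest.parent / row[0]
            if not candidate.is_file():
                missing.append(row[0])
    return missing
```

An empty list from missing_manifest_files("filename.csv") means every file named in the manifest was found.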
If a file you have specified in the manifest also has an accompanying .meta file, the contents of that .meta file will be applied in addition to what is parsed from the manifest, expanding and/or overriding any key-value pairs.
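The combined result can be pictured as a dictionary merge in which the .meta file has the last word. The following Python sketch illustrates that reading of the rule above; the function name combine_metadata is hypothetical and the code models the described outcome, not the uploader's implementation.

```python
def combine_metadata(manifest_values, meta_file_values):
    """Combine metadata so that .meta values extend and override manifest values."""
    combined = dict(manifest_values)    # start from what the manifest provides
    combined.update(meta_file_values)   # .meta adds new keys and wins on conflicts
    return combined
```

For a file whose manifest row gives sample and paired_end, and whose .meta file gives paired_end and platform, the result keeps sample from the manifest, takes paired_end from the .meta file, and gains platform.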