Metadata schema
Metadata fields are subdivided into five categories (File, Sample, Aliquot, Case, and General). We recommend entering as much metadata as possible when you first upload files to the Platform. For instance, for raw sequencing files, you should enter Platform (sequencing platform) and Sample ID. Of these fields, there are seven that we strongly suggest you set for your data. While your tasks may run correctly without them, these metadata fields will help optimize your analyses.
Keep in mind that fields have to be specified exactly as listed under the Name column in the tables below. If a field is not specified exactly as in the table, the Platform will interpret it as a custom metadata field (see below).
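Because matching is exact, it can help to check field keys before uploading. The following is a minimal sketch, not a Platform feature: it assumes metadata is held as a plain Python dictionary keyed by the API keys from the File and Sample tables in this section, and that matching is case-sensitive. The helper name `split_schema_and_custom` is hypothetical.

```python
# Subset of schema API keys, copied from the File and Sample tables.
SCHEMA_KEYS = {
    "library_id", "platform", "platform_unit_id", "paired_end",
    "file_segment_number", "quality_scale", "experimental_strategy",
    "reference_genome", "sample_id", "sample_type", "sample_uuid",
}


def split_schema_and_custom(metadata):
    """Split a metadata dict into (schema_fields, custom_fields).

    Assumption: a key counts as a schema field only if it matches
    an API key exactly, including case.
    """
    schema = {k: v for k, v in metadata.items() if k in SCHEMA_KEYS}
    custom = {k: v for k, v in metadata.items() if k not in SCHEMA_KEYS}
    return schema, custom


# "sampleId" does not match "sample_id", so it would be treated as custom.
schema, custom = split_schema_and_custom(
    {"platform": "Illumina HiSeq", "sampleId": "S-001"}
)
```

Reviewing the `custom` result before upload is one way to catch misspelled keys that would otherwise become custom metadata fields unintentionally.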
File
In the following table, you will find the name, description, and values of metadata fields for File. The second column, API key, allows you to access the specified metadata field through the API. Learn more about accessing metadata via the API.
Name | API key | Description | Values |
---|---|---|---|
Library ID | library_id | This is an identifier for the sequencing library preparation. | This takes a string. |
Platform | platform | This is the version (manufacturer, model, etc.) of the technology that was used for sequencing or assaying. | This takes a string. Suggested values: Affymetrix SNP Array 6.0; Illumina HiSeq; Illumina Human Methylation 450; Illumina GA; MDA_RPPA_Core; BCR Record; Hospital Record; Illumina Human Methylation 27; ABI capillary sequencer; AgilentG4502A_07_3; HG-CGH-244A; HG-CGH-415K_G4124A; CGH-1x1M_G4447A; Illumina MiSeq; HT_HG-U133A; Illumina Human 1M Duo; H-miRNA_8x15Kv2; Illumina HumanHap550; H-miRNA_8x15K; AgilentG4502A_07_2; HuEx-1_0-st-v2; ABI SOLiD; Complete Genomics; HG-U133_Plus_2; Illumina DNA Methylation OMA003 CPI; Illumina DNA Methylation OMA002 CPI; AgilentG4502A_07_1; Ion Torrent PGM; Affymetrix U133 Plus 2; LS 454; HiSeq X Ten; Mixed platforms; Illumina; Helicos; PacBio; Not available |
Platform unit ID | platform_unit_id | This is an identifier for lanes (Illumina) or slides (SOLiD), used when a library was split and run over multiple lanes on the flow cell or over multiple slides. The platform unit ID refers to the lane ID or the slide ID. | This takes a string. |
Paired end | paired_end | For paired-end sequencing, this value determines the end of the fragment sequenced. | This takes a value of '1' or '2'. Please keep in mind that the data type of these values is string. Note: For single-end sequencing, the field should be left as '-'. |
File segment number | file_segment_number | If the sequencing reads for a single library, sample and lane are divided into multiple (smaller) files, the File segment number is used to enumerate these. Otherwise, this field can be left blank. | This takes an integer. |
Quality scale | quality_scale | For raw reads, this value denotes the sequencing technology and quality format. For BAM and SAM files, this value should always be ‘Sanger’. | Choose one of the following options, or enter no value: sanger; illumina13; illumina15; illumina18; solexa |
Experimental strategy | experimental_strategy | This is the method or protocol used to perform the laboratory analysis. | This takes a string. Suggested values: DNA-Seq; WXS; WGS; Amplicon; Bisulfite-Seq; VALIDATION; RNA-Seq; miRNA-Seq; Total RNA-Seq; Genotyping Array; Exon Array; CGH Array; Methylation Array; Gene Expression Array; miRNA Array; Protein Expression Array; MSI- Mono- Dinucleotide Array; Not available |
Reference genome | reference_genome | The reference assembly (such as HG19 or GRCh37) to which the nucleotide sequence of a case can be aligned. | This takes a string. Suggested values: NCBI36_BCCAGSC_variant; NCBI36_BCM_variant; NCBI36_WUGSC_variant; HG18; HG18_Broad_variant; GRCh37; GRCh37-lite; GRCh37_BI_Variant; GRCh37-lite-+-HPV_Redux-build; GRCh37-lite_WUGSC_variant_1; GRCh37-lite_WUGSC_variant_2; HG19; HG19_Broad_variant; HS37D5; GRCh38 |
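The data types in the File table are easy to get wrong, particularly Paired end, which takes the strings '1' and '2' rather than integers. The sketch below shows one way to check these locally before upload. It assumes metadata is a plain Python dictionary keyed by API key; `check_file_metadata` is a hypothetical helper and not part of the Platform.

```python
def check_file_metadata(metadata):
    """Return a list of problems found in File metadata (empty if none)."""
    problems = []

    # paired_end takes the string '1' or '2'; '-' is used for single-end data.
    paired_end = metadata.get("paired_end")
    if paired_end is not None and paired_end not in ("1", "2", "-"):
        problems.append("paired_end must be the string '1', '2', or '-'")

    # file_segment_number takes an integer (bool is rejected explicitly,
    # because bool is a subclass of int in Python).
    segment = metadata.get("file_segment_number")
    if segment is not None and (
        isinstance(segment, bool) or not isinstance(segment, int)
    ):
        problems.append("file_segment_number must be an integer")

    return problems


ok = check_file_metadata({"paired_end": "1", "file_segment_number": 2})
bad = check_file_metadata({"paired_end": 1, "file_segment_number": "2"})
```

Here `ok` is an empty list, while `bad` reports both fields because the types are swapped.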
Sample
In the following table, you will find the name, description, and values of metadata fields for Sample. The second column, API key, allows you to access the specified metadata field through the API. Learn more about accessing metadata via the API.
Name | API key | Description | Value |
---|---|---|---|
Sample ID | sample_id | A human readable identifier for a sample or specimen, which could contain some metadata information. A sample or specimen is material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes, including but not limited to tissues, body fluids, cells, organs, embryos, body excretory products, etc. | This takes a string. |
Sample type | sample_type | The type of material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. This includes tissues, body fluids, cells, organs, embryos, body excretory products, etc. | This takes a string. Suggested values: Blood Derived Normal; Buccal Cell Normal; Primary Blood Derived Cancer - Peripheral Blood; Recurrent Blood Derived Cancer - Peripheral Blood; Primary Tumor; Recurrent Blood Derived Cancer - Bone Marrow; Recurrent Tumor; Solid Tissue Normal; Metastatic; Additional - New Primary; Additional Metastatic; Human Tumor Original Cells; Primary Blood Derived Cancer - Bone Marrow; Cell Lines; Xenograft Tissue; Bone Marrow Normal; Fibroblasts from Bone Marrow Normal; Not available |
Sample UUID | sample_uuid | A unique identifier for the sample or specimen used in the investigation, such as a Universally Unique Identifier (UUID). A sample or specimen is material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes, including but not limited to tissues, body fluids, cells, organs, embryos, body excretory products, etc. | This takes a string. |
Aliquot
In the following table, you will find the name, description, and values of metadata fields for Aliquot. The second column, API key, allows you to access the specified metadata field through the API. Learn more about accessing metadata via the API.
Name | API key | Description | Value |
---|---|---|---|
Aliquot ID | aliquot_id | A human readable identifier for an aliquot, which may contain metadata information. The aliquot is a product or unit extracted from a sample of a specimen and prepared for the analysis. | This takes a string. |
Aliquot UUID | aliquot_uuid | The unique identifier for an aliquot, such as a Universally Unique Identifier (UUID). The aliquot is a product or unit extracted from a sample of a specimen and prepared for the analysis. | This takes a string. |
Case
In the following table, you will find the name, description, and values of metadata fields for Case. The Case category is further subdivided by the following properties: Diagnosis, Demographic, Status, and Prognosis. These properties are shown in parentheses after the metadata field's name in the first column. The second column, API key, allows you to access the specified metadata field through the API. Learn more about accessing metadata via the API.
Name | API key | Description | Value |
---|---|---|---|
Case ID | case_id | An identifier, such as a number or a string that may contain metadata information, for a subject who has taken part in the investigation or study. | This takes a string. |
Case UUID | case_uuid | A unique identifier, such as a Universally Unique Identifier (UUID), for a subject who has taken part in the investigation or study. | This takes a string. |
Primary site (Diagnosis) | primary_site | The anatomical site where the primary tumor is located in the organism. | This takes a string. Suggested values: Adrenal Gland; Bile Duct; Bladder; Blood; Brain; Breast; Cervix; Colorectal; Esophagus; Eye; Head And Neck; Liver; Lung; Lymph Nodes; Kidney; Mesenchymal; Mesothelium; Nervous System; Ovary; Pancreas; Prostate; Skin; Stomach; Uterus; Testis; Thymus; Thyroid; Not available |
Disease type (Diagnosis) | disease_type | The type of the disease or condition studied. | This takes a string. Suggested values: Acute Myeloid Leukemia; Adrenocortical Carcinoma; Bladder Urothelial Carcinoma; Brain Lower Grade Glioma; Breast Invasive Carcinoma; Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma; Cholangiocarcinoma; Chronic Myelogenous Leukemia; Colon Adenocarcinoma; Esophageal Carcinoma; Glioblastoma Multiforme; Head and Neck Squamous Cell Carcinoma; Kidney Chromophobe; Kidney Renal Clear Cell Carcinoma; Kidney Renal Papillary Cell Carcinoma; Liver Hepatocellular Carcinoma; Lung Adenocarcinoma; Lung Squamous Cell Carcinoma; Lymphoid Neoplasm Diffuse Large B-cell Lymphoma; Mesothelioma; Ovarian Serous Cystadenocarcinoma; Pancreatic Adenocarcinoma; Pheochromocytoma and Paraganglioma; Prostate Adenocarcinoma; Rectum Adenocarcinoma; Sarcoma; Skin Cutaneous Melanoma; Stomach Adenocarcinoma; Testicular Germ Cell Tumors; Thymoma; Thyroid Carcinoma; Uterine Carcinosarcoma; Uterine Corpus Endometrial Carcinoma; Uveal Melanoma; Not available |
Gender (Demographic) | gender | The collection of behaviors and attitudes that distinguish people on the basis of societal roles expected for the two sexes. | Choose from the following: Female; Male |
Age at diagnosis (Diagnosis) | age_at_diagnosis | The age in years of the case at the initial pathological diagnosis of disease or cancer. | This takes a non-negative integer. |
Vital status (Status) | vital_status | The state of being living or deceased for cases that are part of the investigation. | Choose from the following: Alive; Dead; Lost to follow-up; Unknown; Not available |
Days to death (Prognosis) | days_to_death | The number of days from the date of the initial pathological diagnosis to the date of death for the case in investigation. | This takes a non-negative integer. |
Race (Demographic) | race | The racial category reported for the case. | This takes a string. Suggested values: White; American Indian or Alaska Native; Black or African American; Asian; Native Hawaiian or other Pacific Islander; Not reported; Not available |
Ethnicity (Demographic) | ethnicity | A socially defined category of people based on common ancestral, cultural, biological, and social factors. | This takes a string. Suggested values: Hispanic or Latino; Not Hispanic or Latino; Not reported; Not Available |
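Several Case fields are constrained rather than free-form: Gender and Vital status take a fixed set of values, while Age at diagnosis and Days to death take non-negative integers. The following sketch illustrates a local check for those constraints. It assumes a plain Python dictionary keyed by API key and case-sensitive value matching; `check_case_metadata` is a hypothetical helper, not a Platform function.

```python
# Fixed value sets, copied from the Case table.
ALLOWED_VALUES = {
    "gender": {"Female", "Male"},
    "vital_status": {
        "Alive", "Dead", "Lost to follow-up", "Unknown", "Not available",
    },
}
NON_NEGATIVE_INT_FIELDS = ("age_at_diagnosis", "days_to_death")


def check_case_metadata(metadata):
    """Return a list of problems found in Case metadata (empty if none)."""
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        value = metadata.get(key)
        if value is not None and value not in allowed:
            problems.append(f"{key}: {value!r} is not one of {sorted(allowed)}")
    for key in NON_NEGATIVE_INT_FIELDS:
        value = metadata.get(key)
        if value is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append(f"{key}: expected a non-negative integer, got {value!r}")
    return problems


good = check_case_metadata(
    {"gender": "Female", "vital_status": "Alive", "age_at_diagnosis": 54}
)
bad = check_case_metadata({"gender": "female", "days_to_death": -3})
```

Fields with suggested values only, such as Race or Primary site, are deliberately left unchecked here because the table allows any string for them.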
General
In the following table, you will find the name, description, and values of metadata fields for General. The second column, API key, allows you to access the specified metadata field through the API. Learn more about accessing metadata via the API.
Name | API key | Description | Value |
---|---|---|---|
Investigation | investigation | A value denoting the project or study that generated the data. | This takes a string. |