Metadata schema

Metadata fields are subdivided into categories: File, Sample, Aliquot, Case, and General. We recommend entering as much metadata as possible when you first upload files to the Platform. For instance, for raw sequencing files, you should enter Platform (the sequencing platform) and Sample ID. Of all these fields, there are seven that we highly suggest you set for your data. While your tasks may run correctly without them, these fields help optimize your analyses.

Please keep in mind that field names have to be specified exactly as listed in the Name column of the tables below. If a field name does not match the table exactly, the Platform will interpret it as a custom metadata field (see below).
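To make the exact-match rule concrete, here is a minimal Python sketch. `STANDARD_FIELD_NAMES` is transcribed from the Name column of the tables in this section, and `split_metadata` is a hypothetical helper, not part of the Platform or any SDK. Because the match is exact and case-sensitive, a near miss such as `Sample id` ends up in the custom group.

```python
# Field names transcribed from the Name column of the tables in this section.
STANDARD_FIELD_NAMES = frozenset({
    # File
    "Library ID", "Platform", "Platform unit ID", "Paired end",
    "File segment number", "Quality scale", "Experimental strategy",
    "Reference genome",
    # Sample and Aliquot
    "Sample ID", "Sample type", "Sample UUID", "Aliquot ID", "Aliquot UUID",
    # Case
    "Case ID", "Case UUID", "Primary site", "Disease type", "Gender",
    "Age at diagnosis", "Vital status", "Days to death", "Race", "Ethnicity",
    # General
    "Investigation",
})


def split_metadata(fields: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split fields into (standard, custom) using an exact, case-sensitive match."""
    standard: dict[str, str] = {}
    custom: dict[str, str] = {}
    for name, value in fields.items():
        # Any name not found verbatim in the schema is treated as custom.
        target = standard if name in STANDARD_FIELD_NAMES else custom
        target[name] = value
    return standard, custom


standard, custom = split_metadata(
    {"Sample ID": "S1", "Sample id": "S1", "Platform": "Illumina HiSeq"}
)
print(sorted(standard))  # ['Platform', 'Sample ID']
print(custom)            # {'Sample id': 'S1'}
```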

File

In the following table, you will find the name, description, and values of metadata fields for File. The second column, API key, allows you to access the specified metadata field through the API. Learn more about accessing metadata via the API.

Name | API key | Description | Values
Library ID | library_id | This is an identifier for the sequencing library preparation. | This takes a string.
Platform | platform | This is the version (manufacturer, model, etc.) of the technology that was used for sequencing or assaying. | This takes a string.
Suggested values:
Affymetrix SNP Array 6.0
Illumina HiSeq
Illumina Human Methylation 450
Illumina GA
MDA_RPPA_Core
BCR Record
Hospital Record
Illumina Human Methylation 27
ABI capillary sequencer
AgilentG4502A_07_3
HG-CGH-244A
HG-CGH-415K_G4124A
CGH-1x1M_G4447A
Illumina MiSeq
HT_HG-U133A
Illumina Human 1M Duo
H-miRNA_8x15Kv2
Illumina HumanHap550
H-miRNA_8x15K
AgilentG4502A_07_2
HuEx-1_0-st-v2
ABI SOLiD
Complete Genomics
HG-U133_Plus_2
Illumina DNA Methylation OMA003 CPI
Illumina DNA Methylation OMA002 CPI
AgilentG4502A_07_1
Ion Torrent PGM
Affymetrix U133 Plus 2
LS 454
HiSeq X Ten
Mixed platforms
Illumina
Helicos
PacBio
Not available
Platform unit ID | platform_unit_id | This is an identifier for lanes (Illumina) or slides (SOLiD) in the case that a library was split and run over multiple lanes on the flow cell or over multiple slides. The platform unit ID refers to the lane ID or the slide ID. | This takes a string.
Paired end | paired_end | For paired-end sequencing, this value indicates which end of the fragment was sequenced. | This takes a value of '1' or '2'. Please keep in mind that the data type of these values is string.
Note: For single-end sequencing, the field should be set to '-'.
File segment number | file_segment_number | If the sequencing reads for a single library, sample, and lane are divided into multiple (smaller) files, the File segment number is used to enumerate these. Otherwise, this field can be left blank. | This takes an integer.
Quality scale | quality_scale | For raw reads, this value denotes the sequencing technology and quality format. For BAM and SAM files, this value should always be 'sanger'. | Choose from one of the following options:
sanger
illumina13
illumina15
illumina18
solexa
Or, enter no value.
Experimental strategy | experimental_strategy | This is the method or protocol used to perform the laboratory analysis. | This takes a string.
Suggested values:
DNA-Seq
WXS
WGS
Amplicon
Bisulfite-Seq
VALIDATION
RNA-Seq
miRNA-Seq
Total RNA-Seq
Genotyping Array
Exon Array
CGH Array
Methylation Array
Gene Expression Array
miRNA Array
Protein Expression Array
MSI- Mono- Dinucleotide Array
Not available
Reference genome | reference_genome | The reference assembly (such as HG19 or GRCh37) to which the nucleotide sequence of a case can be aligned. | This takes a string.

Suggested values:

NCBI36_BCCAGSC_variant
NCBI36_BCM_variant
NCBI36_WUGSC_variant
HG18
HG18_Broad_variant
GRCh37
GRCh37-lite
GRCh37_BI_Variant
GRCh37-lite-+-HPV_Redux-build
GRCh37-lite_WUGSC_variant_1
GRCh37-lite_WUGSC_variant_2
HG19
HG19_Broad_variant
HS37D5
GRCh38
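The value rules in the File table can be checked locally before upload. The sketch below is illustrative only: `check_file_metadata` is a hypothetical helper, it assumes metadata is held in a Python dictionary keyed by API key, and it recognizes BAM and SAM files by their file extension, which is an assumption rather than documented Platform behavior.

```python
# Allowed quality_scale values, transcribed from the File table above.
QUALITY_SCALES = {"sanger", "illumina13", "illumina15", "illumina18", "solexa"}


def check_file_metadata(meta: dict, file_name: str) -> list[str]:
    """Return a list of problems in a File metadata dict (empty if none found)."""
    problems: list[str] = []

    # paired_end is stored as a string: '1' or '2', or '-' for single-end reads.
    paired_end = meta.get("paired_end")
    if paired_end is not None and paired_end not in {"1", "2", "-"}:
        problems.append(f"paired_end must be '1', '2' or '-' (a string), got {paired_end!r}")

    # file_segment_number is optional; when present it must be an integer.
    segment = meta.get("file_segment_number")
    if segment is not None and (isinstance(segment, bool) or not isinstance(segment, int)):
        problems.append(f"file_segment_number must be an integer, got {segment!r}")

    # quality_scale may be left empty, but must otherwise be a listed value.
    scale = meta.get("quality_scale")
    if scale is not None and scale not in QUALITY_SCALES:
        problems.append(f"unknown quality_scale {scale!r}")

    # Assumption: BAM/SAM files are recognized by their extension.
    if file_name.lower().endswith((".bam", ".sam")) and scale != "sanger":
        problems.append("BAM/SAM files should have quality_scale 'sanger'")

    return problems


# An integer paired_end and a non-sanger scale on a BAM file are both reported.
print(check_file_metadata({"paired_end": 1, "quality_scale": "illumina18"}, "reads.bam"))
```

Checking `paired_end` against string values catches the common mistake of sending the integer `1` instead of the string `'1'`.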

Sample

In the following table, you will find the name, description, and values of metadata fields for Sample. The second column, API key, allows you to access the specified metadata field through the API. Learn more about accessing metadata via the API.

Name | API key | Description | Values
Sample ID | sample_id | A human-readable identifier for a sample or specimen, which may contain some metadata information. A sample or specimen is material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes, including but not limited to tissues, body fluids, cells, organs, embryos, and body excretory products. | This takes a string.
Sample type | sample_type | The type of material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. This includes tissues, body fluids, cells, organs, embryos, body excretory products, etc. | This takes a string.

Suggested values:

Blood Derived Normal
Buccal Cell Normal
Primary Blood Derived Cancer - Peripheral Blood
Recurrent Blood Derived Cancer - Peripheral Blood
Primary Tumor
Recurrent Blood Derived Cancer - Bone Marrow
Recurrent Tumor
Solid Tissue Normal
Metastatic
Additional - New Primary
Additional Metastatic
Human Tumor Original Cells
Primary Blood Derived Cancer - Bone Marrow
Cell Lines
Xenograft Tissue
Bone Marrow Normal
Fibroblasts from Bone Marrow Normal
Not available
Sample UUID | sample_uuid | A unique identifier for the sample or specimen used in the investigation, such as a Universally Unique Identifier (UUID). A sample or specimen is material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes, including but not limited to tissues, body fluids, cells, organs, embryos, and body excretory products. | This takes a string.
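Because Sample ID is meant to be human-readable while Sample UUID holds a unique identifier, one option is to generate the UUID yourself when a sample has none. This sketch assumes you want canonical lowercase, hyphenated UUID strings; the table only gives a UUID as one example of a unique identifier, so the normalization step and the `ensure_sample_uuid` helper are illustrative assumptions.

```python
import uuid


def ensure_sample_uuid(sample: dict) -> dict:
    """Return a copy of `sample` whose sample_uuid is a canonical UUID string."""
    result = dict(sample)
    raw = result.get("sample_uuid")
    if raw is None:
        # No identifier yet: generate a random (version 4) UUID.
        result["sample_uuid"] = str(uuid.uuid4())
    else:
        # uuid.UUID raises ValueError on malformed input; str() yields the
        # lowercase, hyphenated 8-4-4-4-12 form.
        result["sample_uuid"] = str(uuid.UUID(raw))
    return result


sample = ensure_sample_uuid({"sample_id": "patient-07 primary tumor"})
print(sample["sample_id"], sample["sample_uuid"])
```

The human-readable `sample_id` is left untouched, so the two fields keep their separate roles.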

Aliquot

In the following table, you will find the name, description, and values of metadata fields for Aliquot. The second column, API key, allows you to access the specified metadata field through the API. Learn more about accessing metadata via the API.

Name | API key | Description | Values
Aliquot ID* | aliquot_id | A human-readable identifier for an aliquot, which may contain metadata information. An aliquot is a product or unit extracted from a sample of a specimen and prepared for the analysis. | This takes a string.
Aliquot UUID | aliquot_uuid | The unique identifier for an aliquot, such as a Universally Unique Identifier (UUID). The aliquot is a product or unit extracted from a sample of a specimen and prepared for the analysis. | This takes a string.

Case

In the following table, you will find the name, description, and values of metadata fields for Case. The Case category is further subdivided by the following properties: Diagnosis, Demographic, Status, and Prognosis. Each field's property is shown in parentheses with the metadata field's name in the first column. The second column, API key, allows you to access the specified metadata field through the API. Learn more about accessing metadata via the API.

Name | API key | Description | Values
Case ID | case_id | An identifier, such as a number or a string that may contain metadata information, for a subject who has taken part in the investigation or study. | This takes a string.
Case UUID | case_uuid | A unique identifier, such as a Universally Unique Identifier (UUID), for a subject who has taken part in the investigation or study. | This takes a string.
Primary site (Diagnosis) | primary_site | The anatomical site where the primary tumor is located in the organism. | This takes a string.

Suggested values:
Adrenal Gland
Bile Duct
Bladder
Blood
Brain
Breast
Cervix
Colorectal
Esophagus
Eye
Head And Neck
Liver
Lung
Lymph Nodes
Kidney
Mesenchymal
Mesothelium
Nervous System
Ovary
Pancreas
Prostate
Skin
Stomach
Uterus
Testis
Thymus
Thyroid
Not available
Disease type (Diagnosis) | disease_type | The type of the disease or condition studied. | This takes a string.

Suggested values:
Acute Myeloid Leukemia
Adrenocortical Carcinoma
Bladder Urothelial Carcinoma
Brain Lower Grade Glioma
Breast Invasive Carcinoma
Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma
Cholangiocarcinoma
Chronic Myelogenous Leukemia
Colon Adenocarcinoma
Esophageal Carcinoma
Glioblastoma Multiforme
Head and Neck Squamous Cell Carcinoma
Kidney Chromophobe
Kidney Renal Clear Cell Carcinoma
Kidney Renal Papillary Cell Carcinoma
Liver Hepatocellular Carcinoma
Lung Adenocarcinoma
Lung Squamous Cell Carcinoma
Lymphoid Neoplasm Diffuse Large B-cell Lymphoma
Mesothelioma
Ovarian Serous Cystadenocarcinoma
Pancreatic Adenocarcinoma
Pheochromocytoma and Paraganglioma
Prostate Adenocarcinoma
Rectum Adenocarcinoma
Sarcoma
Skin Cutaneous Melanoma
Stomach Adenocarcinoma
Testicular Germ Cell Tumors
Thymoma
Thyroid Carcinoma
Uterine Carcinosarcoma
Uterine Corpus Endometrial Carcinoma
Uveal Melanoma
Not available
Gender (Demographic) | gender | The collection of behaviors and attitudes that distinguish people on the basis of societal roles expected for the two sexes. | Choose from the following:
Female
Male
Age at diagnosis (Diagnosis) | age_at_diagnosis | The age in years of the case at the initial pathological diagnosis of disease or cancer. | This takes a non-negative integer.
Vital status (Status) | vital_status | The state of being living or deceased for cases that are part of the investigation. | Choose from the following:
Alive
Dead
Lost to follow-up
Unknown
Not available
Days to death (Prognosis) | days_to_death | The number of days from the date of the initial pathological diagnosis to the date of death for the case in investigation. | This takes a non-negative integer.
Race (Demographic) | race | The racial category of the case. | This takes a string.

Suggested values:
White
American Indian or Alaska Native
Black or African American
Asian
Native Hawaiian or other Pacific Islander
Not reported
Not available
Ethnicity (Demographic) | ethnicity | A socially defined category of people based on common ancestral, cultural, biological, and social factors. | This takes a string.

Suggested values:
Hispanic or Latino
Not Hispanic or Latino
Not reported
Not available
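The property labels and the fixed-choice fields in the Case table can be represented as plain data. In this sketch, the mappings are transcribed from the table above, fields without a property label (such as `case_id`) are grouped under a hypothetical `Other` label, and `group_case_metadata` and `check_case_metadata` are illustrative helpers rather than Platform features.

```python
# Property label for each Case field, as given in parentheses in the table above.
CASE_PROPERTIES = {
    "primary_site": "Diagnosis",
    "disease_type": "Diagnosis",
    "age_at_diagnosis": "Diagnosis",
    "gender": "Demographic",
    "race": "Demographic",
    "ethnicity": "Demographic",
    "vital_status": "Status",
    "days_to_death": "Prognosis",
}

# Fields whose table entry says "Choose from the following".
CASE_CHOICES = {
    "gender": {"Female", "Male"},
    "vital_status": {"Alive", "Dead", "Lost to follow-up", "Unknown", "Not available"},
}

# Fields that take a non-negative integer.
NON_NEGATIVE_INT_FIELDS = ("age_at_diagnosis", "days_to_death")


def group_case_metadata(case: dict) -> dict[str, dict]:
    """Group Case fields by property; unlabeled fields go under 'Other'."""
    groups: dict[str, dict] = {}
    for key, value in case.items():
        groups.setdefault(CASE_PROPERTIES.get(key, "Other"), {})[key] = value
    return groups


def check_case_metadata(case: dict) -> list[str]:
    """Return a list of problems in a Case metadata dict (empty if none found)."""
    problems: list[str] = []
    for key, allowed in CASE_CHOICES.items():
        if key in case and case[key] not in allowed:
            problems.append(f"{key}: {case[key]!r} is not one of {sorted(allowed)}")
    for key in NON_NEGATIVE_INT_FIELDS:
        value = case.get(key)
        if value is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append(f"{key} must be a non-negative integer, got {value!r}")
    return problems


case = {"case_id": "C-001", "gender": "Female", "age_at_diagnosis": 54, "vital_status": "Alive"}
print(group_case_metadata(case))
print(check_case_metadata(case))  # []
```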

General

In the following table, you will find the name, description, and values of metadata fields for General. The second column, API key, allows you to access the specified metadata field through the API. Learn more about accessing metadata via the API.

Name | API key | Description | Values
Investigation | Investigation | A value denoting the project or study that generated the data. | This takes a string.