TCGA metadata
Overview
Metadata is data that describes other data. On this page, we've detailed TCGA metadata that are available for viewing and filtering TCGA data in the Data Browser and the Datasets API. TCGA metadata on the CGC consists of properties which describe the entities of the TCGA dataset.
Entities are particular resources with UUIDs, such as files, cases, samples, and cell lines.
Properties can either describe an entity or relate that entity to another entity. For instance, properties include an entity's vital status, gender, data format, or experimental strategy.
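The distinction between entities and properties can be sketched in code. The example below is a minimal, hypothetical in-memory model, not the CGC's actual data model: the `Entity` class, its field names, and the property keys and values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List
import uuid


@dataclass
class Entity:
    """Hypothetical model of a metadata entity (illustration only)."""
    kind: str  # for example "case", "sample", or "file"
    uuid: str  # the identifier of this particular resource
    # Properties that describe the entity itself.
    properties: Dict[str, Any] = field(default_factory=dict)
    # Properties that relate the entity to other entities, stored as UUIDs.
    relations: Dict[str, List[str]] = field(default_factory=dict)


# Assumed property names and values, chosen only to mirror the prose above.
case = Entity(kind="case", uuid=str(uuid.uuid4()),
              properties={"vital_status": "Alive", "gender": "FEMALE"})
sample = Entity(kind="sample", uuid=str(uuid.uuid4()),
                properties={"sample_type": "Primary Tumor"})

# A relating property links the case to its sample by UUID.
case.relations["samples"] = [sample.uuid]
```

Here `properties` holds descriptive values such as vital status, while `relations` holds the links from one entity to another.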
Entities for TCGA
The following are entities for TCGA. They represent clinical data, biospecimen data, and data about TCGA files. Learn more about TCGA data.
- cases
- analytes
- radiation_therapies
- drug_therapies
- follow_ups
- portions
- aliquots
- samples
- slides
- files
- new_tumor_events
Below, each of these entities is followed by a table of its related properties.
Case
The Case entity represents TCGA cases. Members of the Case entity can be identified by a Universally Unique Identifier (UUID). The CGC includes 35 properties of the TCGA Case for filtering cases. These properties describe clinical information about a case, such as its demographics, diagnosis, and prognosis. See the table below for the clinical properties and descriptions of the Case entity.
Property | Description |
---|---|
ID | A human-readable identifier, such as a number or a string that may contain metadata information. This identifier is often referred to as the submitter ID. |
Program | The research program under which the data was generated. See NCI Thesaurus Code: C82662. |
Investigation | A value denoting the project or study that generated the data. See NCI Thesaurus Code: C41198. |
Batch number | A set of related analytes prepared for further analysis, numbered sequentially, from the same disease. Once a Case has been assigned to a batch, subsequent shipments from that case are assigned the same batch number as the original. |
Gender | The collection of behaviors and attitudes that distinguish people on the basis of the societal roles expected for the two sexes. See NCI Thesaurus Code: C17357. |
Race | A classification of humans characterized by certain heritable traits, common history, nationality, or geographic distribution. See NCI Thesaurus Code: C17049. |
Ethnicity | A socially defined category of people based on common ancestral, cultural, biological, and social factors. See NCI Thesaurus Code: C29933. |
Disease type | The type of the disease or condition studied. See NCI Thesaurus Code: C2991. |
Primary site | The anatomical site where the primary tumor is located in the organism. See NCI Thesaurus Code: C43761. |
Histologic diagnosis | Diagnosis of a disease based on the type of tissue, where type is determined based on the microscopic examination of tissue. See NCI Thesaurus Code: C61478. |
Other histologic diagnosis | Additional options for histologic diagnosis (see Histologic diagnosis) that are not among the predefined values for histologic diagnosis. |
Age at diagnosis | The age in years of the case at the initial pathological diagnosis of disease or cancer. See NCI Thesaurus Code: C15220. |
ICD-10 | This value denotes the classification of the disease according to the tenth version of the International Classification of Diseases (ICD), published by the World Health Organization in 1992. See NCI Thesaurus Code: C71892. |
ICD-O-3 Site | The topography code which describes the anatomical site of origin of the neoplasm according to the third edition of the International Classification of Diseases for Oncology (ICD-O). See NCI Thesaurus Code: C37978. |
ICD-O-3 Histology | The morphology code which describes the characteristics of the tumor itself, including its cell type and biologic activity, according to the third edition of the International Classification of Diseases for Oncology (ICD-O). |
Clinical stage | Estimate of the extent of the cancer based on results of physical exams, imaging tests (x-rays, CT scans, etc.), and tumor biopsies. |
Pathologic stage | Pathologic staging combines the results of clinical staging (physical exam and imaging tests; see Clinical stage) with surgical results. NCI Thesaurus Code: C28257. |
Clinical T (TNM) | The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The T category describes the original (primary) tumor. NCI Thesaurus Code: C48881 and C253840.1321. |
Clinical N (TNM) | The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The N category describes whether or not the cancer has reached nearby lymph nodes. NCI Thesaurus Code: C48881 and C25384. |
Clinical M (TNM) | The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The M category tells whether there are distant metastases (spread of cancer to other parts of the body). NCI Thesaurus Code: C48881 and C25385. |
Pathologic T (TNM) | The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The T category describes the original (primary) tumor. NCI Thesaurus Code: C48881 and C48739. |
Pathologic N (TNM) | The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The N category describes whether or not the cancer has reached nearby lymph nodes. NCI Thesaurus Code: C48881 and C48740. |
Pathologic M (TNM) | The TNM Staging System is based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M). The M category tells whether there are distant metastases (spread of cancer to other parts of the body). NCI Thesaurus Code: C48881 and C48741. |
Performance Status Score: Karnofsky Score | An index designed for classifying patients 16 years of age or older by their functional impairment. A standard way of measuring the ability of cancer patients to perform ordinary tasks. NCI Thesaurus Code: C28013. |
Performance Status Score: ECOG | A performance status scale designed to assess disease progression and its effect on the daily living abilities of the patient. NCI Thesaurus Code: C105721. |
Performance Status Score: Timing | A time reference for the Karnofsky score and/or the ECOG score using the defined categories. |
Days to death | The number of days from the date of the initial pathological diagnosis to the date of death for the case in the investigation. |
Prior diagnosis | Informs whether a case has a known history of an earlier diagnosis of disease/cancer. |
New tumor event after initial treatment | A boolean value which denotes whether a neoplasm developed after the initial treatment has finished. |
New tumor event type | Type of newly developed neoplasm after initial treatment has finished. |
New tumor anatomic site | Anatomic site of newly developed neoplasm. |
Other new tumor anatomic site | Alternative anatomic site of a newly developed neoplasm which has not been listed under "New tumor anatomic site". |
Tumor status | The condition or state of the tumor at a particular time. See NCI Thesaurus Code: C96643. |
Vital status | The state of being living or deceased for cases that are part of the investigation. See NCI Thesaurus Code: C25717. |
Primary therapy outcome success | A value denoting the result of therapy for a given disease or condition in a patient or group of patients. See NCI Thesaurus Code: C18919. |
Drug therapy, Radiation therapy, and Follow up are dependent on the Case entity. This means that to query Drug therapy, Radiation therapy, or Follow up, you have to build your query starting from Case.
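The idea of starting a query from Case and nesting the dependent entity inside it can be sketched as follows. This is a minimal illustration only: the `build_case_query` helper, the `filters` and `dependents` keys, and the property names are hypothetical and do not reproduce the request format of the Datasets API.

```python
import json
from typing import Any, Dict

# Entities that this page describes as dependent on Case.
DEPENDENTS = {"drug_therapies", "radiation_therapies", "follow_ups"}


def build_case_query(case_filters: Dict[str, Any],
                     dependent: str,
                     dependent_filters: Dict[str, Any]) -> Dict[str, Any]:
    """Build a hypothetical nested query rooted at the Case entity."""
    if dependent not in DEPENDENTS:
        raise ValueError(f"{dependent!r} is not a dependent of Case")
    return {
        "entity": "cases",                # the query always starts from Case
        "filters": dict(case_filters),    # filters on the case itself
        "dependents": {dependent: dict(dependent_filters)},  # nested filters
    }


# Illustrative values only.
query = build_case_query(
    {"disease_type": "Breast Invasive Carcinoma"},
    "drug_therapies",
    {"drug_name": "Tamoxifen"},
)
print(json.dumps(query, indent=2))
```

The point of the sketch is the shape of the query: the dependent entity never appears at the top level, only inside a query whose root is Case.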
Drug therapy
Drug therapy is the pharmaceutical product that contains one or more active and/or inactive ingredients. It is intended to treat, prevent, or alleviate the symptoms of disease. A case can have more than one drug treatment, each of which can be identified by a UUID. Find the properties of the drug therapy entity below.
Property | Description |
---|---|
ID | A human-readable identifier, such as a number or a string that may contain metadata information. This identifier is often referred to as the submitter ID. |
Drug name | The most recognizable term associated with a pharmaceutical product used to prevent, diagnose, treat or relieve symptoms of a disease or abnormal condition. NCI Thesaurus Code: C97104. |
Pharmaceutical therapy type | The type of treatment of the disease through the use of drugs. NCI Thesaurus Code: C15986. |
Radiation therapy
The radiation therapy entity represents the treatment of a disease with radiation therapy, in which the whole or a portion of the patient's body is exposed to radiation. Members of the radiation therapy entity can be defined by a UUID. Note that a case can have more than one radiation treatment. Find the properties of the radiation therapy entity below.
Property | Description |
---|---|
ID | A human-readable identifier, such as a number or a string that may contain metadata information. This identifier is often referred to as the submitter ID. |
Radiation type | The value denotes the type of high-energy radiation used to kill cancer cells and shrink tumors. NCI Thesaurus Code: C15986. |
Radiation therapy site | The location to which radiation therapy was administered. |
Follow up
The follow up entity refers to follow ups, which monitor a person's health over time after treatment. Members of the follow up entity can be identified by a UUID. A case can have multiple follow ups generated at different times. Find the properties of the follow up entity below.
Property | Description |
---|---|
ID | A human-readable identifier, such as a number or a string that may contain metadata information. This identifier is often referred to as the submitter ID. |
Days to last follow up | Time interval from the date of last follow up to the date of initial pathologic diagnosis, represented as a calculated number of days. |
New tumor event after initial treatment | A boolean value which denotes whether a neoplasm developed after the initial treatment has finished. |
New tumor event type | Type of newly developed neoplasm after initial treatment has finished. |
New tumor anatomic site | Anatomic site of newly developed neoplasm. |
Other new tumor anatomic site | Alternative anatomic site of a newly developed neoplasm which has not been listed under "New tumor anatomic site". |
Tumor status | The condition or state of the tumor at a particular time. See NCI Thesaurus Code: C96643. |
Vital status | The state of being living or deceased for cases that are part of the investigation. See NCI Thesaurus Code: C25717. |
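
Interval properties such as Days to last follow up are expressed as a number of days relative to the initial pathologic diagnosis. The sketch below shows that arithmetic with made-up dates; the `days_from_diagnosis` helper and both dates are assumptions for illustration, not values from TCGA.

```python
from datetime import date


def days_from_diagnosis(diagnosis: date, event: date) -> int:
    """Return the number of days from the diagnosis date to the event date.

    The result is negative if the event happened before diagnosis.
    """
    return (event - diagnosis).days


# Hypothetical dates, used only to show the calculation.
diagnosis_date = date(2010, 3, 1)
last_follow_up_date = date(2011, 3, 1)

days_to_last_follow_up = days_from_diagnosis(diagnosis_date, last_follow_up_date)
print(days_to_last_follow_up)  # 365
```

Storing an interval rather than a calendar date lets follow ups be compared across cases without exposing the dates themselves.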
Sample
The sample entity represents samples or specimen material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. For instance, samples include tissues, body fluids, cells, organs, embryos, and body excretory products. Members of the sample entity can be identified by a UUID. Find the properties of the sample entity below.
Property | Description |
---|---|
ID | A human-readable identifier, such as a number or a string that may contain metadata information. This identifier is often referred to as the submitter ID. |
Sample type | The type of material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. This includes tissues, body fluids, cells, organs, embryos, body excretory products, etc. See NCI Thesaurus Code: C70713. |
Sample type code | Code that determines type of material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. This includes tissues, body fluids, cells, organs, embryos, body excretory products, etc. See NCI Thesaurus Code: C70713. |
Tissue source site | A clinical site that collects and provides patient samples and clinical metadata for research use. See NCI Thesaurus Code: C103264. |
Tissue source site code | Alphanumeric code for a clinical site that collects and provides patient samples and clinical metadata for research use. See NCI Thesaurus Code: C103264. |
Country of specimen procurement | Country where specimen/sample has been procured. |
Longest dimension | The longest dimension of sample/specimen (in centimeters). |
Intermediate dimension | The intermediate dimension of sample/specimen (in centimeters). |
Shortest dimension | The shortest dimension of sample/specimen (in centimeters). |
Initial weight | Initial sample/specimen weight (in grams). |
Current weight | Current sample/specimen weight (in grams). |
Freezing method | The freezing method for sample/specimen. |
OCT embedded | A boolean value indicating whether the Optimal Cutting Temperature compound (OCT) is used to embed tissue samples prior to frozen sectioning on a microtome-cryostat. |
Time between clamping and freezing | Time elapsed (in minutes) between clamping (supplying vessel) and freezing a sample. |
Time between excision and freezing | Time elapsed (in minutes) between excising and freezing a sample. |
Days to collection | Days to sample collection. A sample can be collected prospectively or retrospectively. This can be a negative value for samples taken retrospectively. |
Days to sample procurement | Number of days from the date the patient was initially diagnosed pathologically with the disease to the date of the procedure that produced the malignant sample for submission. |
Is FFPE | A boolean value that denotes whether tissue samples used in the analysis were formalin-fixed paraffin-embedded (FFPE). |
Portion
The portion entity represents the sequential 100-120 mg sections derived from samples. Members of the portion entity can be identified by a UUID. Find the properties of the portion entity below.
Property | Description |
---|---|
ID | A human-readable identifier, such as a number or a string that may contain metadata information. This identifier is often referred to as the submitter ID. |
Portion number | The numerical value that represents the order of a portion in the series. |
Weight | Weight of a portion prepared for the analysis (in mg). |
Is FFPE | A boolean value that denotes whether tissue samples used in the analysis were formalin-fixed paraffin-embedded (FFPE). |
Slide
The slide entity refers to slides, thin slices of a snap-frozen OCT embedded block of tissue sent for imaging. This same tissue also provides DNA and RNA for further analyses after they are reviewed by histopathologists. Members of the slide entity can be identified by a UUID. Find the properties of the slide entity below.
Property | Description |
---|---|
ID | A human-readable identifier, such as a number or a string that may contain metadata information. This identifier is often referred to as the submitter ID. |
Section location | The section of a tissue that has been imaged. The value denotes top, middle, or bottom. |
Number proliferating cells | The number of proliferating cells based on the tissue image. |
Percent tumor cells | The percent of tumor cells based on the tissue image. |
Percent tumor nuclei | The percent of tumor nuclei based on the tissue image. |
Percent normal cells | The percent of normal cells based on the tissue image. |
Percent necrosis | The percent of necrotic cells based on the tissue image. |
Percent stromal cells | The ratio of stromal cells present on the tissue slide. |
Percent inflam infiltration | The ratio of inflammatory cells to the gross cell population seen on a slide. |
Percent lymphocyte infiltration | The fraction of lymphocyte cells to the gross inflammatory cells seen on a slide. |
Percent monocyte infiltration | The fraction of monocyte cells to the gross inflammatory cells seen on a slide. |
Percent granulocyte infiltration | The fraction of the granulocyte component to the gross inflammatory cells seen on a slide. |
Percent neutrophile infiltration | The fraction of neutrophils to the gross granulocyte component of inflammatory cells seen on a slide. |
Percent eosinophil infiltration | The fraction of eosinophil cells to the gross granulocyte component of inflammatory cells seen on a slide. |
Analyte
The analyte entity represents the analytes or molecules, such as DNA or RNA, used for analyses. An analyte is a molecular specimen extracted for analysis from a portion using a specific extraction protocol. Members of the analyte entity can be identified by a UUID. Find the properties of the analyte entity below.
Property | Description |
---|---|
ID | A human-readable identifier, such as a number or a string that may contain metadata information. This identifier is often referred to as the submitter ID. |
Analyte type | Defines the type of an analyte on molecular bases. |
Analyte type code | Determines the analyte type with a code. |
Amount | Amount of a product (in μg) prepared for an analysis. |
Concentration | Concentration of a product (in μg/μL) prepared for an analysis. |
a260_a280 ratio | A numerical value denoting purity assessment using the A260/A280 Ratios. |
Well number | The number of the well on the plate in which an analyte has been stored for shipment and for the analysis. |
Spectrophotometer method | A method of quantifying the content of nucleic acids in any sample. |
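
The Amount, Concentration, and a260_a280 ratio properties can be combined in simple calculations. The sketch below derives a volume from amount and concentration and applies a rough purity check. The function names are hypothetical, and the target ratios (about 1.8 for DNA and about 2.0 for RNA) and the tolerance are a common laboratory rule of thumb rather than thresholds defined by TCGA or the CGC.

```python
def volume_ul(amount_ug: float, concentration_ug_per_ul: float) -> float:
    """Volume in microliters implied by an amount (ug) and concentration (ug/uL)."""
    if concentration_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return amount_ug / concentration_ug_per_ul


# Rule-of-thumb A260/A280 targets (an assumption, not a CGC definition).
TYPICAL_RATIO = {"DNA": 1.8, "RNA": 2.0}


def looks_pure(a260_a280: float, analyte_type: str, tolerance: float = 0.2) -> bool:
    """Return True if the ratio is within `tolerance` of the typical target."""
    return abs(a260_a280 - TYPICAL_RATIO[analyte_type]) <= tolerance


# Hypothetical analyte values.
print(volume_ul(20.0, 0.5))      # 40.0
print(looks_pure(1.85, "DNA"))   # True
print(looks_pure(1.40, "DNA"))   # False
```

Both checks are only as reliable as the recorded values, so treat the result as a quick sanity check rather than a quality verdict.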
Aliquot
The aliquot entity refers to aliquots, which are products or units extracted from a portion of a sample or specimen and prepared for analysis. Members of the aliquot entity can be identified by a UUID. Find the properties of the aliquot entity below.
Property | Description |
---|---|
ID | A human-readable identifier, such as a number or a string that may contain metadata information. This identifier is often referred to as the submitter ID. |
Amount | Amount of a product (in μg) prepared for an analysis. |
Concentration | Concentration of a product (in μg/μL) prepared for an analysis. |
Files
The file entity refers to the files in TCGA produced by aliquot analyses. Files with Controlled Data are stored in the CGHub in XML, and files with Open Data are stored in the DCC in MAGE-TAB. Members of the file entity can be identified by a UUID. Find the properties of the file entity below.
Property | Description |
---|---|
GDC File UUID | The unique identifier for a file, such as a Universally Unique Identifier (UUID). |
File size | Size of a file measured in bytes (B), kilobytes (KB), megabytes (MB), gigabytes (GB), terabytes (TB), and larger values. |
Data format | The type of format that determines data content. |
Experimental strategy | The method or protocol used to perform the laboratory analysis. See NCI Thesaurus Code: C43622. |
Platform | The version (for instance, manufacturer or model) of the technology that was used for sequencing or assaying. See NCI Thesaurus Code: C45378. |
Data type | The classification of data used in (or produced by) the analysis, based on its form and content. See NCI Thesaurus Code: C42645. |
Data subtype | A further, more specific classification of the data type, based on the information that it contains. |
Data submitting center | This field takes a string denoting the name of the center that has submitted data. |
Reference genome | The reference assembly (such as HG19 or GRCh37) to which the nucleotide sequence of a case can be aligned. |
Access level | A Boolean value indicating Controlled Data or Open Data. Controlled Data is data from public datasets that has limitations on use and requires approval by dbGaP. Open Data is data from public datasets that doesn't have limitations on its use. |
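
File properties like these are typically used as filters. The sketch below filters a small list of made-up file records by property values and formats a file size for display. The record layout, the key names, the string values for access level, and the 1024-based size units are all assumptions for illustration and do not reflect how the CGC stores or returns these properties.

```python
from typing import Any, Dict, List

# Hypothetical file records; keys and values are illustrative only.
files: List[Dict[str, Any]] = [
    {"data_format": "BAM", "access_level": "Controlled", "file_size": 21_474_836_480},
    {"data_format": "TXT", "access_level": "Open", "file_size": 2_048},
    {"data_format": "BAM", "access_level": "Open", "file_size": 5_242_880},
]


def filter_files(records: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Keep the records whose properties match every given criterion."""
    return [r for r in records
            if all(r.get(key) == value for key, value in criteria.items())]


def human_size(num_bytes: int) -> str:
    """Format a byte count using 1024-based units (an assumed convention)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


open_bams = filter_files(files, data_format="BAM", access_level="Open")
for record in open_bams:
    print(record["data_format"], human_size(record["file_size"]))  # BAM 5.0 MB
```

Passing several criteria narrows the result, in the same way that combining property filters narrows a selection of files.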