ICGC metadata

Overview

Metadata is data that describes other data. This page details the ICGC metadata available for viewing and filtering ICGC data in the Data Browser on the CGC. ICGC metadata on the CGC consists of properties, which describe the entities of the ICGC dataset, and their values.

Entities are particular resources with UUIDs, such as files, cases, samples, and cell lines.

Properties can either describe an entity or relate that entity to another entity. For instance, properties include an entity's vital status, gender, data format, or experimental strategy.

The ICGC PCAWG Study dataset includes data from 20 different research projects conducted at participating centers around the world, and differences exist in the ontologies used across centers. Note that all metadata values assigned by ICGC research projects are provided via the CGC without modification. When identifying patient cohorts for further study, researchers are encouraged to investigate the full set of available metadata values to ensure that queries return all relevant cases, samples, or other entities.
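Because values are passed through unmodified, the same concept may be spelled differently by different centers. The following is a minimal Python sketch of how one might list every distinct value of a property before building a cohort query. The records, the key names (`donor_id`, `vital_status`), and the helper functions are hypothetical illustrations; they are not taken from the ICGC dataset or from any CGC API.

```python
from collections import Counter

# Hypothetical donor records. Keys and values are illustrative only and are
# NOT real ICGC data or a real CGC API response.
donors = [
    {"donor_id": "D1", "vital_status": "alive"},
    {"donor_id": "D2", "vital_status": "Alive"},
    {"donor_id": "D3", "vital_status": "deceased"},
    {"donor_id": "D4", "vital_status": None},
]


def value_counts(records, prop):
    """Count each distinct raw value of `prop`, keeping missing values visible."""
    return Counter(record.get(prop) for record in records)


def select(records, prop, accepted):
    """Return the records whose value for `prop` is one of the accepted spellings."""
    return [record for record in records if record.get(prop) in accepted]


# Step 1: inspect the full set of values actually present.
counts = value_counts(donors, "vital_status")
for value, n in counts.most_common():
    print(repr(value), n)

# Step 2: build the cohort from every spelling that means the same thing.
alive = select(donors, "vital_status", {"alive", "Alive"})
print([d["donor_id"] for d in alive])
```

Listing the raw values first makes variant spellings and missing values visible, so the final filter can include every relevant variant rather than only the first one noticed.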

Entities for ICGC

The following are entities for ICGC. They represent clinical data, biospecimen data, and data about ICGC files. Learn more about ICGC Data.

  • donor
  • exposure
  • family
  • file
  • project
  • sample
  • specimen
  • surgery
  • therapy

Below, each of these entities is followed by a table of its related properties.

Donor

The ICGC donor entity represents the subject who has taken part in the investigation/program. Members of the donor entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the donor entity below. Note that once you copy an ICGC file into a project on the CGC, metadata information pertaining to the donor entity will display under the case label on the file's page.

Property | Description
Age at diagnosis | Age at primary diagnosis in years.
Age at diagnosis group | Age at primary diagnosis group, range given in years.
Age at enrollment | Age (in years) at which the first specimen was collected.
Age at last follow up | Age (in years) at last followup.
Cancer type prior malignancy | ICD-10 diagnostic code for type of cancer in a prior malignancy.
Disease status at last followup | Donor's last known disease status.
Donor analysis type | The type of analysis performed on the donor's sample.
ICD-10 diagnostic code | ICD-10 diagnostic code for donor.
Gender | Donor's biological sex. 'Other' has been removed from the controlled vocabulary due to identifiability concerns.
History of first degree relative | Indicates if the patient has a first degree relative with cancer.
Interval of last follow up | Interval from the primary diagnosis date to the last followup date, in days. ICGC requests that patients be followed up every 6 months while alive.
Primary Site | The anatomical site where the primary tumour is located in the organism.
Prior Malignancy | Prior malignancy affecting patient.
Relapse interval | If the donor was clinically disease free following primary therapy, and then relapse or progression (for liquid tumours) occurred afterwards, then donor_relapse_interval is the length of the disease-free interval, in days.
Relapse type | Type of relapse or progression (for liquid tumours), if applicable.
Submitted donor ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information.
Survival time | How long the donor has survived since primary diagnosis, in days.
Tumour stage at diagnosis | This is the pathological tumour stage classification made after the tumour has been surgically removed, and is based on the pathological results of the tumour and other tissues removed during surgery or biopsy. This information is not expected to be the same as the donor's tumour stage at diagnosis since the pathological tumour staging information is the combination of the clinical staging information and additional information obtained during surgery. For this field, please indicate the pathological tumour stage value using the indicated staging system.
Tumour stage supplemental | Optional additional staging at the time of diagnosis.
Tumour staging system at diagnosis | Clinical staging system used at time of diagnosis, if determined. This is supplementary to specimen’s pathological staging.
Vital status | Donor's last known vital status.
State | Indicates the state of the donor.
Study | The study the donor is involved in.

Exposure

The exposure entity represents details about a donor's antecedent environmental exposures, such as smoking history. See the table below for the clinical properties and descriptions of the exposure entity.

Property | Description
Alcohol history | A response to the question that asks whether the participant has consumed at least 12 drinks of any kind of alcoholic beverage in their lifetime. See CDE (Common Data Element) Public ID: 2201918. Also: A description of an individual's current and past experience with alcoholic beverage consumption. See NCI Thesaurus Code: C81229.
Alcohol history intensity | A category to describe the patient's current level of alcohol use as self-reported by the patient. See CDE (Common Data Element) Public ID: 3457767.
Exposure intensity | Extent of the exposure. Use this field to specify the intensity of the exposure submitted in the 'Exposure type' field.
Exposure type | Type of exposure. This field can be used if the donor was exposed to something other than tobacco or alcohol.
Tobacco smoking history indicator | Donor's smoking history.
Tobacco smoking intensity | Smoking intensity in pack years, defined as the number of cigarettes smoked per day times the number of years smoked, divided by 20.
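
As a worked illustration of the pack-year definition given for Tobacco smoking intensity, the following minimal Python sketch applies the formula directly. The function name `pack_years` and its parameter names are hypothetical and chosen for this example only; they are not ICGC or CGC field names.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack years = (cigarettes smoked per day * years smoked) / 20."""
    # Hypothetical helper for illustration; reject values that cannot occur.
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("cigarettes_per_day and years_smoked must be non-negative")
    return cigarettes_per_day * years_smoked / 20


# 20 cigarettes a day for 10 years.
print(pack_years(20, 10))  # 10.0
# 10 cigarettes a day for 30 years.
print(pack_years(10, 30))  # 15.0
```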

Family

The family entity represents details of the family history of the donor. Find the properties of the family entity below.

Property | Description
Relationship age | Age of the donor's relative at primary diagnosis (in years).
Relationship disease | Name of the donor's relative's disease.
Relationship disease ICD-10 | ICD-10 code of disease affecting family member specified in the 'relationship type' field.
Relationship sex | Biological sex of the donor's relative.
Relationship type | Relationship to the donor, which can be parent, sibling, grandparent, uncle/aunt, cousin, other or unknown.
Relationship type other | Relationship to the donor, if the relationship type is ‘other’.
Relative with cancer history | Indicates whether the donor has a relative with a history of cancer.

File

The file entity represents the data files generated as part of this study. Members of the file entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the file entity below.

Property | Description
File analysis type | The type of analysis applied to the sample from the donor.
Experimental strategy | The method or protocol used to perform the laboratory analysis. See NCI Thesaurus Code: C43622.
Genome build | The reference genome or assembly (such as HG19/GRCh37 or GRCh38) to which the nucleotide sequence of a case/subject/sample can be aligned.
File size | The size of a file measured in bytes (B), kilobytes (KB), megabytes (MB), gigabytes (GB), terabytes (TB), and larger values.
Study | The study the donor is involved in.
Access level | A Boolean value indicating Controlled Data or Open Data. Controlled Data is data from public datasets that has limitations on use and requires approval. Open Data is data from public datasets that doesn't have limitations on its use.
File name | File name.
External file ID | An identifier pointing to an external file.
External object ID | An identifier pointing to an external object.

Project

The project entity represents the project that generated the data. Members of the project entity can be identified by a Project Identifier which is generated from the project name (e.g. Breast Triple Negative/Lobular Cancer - UK BRCA-UK).

Find the properties of the project entity below. Note that once you copy an ICGC file into a project on the CGC, metadata information pertaining to the project entity will display under the investigation label on the file's page.

Property | Description
Partner country | Partner country of the cancer project.
Primary country | Lead country of the cancer project.
Primary site | The anatomical site where the primary tumour is located in the organism.
Project name | Name of the project which generated the data.
Pubmed ID | ID of the publication at www.ncbi.nlm.nih.gov/pubmed/.
State | Indicates the state.
Tumour type | The type of the cancer studied.
Tumour subtype | Information about tumour type.

Sample

The sample entity represents samples or specimen material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. For instance, samples include tissues, body fluids, cells, organs, embryos, and body excretory products. Members of the sample entity can be identified by a Universally Unique Identifier (UUID). Find the properties of the sample entity below.

Property | Description
Submitted sample ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID. Note that once you copy an ICGC file into a project on the CGC, metadata information pertaining to the Sample ID property will display under the Aliquot Sample ID and Portion Sample ID labels on the file's page.
Analyzed sample interval | Interval from specimen acquisition to sample use in an analytic procedure (e.g. DNA extraction), in days.
Study | The study the donor is involved in.
Level of cellularity | The proportion of tumour nuclei to total number of nuclei in a given specimen/sample. If exact percentage cellularity cannot be determined, the submitter has the option to use this field to specify a level that defines a range of percentage.
Percentage of cellularity | The ratio of tumour nuclei to total number of nuclei in a given specimen/sample.
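
The Percentage of cellularity definition above is a simple ratio. A minimal Python sketch of that calculation follows, assuming the value is expressed on a 0 to 100 scale; the function and parameter names are hypothetical and are not ICGC or CGC identifiers.

```python
def cellularity_percentage(tumour_nuclei: int, total_nuclei: int) -> float:
    """Ratio of tumour nuclei to total nuclei, expressed as a percentage."""
    # Hypothetical helper for illustration only.
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    if not 0 <= tumour_nuclei <= total_nuclei:
        raise ValueError("tumour_nuclei must be between 0 and total_nuclei")
    return 100.0 * tumour_nuclei / total_nuclei


# 150 tumour nuclei out of 200 nuclei counted.
print(cellularity_percentage(150, 200))  # 75.0
```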

Specimen

The specimen entity represents information about a specimen that was obtained from a donor. There may be several specimens per donor that were obtained concurrently or at different times. Find the properties of the specimen entity below.

Property | Description
Digital image of stained section | Linkout(s) to digital image of a stained section, demonstrating a representative section of tumour.
Level of cellularity | The proportion of tumour nuclei to total number of nuclei in a given specimen/sample. If exact percentage cellularity cannot be determined, the submitter has the option to use this field to specify a level that defines a range of percentage.
Percentage of cellularity | The ratio of tumour nuclei to total number of nuclei in a given specimen/sample.
Submitted specimen ID | Usually a human-readable identifier, such as a number or a string that may contain metadata information. In some instances, this can also be a UUID. Note that once you copy an ICGC file into a project on the CGC, metadata information pertaining to the Submitted specimen ID property will display under the Sample Submitter ID label on the file's page.
Specimen available | Whether additional tissue is available for followup studies.
Specimen biobank | If the specimen was obtained from a biobank, provide the biobank name here.
Specimen biobank ID | If the specimen was obtained from a biobank, provide the biobank accession number here.
Specimen processing | Description of technique used to process specimen.
Specimen processing other | If another technique is specified for specimen processing, the technique may be indicated here.
Specimen interval | Interval (in days) between specimen acquisition both for those that were obtained concurrently and those obtained at different times.
Specimen storage | Description of how the specimen was stored.
Specimen storage other | If other types of storage are specified for specimen storage, the technique may be indicated here.
Specimen type | Controlled vocabulary description of specimen type.
Specimen type other | Free text description of the specimen type.
Treatment type | Type of treatment the donor received prior to specimen acquisition.
Treatment type other | Free text description of the treatment type.
Tumour confirmed | Whether tumour was confirmed in the specimen as malignant by histological examination.
Tumour grade | Tumour grade using indicated grading system.
Tumour grading system | Name of the tumour grading system.
Tumour stage supplemental | Optional additional staging. For donor, it should be at the time of diagnosis.
Tumour histological type | WHO International Histological Classification of Tumours code.
Tumour stage | This is the pathological tumour stage classification made after the tumour has been surgically removed, and is based on the pathological results of the tumour and other tissues removed during surgery or biopsy. This information is not expected to be the same as the donor's tumour stage at diagnosis since the pathological tumour staging information is the combination of the clinical staging information and additional information obtained during surgery. For this field, please indicate the pathological tumour stage value using the indicated staging system.
Tumour stage supplemental | Optional additional staging.
Tumour staging system | Name of the tumour staging system used.

Surgery

The surgery entity represents details about surgical procedures undergone by the donor. Find the properties of the surgery entity below.

Property | Description
Procedure interval | Interval between primary diagnosis and procedure, in days.
Procedure site | Anatomical site of the procedure. This must use a standard controlled vocabulary which should be reported in advance to the DCC.
Procedure type | Controlled vocabulary description of the procedure type. Vocabulary can be extended by disease-specific projects. Prefix extensions with 3-digit center code, e.g. 008.1 Beijing Cancer Hospital, fine needle aspiration of primary.
Resection status | One of three possible categories that describes the presence or absence of residual tumour following surgical resection.

Therapy

The therapy entity represents details about the type and duration of the therapy the donor received. Find the properties of the therapy entity below.

Property | Description
First therapy duration | Duration of first postresection therapy, in days.
First therapy response | The clinical effect of the first postresection therapy.
First therapy start interval | Interval between primary diagnosis and initiation of the first postresection therapy, in days.
First therapy therapeutic intent | The therapeutic intent of the first postresection therapy.
First therapy type | Type of first postresection therapy (i.e. therapy given to the patient after the sample was removed from the patient).
Other therapy | Other postresection therapy.
Other therapy response | The clinical effect of the other postresection therapy.
Second therapy duration | Duration of second postresection therapy, in days.
Second therapy response | The clinical effect of the second postresection therapy.
Second therapy start interval | Interval between primary diagnosis and initiation of the second postresection therapy, in days.
Second therapy therapeutic intent | The therapeutic intent of the second postresection therapy.
Second therapy type | Type of second postresection therapy (i.e. therapy given to the patient after the sample was removed from the patient).