TCIA data

Overview

The Cancer Imaging Archive (TCIA) contains radiological imaging data for subjects in The Cancer Genome Atlas (TCGA). It is part of an effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects. TCIA includes radiological images representing 21 types of cancer detailed in TCGA. All images are available for public use. The images are de-identified so that they are free of protected health information (PHI), and they are stored in the standard DICOM format.

Distribution of the data

The table below lists the number of subjects and the image modalities (such as MRI or CT) available for each cancer type (“Collection”). For the abbreviations used, see the full list of cancer type abbreviations and the full list of DICOM image modality abbreviations.

Collection   Subjects   Modalities
TCGA-KIRC    267        CT, MR, CR
TCGA-GBM     262        MR, CT, DX
TCGA-LGG     199        MR, CT
TCGA-HNSC    192        CT, MR, PT, RTSTRUCT, RTPLAN, RTDOSE
TCGA-OV      143        CT, MR
TCGA-BRCA    139        MR, MG
TCGA-BLCA    97         CT, CR, MR, PT
TCGA-LIHC    97         MR, CT, PT
TCGA-LUAD    69         CT, PT, NM
TCGA-UCEC    58         CT, CR, MR, PT
TCGA-CESC    54         MR
TCGA-STAD    46         CT
TCGA-LUSC    37         CT, NM, PT
TCGA-KIRP    33         CT, MR, PT
TCGA-COAD    25         CT
TCGA-ESCA    16         CT
TCGA-KICH    15         CT, MR
TCGA-PRAD    14         CT, PT, MR
TCGA-THCA    6          CT, PT
TCGA-SARC    5          CT, MR
TCGA-READ    3          CT, MR
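
As an illustration of how the table above can be queried, the following minimal Python sketch transcribes it into a dictionary and looks up collections by modality. The names COLLECTIONS, collections_with, and subjects_in are hypothetical helpers introduced for this example only; they are not part of TCIA or the CGC.

```python
# Collection -> (number of subjects, modalities), transcribed from the table above.
COLLECTIONS = {
    "TCGA-KIRC": (267, ["CT", "MR", "CR"]),
    "TCGA-GBM": (262, ["MR", "CT", "DX"]),
    "TCGA-LGG": (199, ["MR", "CT"]),
    "TCGA-HNSC": (192, ["CT", "MR", "PT", "RTSTRUCT", "RTPLAN", "RTDOSE"]),
    "TCGA-OV": (143, ["CT", "MR"]),
    "TCGA-BRCA": (139, ["MR", "MG"]),
    "TCGA-BLCA": (97, ["CT", "CR", "MR", "PT"]),
    "TCGA-LIHC": (97, ["MR", "CT", "PT"]),
    "TCGA-LUAD": (69, ["CT", "PT", "NM"]),
    "TCGA-UCEC": (58, ["CT", "CR", "MR", "PT"]),
    "TCGA-CESC": (54, ["MR"]),
    "TCGA-STAD": (46, ["CT"]),
    "TCGA-LUSC": (37, ["CT", "NM", "PT"]),
    "TCGA-KIRP": (33, ["CT", "MR", "PT"]),
    "TCGA-COAD": (25, ["CT"]),
    "TCGA-ESCA": (16, ["CT"]),
    "TCGA-KICH": (15, ["CT", "MR"]),
    "TCGA-PRAD": (14, ["CT", "PT", "MR"]),
    "TCGA-THCA": (6, ["CT", "PT"]),
    "TCGA-SARC": (5, ["CT", "MR"]),
    "TCGA-READ": (3, ["CT", "MR"]),
}


def collections_with(modality):
    """Return the collections that list the given modality, in table order."""
    return [name for name, (_, mods) in COLLECTIONS.items() if modality in mods]


def subjects_in(names):
    """Sum the subject counts of the named collections."""
    return sum(COLLECTIONS[name][0] for name in names)


total_subjects = subjects_in(COLLECTIONS)   # all 21 collections
pet_collections = collections_with("PT")    # collections that include PET series

print(total_subjects)
print(pet_collections)
```

Note that the table reports modalities per collection, not per subject, so subjects_in(collections_with("PT")) is only an upper bound on the number of subjects who actually have PET images.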

TCIA Metadata

Each TCIA file on the CGC is a compressed file containing a set of images acquired in the same scanning mode. The following metadata are also set for each file when available:

Property                  Description
Case UUID                 A Universally Unique Identifier (UUID) for the sample or files of a case.
Case ID                   A human-readable identifier, such as a number or a string that may contain metadata information. This identifier is often referred to as the submitter ID.
Ethnicity                 A socially defined category of people based on common ancestral, cultural, biological, and social factors. See NCI Thesaurus Code: C29933.
Gender                    The collection of behaviors and attitudes that distinguish people on the basis of the societal roles expected for the two sexes. See NCI Thesaurus Code: C17357.
Race                      A classification of humans characterized by certain heritable traits, common history, nationality, or geographic distribution. See NCI Thesaurus Code: C17049.
Investigation             A value denoting the project or study that generated the data. See NCI Thesaurus Code: C41198.
Age at diagnosis          The age in years of the case at the initial pathological diagnosis of disease or cancer. See NCI Thesaurus Code: C15220.
Primary site              The anatomical site where the primary tumor is located in the organism. See NCI Thesaurus Code: C43761.
Disease type              The type of the disease or condition studied. See NCI Thesaurus Code: C2991.
Vital status              The state of being living or deceased for cases that are part of the investigation. See NCI Thesaurus Code: C25717.
Days to death             The number of days from the date of the initial pathological diagnosis to the date of death for the case in the investigation.
Series date               The date the Series was acquired.
Manufacturer              Manufacturer's name of the equipment that produced the composite instances.
Body part examined        Text description of the part of the body examined.
Modality                  Type of equipment that originally acquired the data.
Protocol name             User-defined description of the conditions under which the Series was performed.
Manufacturer model name   Manufacturer's model name of the equipment that produced the composite instances.
Series description        User-provided description of the Series.
Software versions         Manufacturer's designation of the software version of the equipment that produced the composite instances.
Image count               Number of images in this series.
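
To show how these properties might be used to select files, here is a minimal Python sketch that filters a list of file records by metadata. It does not call the CGC API. The record layout, the key names (such as modality and body_part_examined), the file names, and the values are all placeholders invented for this example; they are not actual CGC field names or real TCIA data.

```python
# Hypothetical file records: key names and values are placeholders for this
# sketch, not actual CGC field names or real TCIA data.
FILES = [
    {"name": "example_series_1", "modality": "MR", "body_part_examined": "BRAIN", "image_count": 120},
    {"name": "example_series_2", "modality": "CT", "body_part_examined": "KIDNEY", "image_count": 85},
    # Metadata are only set when available, so some keys may be missing.
    {"name": "example_series_3", "modality": "MR", "image_count": 40},
]


def filter_by_metadata(files, **criteria):
    """Return the files whose metadata equal every given criterion.

    A file that lacks a requested key is treated as having the value None,
    so it never matches a criterion with a non-None value.
    """
    return [
        f for f in files
        if all(f.get(key) == value for key, value in criteria.items())
    ]


mr_files = filter_by_metadata(FILES, modality="MR")
brain_mr_files = filter_by_metadata(FILES, modality="MR", body_part_examined="BRAIN")

print([f["name"] for f in mr_files])
print([f["name"] for f in brain_mr_files])
```

Because metadata are set only when available, a filter on several properties at once can silently exclude files that are simply missing one of them, as with the third record here.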

Access TCIA data

Access a repository of TCIA files via the TCIA public project or the Data Browser.