CPTAC data

About CPTAC

The National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics. Launched in 2011, CPTAC pioneered the integrated proteogenomic analysis of colorectal, breast and ovarian cancer to reveal new insights into these cancer types, such as identification of proteomic-centric subtypes, prioritization of driver mutations by correlative analysis of copy number alterations and protein abundance, and understanding cancer-relevant pathways through posttranslational modifications. 1

CPTAC Objectives

The overall objective of CPTAC is to systematically identify proteins that derive from alterations in cancer genomes and related biological processes, in order to understand aspects of the molecular basis of cancer that genomics alone cannot fully elucidate, and to accelerate the translation of molecular findings into the clinic. This is to be achieved by adding a complementary functional layer of protein biology to our understanding of cancer genome biology (a “proteogenome” approach) that refines and prioritizes driver genes, enhances understanding of pathogenesis through proteomic subtyping, illuminates dynamic alterations in posttranslational modifications responsible for the dysregulation of cancer signaling networks and pathways, and improves understanding of drug response and resistance to therapies. 2

CPTAC analyzes cancer biospecimens from genomics initiatives such as The Cancer Genome Atlas (TCGA) by mass spectrometry to characterize and quantify their constituent proteins, or “proteome”. Mass spectrometry enables the highly specific identification of proteins and proteoforms, accurate relative quantitation of protein abundance in contrasting biospecimens, and the localization of post-translational protein modifications (such as phosphorylation) on a protein’s sequence.

Two sets of data from different phases of the CPTAC initiative are currently available via the Data Browser on the CGC: CPTAC (from the CPTAC-2 phase) and CPTAC-3.

In addition to the Data Browser on the CGC, this data is also available on the CPTAC Data Portal and the Proteomic Data Commons (PDC).

CPTAC Data

The data labelled as CPTAC in the Data Browser originates from phase 2 of the CPTAC initiative (CPTAC-2) and can also be searched at its primary location, the PDC, under CPTAC2 Retrospective. The mass spectrometry (MS) data in this dataset was imported to the CGC in 2017 and covers four TCGA cancer types (Ovarian serous cystadenocarcinoma, Breast invasive carcinoma, Colon adenocarcinoma, and Rectum adenocarcinoma) that are included in the CPTAC public project.

Learn more about the metadata associated with CPTAC data on the CGC.

CPTAC-3 Data

Data from the CPTAC-3 project is available on the CGC for search and filtering in the Data Browser. This set contains WGS, WXS, and RNA-Seq data that is either open or controlled; access to controlled data requires approval through dbGaP. The data was collected in the third phase of the CPTAC (Clinical Proteomic Tumor Analysis Consortium) program, labelled CPTAC-3. The program focused on collecting proteomics data for patients with a particular cancer type, but data collection was expanded to include genomic data, particularly for Lung adenocarcinoma, Clear cell renal cell carcinoma, and Uterine corpus endometrial carcinoma. The primary source for the genomic data is the GDC, while the proteomic part of the CPTAC-3 project is hosted on the CPTAC Data Portal under CPTAC3 and on the PDC as CPTAC3 Discovery.

Learn more about the metadata associated with CPTAC-3 data on the CGC.

Access CPTAC and CPTAC-3 Data

Access CPTAC and CPTAC-3 files via the Data Browser. CPTAC data (excluding CPTAC-3) can also be accessed through the CPTAC dataset public project.