Projects on the CGC

Overview

Projects are the core building blocks of the CGC. Each project corresponds to a distinct scientific investigation and serves as a container for its data, analysis workflows, and results. Multiple workflow executions can be carried out within a project.

Access to a project is restricted to the collaborators in the investigation. Each project has at least one administrator, who controls the project members' permissions to execute analyses.

For example, you may be involved in two projects: the first is a breast cancer study using RNA-Seq, and the second is a whole genome sequencing study of prostate cancer. Each of these projects can involve a different team of researchers. See our documentation for more about collaborating on the CGC.

Project types on the CGC

The CGC hosts both Open Data and Controlled Data, which require different levels of access permissions. To protect Controlled Data, there are two types of projects on the CGC: Open Data Projects and Controlled Data Projects. Note that you can always upload your private data into either project type.

Open Data Projects

Open Data Projects are designed to host both Open Data and your private data.

Open Data is available to all CGC users upon sign-up. Open Data contains data that is not unique to an individual, such as de-identified clinical data, gene expression data, copy number alterations in regions of the genome, epigenetic data, and summaries of data compiled across individuals. Learn more about Open Data on the CGC.

Note that you cannot copy Controlled Data, such as TCGA Controlled Data and TARGET Controlled Data, into an Open Data Project.

Controlled Data Projects

Controlled Data Projects host both Open and Controlled Data as well as your private data.
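
The hosting rules for the two project types can be summarized in a short sketch. This is an illustrative model only, not part of the CGC API: the names DataKind, ProjectType, and can_copy_into are hypothetical, and the check covers only which kinds of data a project type can host. It does not model whether a given user holds dbGaP access.

```python
from enum import Enum


class DataKind(Enum):
    """Kinds of data described on this page (hypothetical names)."""
    OPEN = "open"
    CONTROLLED = "controlled"
    PRIVATE = "private"


class ProjectType(Enum):
    """The two project types on the CGC (hypothetical names)."""
    OPEN_DATA = "open_data"
    CONTROLLED_DATA = "controlled_data"


def can_copy_into(data: DataKind, project: ProjectType) -> bool:
    """Return True if this kind of data may be placed in this project type.

    Assumption: the only restriction is that Controlled Data cannot be
    copied into an Open Data Project; every other combination is allowed.
    """
    return not (data is DataKind.CONTROLLED and project is ProjectType.OPEN_DATA)
```

For instance, can_copy_into(DataKind.PRIVATE, ProjectType.OPEN_DATA) is allowed under this model, while can_copy_into(DataKind.CONTROLLED, ProjectType.OPEN_DATA) is not.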

Access to Controlled Data must be obtained through dbGaP. After obtaining permission, Controlled Data users need to register for the CGC with their eRA Commons credentials and agree to the data use and publication guidelines for all relevant datasets. Learn more about signing up for the CGC or about dbGaP controlled data access on the CGC.

Controlled Data contains data that may allow individuals to be identified, such as primary sequence data (BAM and FASTQ files), SNP6 array level 1 and 2 data, exon array level 1 and 2 data, and VCFs. The CGC restricts access to Controlled Data following dbGaP's model. This restriction keeps data as widely available as possible while protecting the privacy of study participants. Only users with Database of Genotypes and Phenotypes (dbGaP) permissions can access Controlled Data on the CGC.

Controlled Data Projects are labeled CONTROLLED with a red tag and a lock symbol so you can recognize them easily.

If a collaborator loses dbGaP Controlled Data access at any point, all Controlled Data Project resources become read-only for that collaborator: they can still see project resources and file metadata, but they cannot access or copy files or execute analyses. Read more about collaboration on the CGC.
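
The read-only behavior described above can be pictured as a small permission table. The sketch below is a hypothetical model for illustration, not CGC code: the ProjectActions class and the allowed_actions function are invented names, and it assumes a collaborator with valid dbGaP access can perform every listed action, which in practice also depends on the permissions set by the project administrator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectActions:
    """Actions a collaborator may take in a Controlled Data Project."""
    view_resources: bool
    view_file_metadata: bool
    access_files: bool
    copy_files: bool
    execute_analyses: bool


def allowed_actions(has_dbgap_access: bool) -> ProjectActions:
    """Model the read-only fallback for a Controlled Data Project.

    Assumption: with dbGaP access, all actions are available (ignoring
    per-member permissions). Without it, only viewing remains.
    """
    return ProjectActions(
        view_resources=True,        # still visible after access is lost
        view_file_metadata=True,    # still visible after access is lost
        access_files=has_dbgap_access,
        copy_files=has_dbgap_access,
        execute_analyses=has_dbgap_access,
    )
```

Under this model, allowed_actions(False) leaves only the two viewing actions enabled, which matches the read-only state described above.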

Project locations

The CGC currently works with three cloud providers: Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP). Learn more about project locations.
