Import data from the GDC

About the GDC

The Genomic Data Commons (GDC) is a research program of the National Cancer Institute (NCI). The mission of the GDC is to provide the cancer research community with a unified repository and cancer knowledge base that enables data sharing across cancer genomic studies in support of precision medicine.

The National Cancer Institute, part of the National Institutes of Health (NIH), is the federal government's principal agency for cancer research and training. NCI’s mission is to lead, conduct, and support cancer research across the nation to advance scientific knowledge and help all people to live longer, healthier lives. NCI’s scope of work spans a broad spectrum of cancer research across a variety of disciplines and supports research training opportunities at career stages across the academic continuum.

The process of importing files from the GDC to the Cancer Genomics Cloud (CGC) consists of the following two stages:

  • Downloading a manifest file from the GDC website.
  • Importing files to the CGC based on the downloaded manifest file.

Downloading manifest files from the GDC

Manifest files downloaded from the GDC contain information about the data you want to import in the second stage of this process. The procedure below describes one possible use case for defining filters and downloading the manifest.

To start, access the GDC website and follow these steps:

  1. Click the Cases tab.
  2. In the "Program" section, choose "CPTAC".
  3. In the "Project" section, choose "CPTAC-3".
  4. Click the Files tab to continue.
  5. For the "Experimental Strategy", select scRNA-Seq.
  6. For "Data Format", choose bam.
  7. Under "Access", choose controlled.
  8. Once you have defined these filters, you should see that there are 18 files and cases available. Click Manifest in the upper right corner.

A manifest file is downloaded to your computer. Keep this file, as it is used in the next stage of the import process.
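Before moving on, it can be useful to check what the downloaded manifest contains. The sketch below is only an illustration: it assumes the manifest is a tab-separated text file with a header row and the columns id, filename, md5, size, and state. The function names read_manifest and summarize, and the sample entries, are made up for this example and are not part of the GDC or the CGC.

```python
import csv
import io

# Columns the manifest is assumed to contain (tab-separated, header row first).
EXPECTED_COLUMNS = ["id", "filename", "md5", "size", "state"]


def read_manifest(text):
    """Parse manifest text into a list of dicts, one per file entry."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"manifest is missing columns: {missing}")
    return list(reader)


def summarize(rows):
    """Return the number of entries and their combined size in bytes."""
    return len(rows), sum(int(row["size"]) for row in rows)


# Made-up sample with placeholder IDs and checksums. For a real manifest,
# read the downloaded file instead, e.g. open(path, encoding="utf-8").read().
SAMPLE = (
    "id\tfilename\tmd5\tsize\tstate\n"
    "00000000-0000-0000-0000-000000000001\tsample_a.bam\t"
    "d41d8cd98f00b204e9800998ecf8427e\t1024\treleased\n"
    "00000000-0000-0000-0000-000000000002\tsample_b.bam\t"
    "d41d8cd98f00b204e9800998ecf8427e\t2048\treleased\n"
)

count, total_bytes = summarize(read_manifest(SAMPLE))
print(f"{count} files, {total_bytes} bytes")
```

Comparing the entry count with the number of files shown on the GDC website after filtering is a quick way to confirm that the manifest matches the selection you intended.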

Import files from the GDC to the CGC

  1. Navigate to a project on the CGC.
  2. Once in the project, click the Files tab.
  3. Click Add files > Import from a manifest file.
  4. In the Import files from dropdown, select Genomic Data Commons (GDC).
  5. Click Browse files and select the manifest file from your local machine, or drag and drop the file onto the marked area. Alternatively, if you have already uploaded your generated manifest file to a project, click Select manifest from project and select the file.
  6. (Optional) In the Add tags field, add the keywords (tags) that describe the imported items.
  7. In Resolve naming conflicts, select the action to take if a naming conflict occurs. The available actions are Skip and Auto Rename. Read more about naming conflict resolution.
  8. Click Import. The file import process starts.
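The naming conflict setting in the steps above matters most when file names collide. As a rough pre-check, the sketch below lists file names that occur more than once in a manifest. It rests on two assumptions: that the manifest is tab-separated with a filename column, and that repeated names within one manifest would be treated as naming conflicts during import. It does not detect conflicts with files that already exist in the project. The function name duplicate_filenames and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def duplicate_filenames(manifest_text):
    """Return the file names that appear more than once, sorted alphabetically."""
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    counts = Counter(row["filename"] for row in reader)
    return sorted(name for name, n in counts.items() if n > 1)


# Made-up sample: two entries share the name reads.bam (placeholder IDs and checksums).
SAMPLE_WITH_DUPLICATE = (
    "id\tfilename\tmd5\tsize\tstate\n"
    "00000000-0000-0000-0000-000000000001\treads.bam\t"
    "d41d8cd98f00b204e9800998ecf8427e\t10\treleased\n"
    "00000000-0000-0000-0000-000000000002\treads.bam\t"
    "d41d8cd98f00b204e9800998ecf8427e\t20\treleased\n"
    "00000000-0000-0000-0000-000000000003\tother.bam\t"
    "d41d8cd98f00b204e9800998ecf8427e\t30\treleased\n"
)

for name in duplicate_filenames(SAMPLE_WITH_DUPLICATE):
    print(f"appears more than once: {name}")
```

If the check reports repeated names, Auto Rename is likely the safer choice when you want every file imported, while Skip suits cases where only one copy per name is needed.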