File repositories on the CGC

Before you can analyze data on the CGC, the files must be in a project; an analysis cannot be run on data that is outside the project. You can copy files from a CGC file repository to your project, or you can upload your own data.

For instance, if you'd like to use an annotation file from the Public Reference Files repository for a task execution, you first need to copy it to your project.

There are several file repositories on the CGC.

  • Every project has its own Project Files. This repository is located within the project and contains the input and output files for workflows in that project. You can upload files or copy them from other projects and repositories.

  • The CGC hosts TCGA data, which can be accessed via the Case Explorer and the Data Browser from the Data tab on the top navigation bar.

  • Public Reference Files is a repository of files maintained by the CGC. Many bioinformatics tools and workflows require reference and annotation files to work properly, so this repository contains the latest and most frequently used reference genomes and annotation files, sparing you from uploading your own reference files every time you run a task. Files stored in this repository can be copied to your Project Files for use in analyses.

  • Public Test Files is a repository of files maintained by Seven Bridges. It contains common test samples.

You can copy files from any of these repositories to your project, or you can upload your data directly to a project. For instance, if you'd like to use a file from the Public Reference Files or Public Test Files repository, you first need to copy it to your project. You can access these repositories from the Data menu.
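
The copy-before-use rule described above can be pictured with a small in-memory model. This is only an illustrative sketch: it does not call the CGC or any Seven Bridges API, and the class names (Repository, Project), method names, and file names (annotation.gtf, sample.fastq) are all hypothetical, chosen just to show that a task can only use files already present in the project.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical stand-in for a file repository such as Public Reference Files."""
    name: str
    files: set = field(default_factory=set)


@dataclass
class Project:
    """Hypothetical stand-in for a project and its Project Files."""
    name: str
    files: set = field(default_factory=set)

    def copy_from(self, repo, filename):
        # Copying adds the file to the project; the source repository is unchanged.
        if filename not in repo.files:
            raise FileNotFoundError(f"{filename} is not in {repo.name}")
        self.files.add(filename)

    def upload(self, filename):
        # Uploading your own data also places the file in the project.
        self.files.add(filename)

    def run_task(self, inputs):
        # A task may only use files that are already in the project.
        missing = [f for f in inputs if f not in self.files]
        if missing:
            raise ValueError(f"copy or upload these files first: {missing}")
        return f"task started with {sorted(inputs)}"


# Example file names are assumptions made for illustration only.
public_refs = Repository("Public Reference Files", {"annotation.gtf"})
project = Project("my-project")

project.upload("sample.fastq")                    # your own data
project.copy_from(public_refs, "annotation.gtf")  # copied from a repository
print(project.run_task(["sample.fastq", "annotation.gtf"]))
```

In this model, calling run_task with a file that was never copied or uploaded raises an error, which mirrors the requirement that files be in the project before an analysis can use them.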