Glossary

Tip: Use the search function (Control+F or Command+F) to find specific terms.

A

Alias (volumes API)

An [alias](doc:aliases) is a pointer on the Platform that refers to a file in a cloud storage repository (an AWS S3 bucket or Google Cloud Storage). Aliases enable files in cloud storage repositories outside the Platform to be manipulated on the Platform, for instance, to be used as inputs to computational tasks.

API

The [Seven Bridges API](doc:the-api) (Application Programming Interface) is used to integrate the Seven Bridges Platform with other applications, and to automate most procedures on it, such as uploading files, querying metadata, and executing analyses.
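As a rough illustration of what automating a procedure through the API looks like, the sketch below builds (but does not send) an authenticated request using only the Python standard library. The base URL `https://cgc-api.sbgenomics.com/v2`, the `X-SBG-Auth-Token` header, and the `projects` path are assumptions here; confirm them against the API documentation before use. The helper name `build_request` is hypothetical.

```python
import urllib.request

# Assumed base URL; check the API documentation for your environment.
BASE_URL = "https://cgc-api.sbgenomics.com/v2"


def build_request(path, token, base_url=BASE_URL):
    """Return an unsent GET request carrying the authentication token."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        headers={
            "X-SBG-Auth-Token": token,  # assumed header name
            "Accept": "application/json",
        },
        method="GET",
    )


request = build_request("projects", token="<YOUR-TOKEN>")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a token or network access.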

API bindings

API bindings allow you to communicate with the Seven Bridges Platform API using [Python](https://github.com/sbg/sevenbridges-python) and [R](https://github.com/sbg/sevenbridges-r).
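The sketch below shows the general shape of a call through the Python bindings. It assumes the `sevenbridges` package is installed and that `Api(url=..., token=...)` and `api.projects.query().all()` behave as described in the library's README; the helper `project_names` is a hypothetical name introduced for illustration. The helper accepts any object with that shape, so it can be tried without the package or a network connection.

```python
def project_names(api):
    """Collect the names of all projects visible to the given API client."""
    return [project.name for project in api.projects.query().all()]


# With the real bindings (assumed usage; requires `pip install sevenbridges-python`):
#   import sevenbridges as sbg
#   api = sbg.Api(url="https://cgc-api.sbgenomics.com/v2", token="<YOUR-TOKEN>")
#   print(project_names(api))
```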

App

Apps are bioinformatics tools and workflows. The CGC hosts many pre-built apps that are ready to use. These apps are published and maintained by the Seven Bridges CGC team and can be executed on the CGC as is.

Tip: Public apps are available on the CGC, but you can also bring your own tools and design your own workflows.

See also: project, public apps, tool, workflow

Archiving

[Archiving](doc:file-archiving-overview) is the process of storing files that you don't intend to use for a long period of time (typically over three months). Archived files are billed at a significantly reduced price compared to data which is always available.

B

Billing Group

Each billing group has its own billing charges and a billing group administrator, a project member who manages the payments. Note that the billing group administrator need not be the project administrator. Billing group administrators may have read permissions only, in which case they cannot read file contents (only file metadata), write data, or execute analyses.

Tip: During the evaluation period, you can get extra storage and computation credits when you bring your tools and/or data to the CGC.

See also: evaluation period, project member, tool

C

CGC Image Registry

The CGC image registry is a secure storage facility in the same data center as the CGC compute nodes. It is used to store Docker images containing tools. Keeping your images here rather than in Docker Hub maximizes the efficiency of the tools' setup and runtime when used on the CGC.

Controlled Data

Controlled Data on the CGC contains TCGA data with genomic information which may allow donors to be identified, such as primary sequence data (BAM and FASTQ files), VCFs, SNP array, Exon array, or certain information in MAFs.

Tip:

  1. Controlled Data users must obtain an approved Data Access Request through the Database of Genotypes and Phenotypes (dbGaP) and agree with all TCGA Data Use Certifications as well as the TCGA publication guidelines.
  2. Projects with Controlled Data are labeled as Controlled Data Projects on the CGC.

See also: Controlled Data Project, dbGaP, Open Data, TCGA

Controlled Data Project

Controlled Data Projects on the CGC host both TCGA Open and Controlled Data as well as your private data.

Tip: Controlled Data Projects are labeled TCGA with a red tag and a lock symbol so you can recognize them easily. These projects can only be shared with collaborators who have Controlled Data access through the Database of Genotypes and Phenotypes (dbGaP).

See also: Controlled Data, dbGaP, Open Data, project, TCGA

CCLE

The Cancer Cell Line Encyclopedia (CCLE) is a public project which contains large Open Access Files from the CCLE, [which you can use on the CGC](doc:ccle).

D

dbGaP

The Database of Genotypes and Phenotypes (dbGaP) is a repository that archives and distributes data from studies investigating the relationship between phenotype and genotype. dbGaP grants two levels of access: Open and Controlled.

Tip: To access Controlled Data on the CGC, users must first obtain an approved Data Access Request through dbGaP.

See also: Controlled Data, Open Data, TCGA

Docker

[Docker](http://www.docker.com) is a toolkit for creating and running software containers. Docker packages applications and their dependencies into discrete runtime environments called "containers", which allows applications to run on diverse infrastructures. Containers are based on read-only environments called images.

Tip: Docker images are stored on the Platform in the Seven Bridges Image Registry.

See also: sdk, Seven Bridges Image Registry, tool, tool editor

E

eRA Commons

The eRA Commons is an online system where PIs and researchers can interact and discuss information concerning research grants. eRA Commons issues usernames and passwords that can be used to log in to the CGC.

I

Input

Inputs are the data supplied to a tool.

Tip: Typically, inputs are files. However, if you are installing your own tool on the CGC using Rabix, then any parameters or settings that are supplied to the tool are treated as inputs and are accepted by their own input ports, which you should specify in the tool editor.

See also: input port, Rabix, tool, tool editor

Input and Output Ports

A port is a "gateway" for data to flow in and out of your tool. You should add one input port per datatype your tool takes as input (such as files, strings, or integers) and per method of processing. For instance, if your tool takes a file and an integer as input per execution, you should enter two ports; if it takes two files, one reference and one data file, and does different things to each, then you should also enter two ports.
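The port-counting rule above can be sketched as a small data model. The class and field names below are hypothetical and are not part of the tool editor; the sketch only illustrates that each distinct data type, and each differently processed file, gets its own port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputPort:
    port_id: str    # hypothetical identifier for the port
    data_type: str  # e.g. "File", "int", "string"


# A tool that takes one file and one integer per execution: two ports.
file_and_int = [
    InputPort("reads", "File"),
    InputPort("threads", "int"),
]

# A tool that takes a reference file and a data file and treats them
# differently: also two ports, even though both are files.
two_files = [
    InputPort("reference", "File"),
    InputPort("data", "File"),
]

print(len(file_and_int), len(two_files))
```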

Tip:

  1. On the Workflow editor, a tool's ports are represented by circles on the node that represents the tool.
  2. Typically, input ports take data files. However, if you are installing your own tool on the CGC using Rabix, the parameters and settings you enter for the tool are treated as inputs and are accepted by their own input ports.

See also: input, node, Rabix, tool, workflow editor

M

Main Dashboard

The Main Dashboard is the landing page when you log into the CGC. On the top navigation bar, you can access **Projects**, **Data**, and **Apps**. The **Projects** panel gives a preview of recent projects. The **Public Data and Apps** panel links to public files and apps via the **Case Explorer**, **Data Browser**, **Tool Editor**, and **Workflow Editor**. The **Getting Started** panel contains a checklist to help you familiarize yourself with the platform. The **Tasks** panel displays a preview of recent tasks.

See also: Case Explorer, Data Browser, project, public apps, task, tool, tool editor, workflow

Manifest

When uploading files to the CGC via the command line uploader, a manifest file can be used to [upload multiple files and set the accompanying metadata](doc:set-metadata-using-the-command-line-uploader/).
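As a sketch of how such a manifest might be prepared, the example below writes a small CSV with one row per file. It assumes the manifest is a CSV file with a header row, a first column holding the file path, and the remaining columns holding metadata values. The column names and file names here are illustrative only; check the linked uploader documentation for the exact format it expects.

```python
import csv
import io

# Hypothetical rows: a file path followed by metadata values.
rows = [
    {"file": "sample1_R1.fastq", "sample_id": "sample1", "paired_end": "1"},
    {"file": "sample1_R2.fastq", "sample_id": "sample1", "paired_end": "2"},
]


def build_manifest(rows):
    """Serialize rows to CSV text with a header row taken from the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


manifest_text = build_manifest(rows)
print(manifest_text)
```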

N

NIH CIT

Node

Nodes are the graphical representation of a workflow's building blocks in the Workflow Editor. They represent inputs, outputs, and the tools used to build the workflow.

Tip: Hover over a node in the Workflow Editor to display the descriptions for all possible connections that node can make via input and output ports.

See also: input, input and output ports, tool, workflow, workflow editor

O

Open Data

Open Data on the CGC contains TCGA data which is not unique to a donor, such as de-identified clinical and demographic data, gene expression data, copy number alterations in regions of the genome, epigenetic data, or summaries of data compiled across individuals.

Tip: Any researcher can access and use Open Data on the CGC as long as they agree to the data use restrictions and requirements outlined in the TCGA publication guidelines.

See also: Controlled Data, Open Data Project, TCGA

Open Data Project

Open Data Projects on the CGC host both TCGA Open Data and your private data.

Tip: TCGA Controlled Data cannot be copied inside an Open Data Project.

See also: Controlled Data, project, Open Data, TCGA

P

Project

Projects are the core building blocks of the CGC. Each project corresponds to a distinct scientific investigation, serving as a container for its data, analysis tools, results, and team of collaborators. Multiple workflow executions can be carried out within a single project.

Tip: The CGC contains two types of projects: Open Data Projects and Controlled Data Projects.

See also: Controlled Data Project, Open Data Project, project member, workflow

Project Dashboard

The Project Dashboard is the landing page for a project. Each project has its own dashboard which displays information about the data and workflows within the project. The left navigation panel is an editable project description which you and your collaborators can use to write notes about your project. The project dashboard also includes a Project Member panel and a Task panel, which displays a list of recent tasks. Tabs on the top right for **Files**, **Apps**, **Interactive Analysis**, **Tasks**, and **Settings** give a more detailed view of each feature.

See also: app, project, project member, task

Project Member

Project members are your collaborators on a given project. If you are a project administrator, you can add new collaborators as well as adjust the permissions level of each collaborator. Permissions include: read, write, copy, execute, and admin privileges.

Tip: You can view your collaborators on the Project Members tab of the Project Dashboard.

See also: project, project dashboard

Public Apps

Public apps are workflows and tools that the CGC bioinformatics team maintains and keeps up to date with the latest versions. There are about 150 publicly available tools and workflows.

See also: app, tool, tool editor, workflow, workflow editor

Public Reference Files

Public Reference Files is a repository on the Seven Bridges Platform which contains files that can be used as references, annotation files, and sample data in your analysis. Many bioinformatics tools and workflows require reference and annotation files to work properly. These files are maintained by the Seven Bridges bioinformatics team. These files can be copied to **Project Files** for use in analyses.

Tip: Access this repository by clicking on Data on the navigation bar. Select Public Reference Files from the drop-down menu.

See also: file, project files

R

Rabix

Rabix is a toolkit developed by Seven Bridges for installing tools on the CGC. It has two components. The first is the Rabix Command Line Interface (CLI), which allows you to install a tool in a Docker container and upload the image to the CGC image registry. The second is the tool editor, which allows you to describe the tool's interface so that it can be run on the CGC.

See also: tool, tool editor

S

Smart Connectors

Smart connectors are graphical representations of the connections between tools in a workflow, through which data flows. They are shown as pipes on the Workflow Editor.

Tip:

  1. Select input/output ports on a node to automatically highlight all possible connections. When you click an output port of a tool, the Workflow Editor will indicate all the input ports of other tools that are compatible with it, by highlighting them in green.
    • For example, if your tool outputs FASTQ files, all input ports of other tools on the canvas that accept FASTQs as input will be highlighted.
  2. To undo a connection, hover over the smart connector and scissors will appear, denoting this connection can be severed.
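The highlighting behaviour described in the tip can be pictured as a simple matching step. This is a sketch under stated assumptions, not how the Workflow Editor is implemented: the port records and the `compatible_inputs` helper are hypothetical, and compatibility is reduced to an exact file-type match.

```python
def compatible_inputs(output_type, input_ports):
    """Return the input ports that accept the given output file type."""
    return [port for port in input_ports if output_type in port["accepts"]]


# Hypothetical input ports of other tools on the canvas.
canvas_inputs = [
    {"tool": "aligner", "port": "reads", "accepts": {"FASTQ"}},
    {"tool": "qc", "port": "input_fastq", "accepts": {"FASTQ", "FASTA"}},
    {"tool": "variant_caller", "port": "alignments", "accepts": {"BAM"}},
]

# Ports that would be highlighted when a FASTQ output port is selected.
highlighted = compatible_inputs("FASTQ", canvas_inputs)
print([(p["tool"], p["port"]) for p in highlighted])
```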

See also: input and output ports, node, tool, workflow

SPARQL

SPARQL is a query language which can be used to [filter TCGA data](doc:access-files-filtered-using-sparql) based on specified criteria (e.g. type of disease, the last medical follow-up, whether a patient is still alive, etc.) and then download the TCGA files that satisfy the query.

T

Task

A task is a single execution of a tool or a workflow in a project.

Tip: To access your tasks, first log into your CGC account. Then, click on the Projects tab. Select the appropriate project to access its Project Dashboard. On the top right corner, you will see a row of tabs. Click on the Tasks tab. Here you can find more information on each executed task as well as each draft task.

See also: project, project dashboard, tool, workflow

TCGA

The Cancer Genome Atlas (TCGA) is a genomics dataset compiled to understand the molecular basis of cancer as a joint effort from the National Cancer Institute (NCI), National Human Genome Research Institute (NHGRI), the National Institutes of Health (NIH), and the U.S. Department of Health and Human Services. TCGA contains data on 33 different tumor types and more than 11,000 qualified cases. The CGC currently hosts TCGA datasets.

Tip: TCGA data is grouped into Open Data and Controlled Data. This classification allows the research community a wide range of access to data while ensuring the privacy of individuals.

See also: Controlled Data, Open Data, TCGA metadata, TCGA Data

Task logs

Task logs are produced for every job in the task. They include .log files, std.err files, and std.out files. These are useful when troubleshooting failed tasks.

Tip: To access your task logs, first log into your account. Then, click on the Projects tab. Select the appropriate project to access its Project Dashboard. On the top right corner, you will see a row of tabs. Click on the Tasks tab. Here you can find more information on each executed task as well as each draft task.

See also: job, project, project dashboard, task, tool, workflow

Task Stats

[Task statistics](doc:view-task-stats) are available for every task on the CGC, which can be useful when optimizing a task or debugging one that has failed.

See also: job, project, project dashboard, task, tool, workflow

Tool

Tools are programs for processing data on the CGC. Tools can be run alone or as part of a workflow. They are graphically represented in the visual interface as nodes.

Tip: Many tools are maintained by the CGC bioinformatics team on the CGC as public apps. If you choose to use your own tools, you can import them using Rabix.

See also: node, public apps, Rabix, tool editor, workflow

Tool Editor

The Tool Editor is a graphical workspace which allows you to describe a tool's interface, including details of the tool's input and output ports, its command options, and the CPU and memory resources it requires.

Tip:

  1. Once you have filled in a tool's details using the Tool Editor, you can download the JSON file that is the tool's Common Workflow Language (CWL) description.
    • Click on Settings in the Tool Editor then select Tool JSON to see the CWL description.
  2. The Tool Editor outputs a Common Workflow Language (CWL) description of the tool.
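To give a sense of what such a description contains, the sketch below assembles a minimal CWL-style tool description as a Python dictionary and serializes it to JSON. It follows the general layout of a CWL v1.0 `CommandLineTool`; the tool (`wc -l`), the port names, and the resource values are made up for illustration, and the JSON exported by the Tool Editor may use a different CWL version and additional platform-specific fields.

```python
import json

# Illustrative description only; not exported from the Tool Editor.
tool_description = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": ["wc", "-l"],
    # CPU and memory (in MiB) the tool asks for.
    "requirements": [
        {"class": "ResourceRequirement", "coresMin": 1, "ramMin": 1000}
    ],
    # One input port: the file whose lines are counted.
    "inputs": {
        "text_file": {"type": "File", "inputBinding": {"position": 1}}
    },
    # Standard output is captured to a file and exposed as an output port.
    "stdout": "line_count.txt",
    "outputs": {
        "line_count": {"type": "stdout"}
    },
}

tool_json = json.dumps(tool_description, indent=2)
print(tool_json)
```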

See also: input and output ports, tool, workflow

W

Workflow

Workflows are chains of interconnected tools. They are fully modifiable using the workflow editor and can be run as is or added to a project.

Tip: Many workflows are maintained by the CGC bioinformatics team on the CGC as public apps.

See also: project, public apps, tool, workflow editor

Workflow Editor

The Workflow Editor is a graphical workspace in which you can build a workflow from scratch or modify an existing workflow. You can change the ways tools are connected within a workflow and modify the way they run by setting their parameters.

See also: tool, workflow