Access data from the Datasets API

The Datasets API is linked to the CGC API: file IDs returned by Datasets API queries can be passed to the CGC API to operate on the corresponding files.

To access the data returned by a Datasets API query, follow the steps below.

Step 1: Use the Datasets API to get the ID of each file you wish to access

Issue a query to the Datasets API, such as the one shown below. Learn more about querying datasets via the Datasets API.

POST /datasets/tcga/v0/query HTTP/1.1
Host: cgc-datasets-api.sbgenomics.com
X-SBG-Auth-Token: 3210a98c1db9304ea9d9273156740f74
{
    "entity":"files",
    "hasCase":{
        "hasDiseaseType":"Breast Invasive Carcinoma",
        "hasSample":{
            "hasSampleType":"Primary Tumor"
        }
    },
    "hasDataFormat":"BAM"
}
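
As a minimal sketch, the same request could be built and sent from Python using only the standard library. The function names build_query_request and run_query are hypothetical, and the Content-Type header is an assumption that does not appear in the request above. The https scheme is taken from the href values that the Datasets API returns.

```python
import json
import urllib.request

# Endpoint from the request above; the https scheme matches the "href"
# values returned by the Datasets API.
QUERY_URL = "https://cgc-datasets-api.sbgenomics.com/datasets/tcga/v0/query"

# Same query body as in the request above.
QUERY = {
    "entity": "files",
    "hasCase": {
        "hasDiseaseType": "Breast Invasive Carcinoma",
        "hasSample": {"hasSampleType": "Primary Tumor"},
    },
    "hasDataFormat": "BAM",
}


def build_query_request(token, query=QUERY, url=QUERY_URL):
    """Build (but do not send) the POST request for a Datasets API query."""
    return urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={
            "X-SBG-Auth-Token": token,
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(token):
    """Send the query and return the decoded JSON response body."""
    with urllib.request.urlopen(build_query_request(token)) as response:
        return json.load(response)
```

Calling run_query with a valid authentication token sends the request over the network; build_query_request alone can be used to inspect the request without sending it.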

This returns the following response body:

{
  "count": 100,
  "_embedded": {
    "files": [
      {
        "hasIndex": "567890abc9b0307bc0414164",
        "_links": {
          "self": {
            "href": "https://cgc-datasets-api.sbgenomics.com/datasets/tcga/v0/files/567890abc9b0307bc0414164"
          }
        },
        "label": "4c40538e4124d6847dda7318200960d7.bam",
        "id": "567890abc9b0307bc0414164"
      },
      {
        "hasIndex": "567890abc1e5339df0414123",
        "_links": {
          "self": {
            "href": "https://cgc-datasets-api.sbgenomics.com/datasets/tcga/v0/files/567890abc1e5339df0414123"
          }
        },
        "label": "TCGA-GM-A5PV-01A-11R-A28I-13_mirna.bam",
        "id": "567890abc1e5339df0414123"
      },
       
      <snip>
       
    ]
  }
}

Note the values of id and hasIndex for each file you wish to copy. These values are used in Step 2 to copy the files.

The hasIndex key contains the IDs of associated index files (*.bai), which are required alongside BAM files for analysis on the CGC. If hasIndex is null, the file has no associated index files.
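
Gathering these values by hand becomes tedious for large result sets, so the following Python sketch collects them from a decoded response body. The function name collect_file_ids is hypothetical, and the code assumes hasIndex is null, a single ID string (as in the response above), or a list of IDs. The IDs in the example are placeholders, not real file IDs.

```python
def collect_file_ids(response_body):
    """Return each file's id plus any index IDs, in order, without duplicates."""
    ids = []
    for entry in response_body.get("_embedded", {}).get("files", []):
        candidates = [entry.get("id")]
        index = entry.get("hasIndex")
        # Assumption: hasIndex is null, a single ID string, or a list of IDs.
        if isinstance(index, str):
            candidates.append(index)
        elif isinstance(index, list):
            candidates.extend(index)
        for file_id in candidates:
            # Skip missing values and IDs that were already collected.
            if file_id and file_id not in ids:
                ids.append(file_id)
    return ids


# Hypothetical placeholder IDs, for illustration only.
example = {"_embedded": {"files": [
    {"id": "bam-1", "hasIndex": "bai-1"},
    {"id": "bam-2", "hasIndex": None},
]}}
print(collect_file_ids(example))  # ['bam-1', 'bai-1', 'bam-2']
```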

Step 2: Make a CGC API request to copy a file

Use the values of id and hasIndex obtained from the Datasets API query above to make the CGC API request to copy a file, as shown below. If hasIndex is not null, also include the IDs of the index files it specifies; your analysis needs those index files to run properly.

POST /v2/action/files/copy HTTP/1.1
Host: cgc-api.sbgenomics.com
X-SBG-Auth-Token: 3210a98c1db9304ea9d9273156740f74
{
  "project": "RFranklin/my-project",
  "file_ids": ["567890abc9b0307bc0414164", "567890abc1e5339df0414123", "567890abc3d8130ea4047731", "567890abc8a5136ec6127063"]
}
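
The copy request could be built from Python in much the same way. This is a sketch under stated assumptions: the function names build_copy_request and copy_files are hypothetical, the https scheme for the CGC API host is assumed, and the Content-Type header does not appear in the request above.

```python
import json
import urllib.request

# Assumption: the CGC API is served over https at the host shown above.
COPY_URL = "https://cgc-api.sbgenomics.com/v2/action/files/copy"


def build_copy_request(token, project, file_ids, url=COPY_URL):
    """Build (but do not send) the POST request that copies files to a project."""
    body = {"project": project, "file_ids": list(file_ids)}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-SBG-Auth-Token": token,
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def copy_files(token, project, file_ids):
    """Send the copy request and return the decoded JSON response body."""
    request = build_copy_request(token, project, file_ids)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Here project is the project identifier in the form shown above (for example "RFranklin/my-project"), and file_ids is the list of BAM and index file IDs gathered in Step 1.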

This returns the following response:

{
  "567890abc9b0307bc0414164": {
    "status": "OK",
    "new_file_id": "567890abc9b0307bc0414164",
    "new_file_name": "4c40538e4124d6847dda7318200960d7.bam"
  },
  "567890abc1e5339df0414123": {
    "status": "OK",
    "new_file_id": "567890abc1e5339df0414123",
    "new_file_name": "TCGA-GM-A5PV-01A-11R-A28I-13_mirna.bam"
  },
  "567890abc3d8130ea4047731": {
    "status": "OK",
    "new_file_id": "567890abc3d8130ea4047731",
    "new_file_name": "4c40538e4124d6847dda7318200960d7.bam.bai"
  },
  "567890abc8a5136ec6127063": {
    "status": "OK",
    "new_file_id": "567890abc8a5136ec6127063",
    "new_file_name": "TCGA-GM-A5PV-01A-11R-A28I-13_mirna.bam.bai"
  }
}
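
To confirm that every file was copied, the response can be checked entry by entry. The sketch below uses the hypothetical function name summarize_copy and assumes that any status other than "OK" indicates a failed copy; only the "OK" value appears in the response above.

```python
def summarize_copy(response_body):
    """Split a copy response into successful copies and failed source IDs."""
    copied = {}
    failed = []
    for source_id, result in response_body.items():
        # Assumption: any status other than "OK" means the copy failed.
        if result.get("status") == "OK":
            copied[source_id] = (
                result.get("new_file_id"),
                result.get("new_file_name"),
            )
        else:
            failed.append(source_id)
    return copied, failed
```

The first return value maps each source file ID to the ID and name of its copy; the second lists the source IDs that may need to be retried.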