Query via the Datasets API

❗️

Advance Access

This feature is in our advance access program. This means that, while it is fully operational, it is subject to change.

Overview

Issue queries with the Datasets API to filter or count a dataset's entities:

  • Filter entities to return only the resources that match your query criteria.
  • Count entities that match your query criteria.

Learn more about each dataset's entities from its [metadata page](about-metadata-for-datasets).

Procedure

Queries are written in JSON and are given in terms of an entity's metadata. There are two steps to filtering or counting dataset entities using queries:

  1. Issue a GET request to get an entity's schema, as shown in the section below. This request lists which metadata fields are available for an entity, by consulting that entity's metadata schema. The metadata schema is a list of the entity's metadata fields and the permissible datatypes of their values.
  2. Construct a query using key-value pairs consisting of metadata fields and values of the appropriate datatype. Send the query in the body of a POST request, either to filter a dataset's entities or to count them.

📘

Since every response is limited to a maximum of 100 displayed results, add the "offset" property to the request body to display more results. Results are returned starting from the position after the integer value you supply. For example, "offset": 100 returns the 101st to 200th results.
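The paging rule above can be sketched in Python as follows. This is a minimal illustration rather than part of the API: the helper name `paged_bodies` and the constant `PAGE_SIZE` are made up for this example, and it assumes you already know how many entities match (for instance, from a count request).

```python
import copy

PAGE_SIZE = 100  # maximum number of results per response, as stated above


def paged_bodies(query, total):
    """Yield one request body per page of results.

    `query` is the JSON query as a dict and `total` is the number of
    matching entities. Both names are illustrative, not part of the API.
    """
    for offset in range(0, total, PAGE_SIZE):
        body = copy.deepcopy(query)  # leave the caller's query untouched
        if offset:
            body["offset"] = offset  # "offset": 100 starts at the 101st result
        yield body


query = {"entity": "cases", "hasPrimarySite": "Liver"}
bodies = list(paged_bodies(query, total=377))
print([b.get("offset", 0) for b in bodies])  # prints [0, 100, 200, 300]
```

Each yielded body would then be sent as its own POST request; the first page needs no "offset" at all.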

This page shows the process for both of the above steps. Learn more about different query types from our examples.

Step 1: Get an entity's metadata schema

This step draws upon the Datasets API's browsing functionality to obtain the metadata schema for an entity within a dataset. The example request below is for TCGA but can be applied to all datasets by substituting the TCGA path below with the appropriate dataset path. To obtain a dataset's path, issue a GET request to /datasets.

For example, to see the metadata schema for TCGA cases, issue:

GET /datasets/tcga/v0/cases/schema HTTP/1.1
Host: cgc-datasets-api.sbgenomics.com
X-SBG-Auth-Token: 3210a98c1db9304ea9d9273156740f74

The response body contains the metadata schema for the case entity.

{
  "hasPriorDiagnosis": {
    "values": [
      "Both History of Synchronous/ Bilateral and Prior Malignancy",
      "Yes, History of Synchronous and or Bilateral Malignancy",
      "No",
      "Yes, History of Synchronous/Bilateral Malignancy",
      "Not available",
      "Yes",
      "Yes, History of Prior Malignancy"
    ],
    "type": "enum"
  },
  "hasVitalStatus": {
    "values": [
      "Lost to follow-up",
      "Dead",
      "Alive",
      "LIVING",
      "Not available",
      "DECEASED"
    ],
    "type": "enum"
  },
  "id": {
    "type": "string"
  },
  "hasNewTumorEventAfterInitialTreatment": {
    "type": "string"
  },
  "hasDaysToDeath": {
    "type": "integer"
  },
  "hasClinicalT": {
    "values": [
      "T2",
      "T1c",
      "T4d",
      "T3a",
      "Ta",
      "T3",
      "T3d",
      "T1a2",
      "T2a1",
      "T1a",
      "T2a",
      "T2b",
      "T3b",
      "T0",
      "T1a1",
      "T1b1",
      "Not available",
      "T1",
      "T4",
      "T2a2",
      "T4c",
      "T2d",
      "T3c",
      "TX",
      "T1b2",
      "T2c",
      "Tis (DCIS)",
      "T4e",
      "Tis (Paget's)",
      "Tis",
      "T1mi",
      "Tis (LCIS)",
      "T1b",
      "T4b",
      "T4a"
    ],
    "type": "enum"
  }
  <snip>
}
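Before writing a query, it can help to check candidate values against the schema. The sketch below does this for a trimmed copy of the response above. The helper names `allowed_values` and `is_valid` are hypothetical, and only the three types that appear in this excerpt (enum, string, integer) are handled; the full schema may contain others.

```python
import json

# A trimmed copy of the schema response shown above (most fields omitted).
schema = json.loads("""
{
  "hasVitalStatus": {
    "values": ["Lost to follow-up", "Dead", "Alive", "LIVING",
               "Not available", "DECEASED"],
    "type": "enum"
  },
  "id": {"type": "string"},
  "hasDaysToDeath": {"type": "integer"}
}
""")


def allowed_values(schema, field):
    """Return the permitted values for an enum field, or None for other types."""
    entry = schema[field]  # raises KeyError if the field is not in the schema
    return entry["values"] if entry["type"] == "enum" else None


def is_valid(schema, field, value):
    """Check a candidate query value against the schema (illustrative only)."""
    entry = schema[field]
    if entry["type"] == "enum":
        return value in entry["values"]
    if entry["type"] == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)
    if entry["type"] == "string":
        return isinstance(value, str)
    raise ValueError(f"type not handled in this sketch: {entry['type']}")


print(allowed_values(schema, "hasVitalStatus"))
print(is_valid(schema, "hasVitalStatus", "Alive"))
```

Note that enum values are compared exactly, so "Alive" and "LIVING" are distinct values as far as this check is concerned.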

Step 2a: Filter entities

Use an entity's metadata fields, obtained above, to write a query specifying values for certain metadata fields that you want your entities to match. The query is sent as JSON in the body of a POST request, as shown below. Note that the request below is for TCGA but that the same request can be issued for any dataset by substituting the TCGA path below with the appropriate dataset path.

POST /datasets/tcga/v0/query HTTP/1.1
Host: cgc-datasets-api.sbgenomics.com
X-SBG-Auth-Token: 3210a98c1db9304ea9d9273156740f74

In the body of your request, send the query as key-value pairs as follows:

| Name | Datatype of value | Description of value |
| --- | --- | --- |
| entity | string | The metadata entity you wish to query. For instance, you can specify an entity of cases. |
| fields (optional) | list of strings | An optional list of metadata fields, related to the entity, to display in the response. For instance, for an entity of cases you can supply a fields list containing one element, hasVitalStatus. Be sure to format your list using square brackets, [ ]. In this case, the response to your query will display the metadata field hasVitalStatus and return its value or Not Available. Note that metadata fields you enter in fields do not act as a filter. The only way to filter in a query is by entering metadata fields followed by a specific value, as described below. |
| <metadata_field> | string | The key is a metadata field and the value is the specific value to match. For instance, you can specify hasPrimarySite as a metadata field with a value such as Liver. Recall that the request to obtain an entity's metadata schema provides a list of possible values for metadata fields with a type of enum. |

For example, send the following request body:

{
   "entity":"cases",
   "fields":["hasVitalStatus", "hasPriorDiagnosis"],
   "hasPrimarySite": "Liver"
}

The request above returns details of cases whose primary cancer site is the liver. Because 377 cases match this query, part of the response body is omitted here.

{
  "count": 100,
  "_embedded": {
    "cases": [
    <snip>
      {
        "hasPriorDiagnosis": "No",
        "hasVitalStatus": "Alive",
        "id": "11823A8A-12A3-45E6-789F-EF5A3FC7D2B9",
        "label": "11823A8A-12A3-45E6-789F-EF5A3FC7D2B9",
        "_links": {
          "self": {
            "href": "cgc-datasets-api.sbgenomics.com/datasets/tcga/v0/cases/11823A8A-12A3-45E6-789F-EF5A3FC7D2B9"
          }
        }
      },
      {
        "hasPriorDiagnosis": "Yes",
        "hasVitalStatus": "Dead",
        "id": "17833039-45E6-47F0-789F-8CABAE94C124",
        "label": "17833039-45E6-47F0-789F-8CABAE94C124",
        "_links": {
          "self": {
            "href": "cgc-datasets-api.sbgenomics.com/datasets/tcga/v0/cases/17833039-45E6-47F0-789F-8CABAE94C124"
          }
        }
      },
      {
        "hasPriorDiagnosis": "No",
        "hasVitalStatus": "Dead",
        "id": "0965D871-12A3-789F-456E-B4FEB3AEEF5D",
        "label": "0965D871-12A3-789F-456E-B4FEB3AEEF5D",
        "_links": {
          "self": {
            "href": "cgc-datasets-api.sbgenomics.com/datasets/tcga/v0/cases/0965D871-12A3-789F-456E-B4FEB3AEEF5D"
          }
        }
      }
    <snip>
    ]
  }
}

For each TCGA case satisfying the query, the response shows:

  • Its id, such as 0965D871-12A3-789F-456E-B4FEB3AEEF5D.
  • Its label, such as 0965D871-12A3-789F-456E-B4FEB3AEEF5D.
  • A path, specified under _links, to which you can issue a GET request for more information about the case.
  • The fields which were requested along with their values, such as "hasVitalStatus": "Alive". If no value is available the following is returned:
    • Not Available is returned for metadata fields with the type of value of enum.
    • null is returned for metadata fields with a type of value other than enum.
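The response structure described above can be unpacked with a few lines of Python. This is a sketch under assumptions: `summarize` and `has_value` are hypothetical helper names, the sample is a one-case excerpt of the response shown earlier, and the missing-value check compares case-insensitively because the schema excerpt spells the placeholder "Not available".

```python
import json

# One case from the response above, kept short for illustration.
response_text = """
{
  "count": 100,
  "_embedded": {
    "cases": [
      {
        "hasPriorDiagnosis": "No",
        "hasVitalStatus": "Alive",
        "id": "11823A8A-12A3-45E6-789F-EF5A3FC7D2B9",
        "label": "11823A8A-12A3-45E6-789F-EF5A3FC7D2B9",
        "_links": {"self": {"href": "cgc-datasets-api.sbgenomics.com/datasets/tcga/v0/cases/11823A8A-12A3-45E6-789F-EF5A3FC7D2B9"}}
      }
    ]
  }
}
"""


def has_value(value):
    """True unless the value is null or the 'Not Available' placeholder."""
    return value is not None and str(value).lower() != "not available"


def summarize(response, entity, fields):
    """Collect the id, self link, and requested fields for each entity."""
    rows = []
    for item in response["_embedded"][entity]:
        row = {"id": item["id"], "href": item["_links"]["self"]["href"]}
        for field in fields:
            value = item.get(field)
            row[field] = value if has_value(value) else None
        rows.append(row)
    return rows


rows = summarize(json.loads(response_text), "cases",
                 ["hasVitalStatus", "hasPriorDiagnosis"])
print(rows[0]["id"], rows[0]["hasVitalStatus"])
```

Mapping both placeholders to None is a design choice for this sketch; keep the raw values if you need to tell enum fields apart from other types.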

Similar responses are returned for other entities in TCGA and in other datasets.

Step 2b: Count entities

Use an entity's metadata fields, obtained above, to write a query counting the number of entities that satisfy your query criteria. This is done in exactly the same way as filtering entities, except that /total is appended to the path.

To count TCGA resources meeting your query criteria, send the request shown below. The request is for TCGA, but the same request can be issued for any dataset by substituting the TCGA path with the appropriate dataset path.

POST /datasets/tcga/v0/query/total HTTP/1.1
Host: cgc-datasets-api.sbgenomics.com
X-SBG-Auth-Token: 3210a98c1db9304ea9d9273156740f74

You should include your query as the body of the request. The query is constructed in the same way as for the API request to filter entities.
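Since the filter and count requests differ only in their path, both can be built by one helper. The sketch below uses Python's standard `urllib.request` and only constructs the request objects without sending them. `build_request` is a hypothetical helper name, and the `https` scheme and the `Content-Type` header are assumptions not shown in the requests above.

```python
import json
import urllib.request

# Host taken from the requests above; the https scheme is an assumption.
BASE_URL = "https://cgc-datasets-api.sbgenomics.com"


def build_request(query, token, count_only=False):
    """Build a POST request that filters entities, or counts them if count_only."""
    path = "/datasets/tcga/v0/query" + ("/total" if count_only else "")
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(query).encode("utf-8"),
        headers={
            "X-SBG-Auth-Token": token,
            "Content-Type": "application/json",  # assumed, not shown above
        },
        method="POST",
    )


query = {"entity": "cases", "hasPrimarySite": "Liver"}
filter_req = build_request(query, token="<your-auth-token>")
count_req = build_request(query, token="<your-auth-token>", count_only=True)
print(filter_req.get_method(), filter_req.full_url)
print(count_req.get_method(), count_req.full_url)
```

To actually send either request, pass it to `urllib.request.urlopen` with a valid authentication token in place of the placeholder.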

Next step

The following pages in this section contain examples of queries that you can use to filter or count entities. The example requests use TCGA data but can be issued for any dataset by substituting the TCGA path with the appropriate dataset path.
