Query via the Datasets API
Advance Access
This feature is in our advance access program. This means that, while it is fully operational, it is subject to change.
Overview
Issue queries with the Datasets API to filter or count a dataset's entities:
- Filter entities to return only the resources that match your query criteria.
- Count entities that match your query criteria.
Learn more about each dataset's entities on its [metadata page](about-metadata-for-datasets).
Procedure
Queries are written in JSON and are expressed in terms of an entity's metadata. Filtering or counting dataset entities with a query takes two steps:
- Issue a GET request to get the entity's metadata schema, as shown in the section below. The metadata schema lists the entity's metadata fields and the permissible datatypes of their values, so it tells you which fields are available to query.
- Construct a query from key-value pairs, each consisting of a metadata field and a value of the appropriate datatype. Send the query in the body of a POST request to /query to filter a dataset's entities, or in the body of a POST request to /query/total to count them.
Each response displays at most 100 results. To see more, add the "offset" property to the request body. Results are then returned starting from position offset + 1; for example, "offset": 100 returns the 101st to 200th results.
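If you are scripting against the API, paging through a large result set with "offset" can be sketched in Python as below. This is a minimal sketch, not an official client: fetch_page is a hypothetical callable that you would implement to send the body as a POST request and return the list of entities from the response. The loop assumes that a page with fewer than 100 results is the last one.

```python
from typing import Callable, Dict, Iterator, List

PAGE_SIZE = 100  # maximum number of results displayed per response


def iter_all_results(
    fetch_page: Callable[[Dict], List[Dict]],
    query: Dict,
) -> Iterator[Dict]:
    """Yield every result by re-sending `query` with an increasing "offset".

    `fetch_page` is hypothetical: it should POST the given body to the
    query endpoint and return the list of entities from that response.
    """
    offset = 0
    while True:
        body = dict(query, offset=offset)  # copy the query; leave the caller's dict untouched
        page = fetch_page(body)
        yield from page
        if len(page) < PAGE_SIZE:  # a short (or empty) page means no more results
            break
        offset += PAGE_SIZE  # "offset": 100 starts at the 101st result
```

Copying the query with dict(query, offset=...) keeps your original query body reusable, for example for a later count request.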
This page shows the process for both of the above steps. Learn more about different query types from our examples.
Step 1: Get an entity's metadata schema
This step uses the Datasets API's browsing functionality to obtain the metadata schema for an entity within a dataset. The example request below is for TCGA, but it can be applied to any dataset by substituting the appropriate dataset path for the TCGA path. To obtain a dataset's path, issue a GET request to /datasets.
For example, to see the metadata schema for TCGA cases, issue:
GET /datasets/tcga/v0/cases/schema HTTP/1.1
Host: cgc-datasets-api.sbgenomics.com
X-SBG-Auth-Token: 3210a98c1db9304ea9d9273156740f74
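The same request can be built from a script with Python's standard library, as in the minimal sketch below. It assumes the API is reached over HTTPS at the host shown above, and schema_request is a hypothetical helper name. The request is constructed but not sent, so you can inspect it before calling urlopen.

```python
import urllib.request

# Assumption: the API is served over HTTPS at the host from the example request.
BASE_URL = "https://cgc-datasets-api.sbgenomics.com"


def schema_request(dataset_path: str, entity: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for an entity's metadata schema.

    dataset_path: for example "tcga/v0"; entity: for example "cases".
    """
    url = f"{BASE_URL}/datasets/{dataset_path}/{entity}/schema"
    return urllib.request.Request(url, headers={"X-SBG-Auth-Token": token}, method="GET")


# Sending it requires network access and a valid token:
# import json
# with urllib.request.urlopen(schema_request("tcga/v0", "cases", "<your token>")) as resp:
#     schema = json.load(resp)
```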
The response body contains the metadata schema for the case entity.
{
"hasPriorDiagnosis": {
"values": [
"Both History of Synchronous/ Bilateral and Prior Malignancy",
"Yes, History of Synchronous and or Bilateral Malignancy",
"No",
"Yes, History of Synchronous/Bilateral Malignancy",
"Not available",
"Yes",
"Yes, History of Prior Malignancy"
],
"type": "enum"
},
"hasVitalStatus": {
"values": [
"Lost to follow-up",
"Dead",
"Alive",
"LIVING",
"Not available",
"DECEASED"
],
"type": "enum"
},
"id": {
"type": "string"
},
"hasNewTumorEventAfterInitialTreatment": {
"type": "string"
},
"hasDaysToDeath": {
"type": "integer"
},
"hasClinicalT": {
"values": [
"T2",
"T1c",
"T4d",
"T3a",
"Ta",
"T3",
"T3d",
"T1a2",
"T2a1",
"T1a",
"T2a",
"T2b",
"T3b",
"T0",
"T1a1",
"T1b1",
"Not available",
"T1",
"T4",
"T2a2",
"T4c",
"T2d",
"T3c",
"TX",
"T1b2",
"T2c",
"Tis (DCIS)",
"T4e",
"Tis (Paget's)",
"Tis",
"T1mi",
"Tis (LCIS)",
"T1b",
"T4b",
"T4a"
],
"type": "enum"
}
<snip>
}
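Because enum fields accept only the listed values, it can help to check a filter against the schema before sending a query. The Python sketch below is a hypothetical helper (is_valid_filter is not part of the API) that assumes the schema has been parsed from the JSON response above into a dictionary. It only understands the enum, string and integer types that appear in this example.

```python
from typing import Any, Dict


def is_valid_filter(schema: Dict[str, Dict[str, Any]], field: str, value: Any) -> bool:
    """Return True if `value` looks acceptable for `field` according to `schema`."""
    spec = schema.get(field)
    if spec is None:
        return False  # the field is not in the entity's schema
    kind = spec.get("type")
    if kind == "enum":
        return value in spec.get("values", [])  # exact, case-sensitive match
    if kind == "string":
        return isinstance(value, str)
    if kind == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    return True  # other types are not checked by this sketch
```

The comparison is exact, so "alive" is rejected even though "Alive" is permitted. Note also that the example schema lists both "Dead" and "DECEASED"; a value-by-value check like this one does not treat them as equivalent.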
Step 2a: Filter entities
Use an entity's metadata fields, obtained above, to write a query specifying the values that certain metadata fields must match. The query is sent as JSON in the body of a POST request, as shown below. The request below is for TCGA, but the same request can be issued for any dataset by substituting the appropriate dataset path for the TCGA path.
POST /datasets/tcga/v0/query HTTP/1.1
Host: cgc-datasets-api.sbgenomics.com
X-SBG-Auth-Token: 3210a98c1db9304ea9d9273156740f74
In the body of your request, send the query as key-value pairs as follows:
Name | Datatype of value | Description of value |
---|---|---|
entity | string | The metadata entity you wish to query. For instance, you can specify an entity of cases. |
fields (optional) | list of strings | The metadata fields of the entity that you want displayed in the response. Format the list with square brackets, [ ]. For instance, for an entity of cases you can supply a list containing one element, hasVitalStatus. The response to your query then displays hasVitalStatus for each case along with its value or, if no value is available, Not Available (for enum fields) or null (for other fields). Note that metadata fields entered in fields do not act as a filter. The only way to filter in a query is to enter a metadata field followed by a specific value, as described below. |
<metadata_field> | depends on the field's type in the schema (for example enum, string or integer) | The key is a metadata field and the value is the specific value to match. For instance, you can specify hasPrimarySite as a metadata field with a value such as Liver. Recall that the request to obtain an entity's metadata schema lists the permissible values for metadata fields of type enum. |
{
"entity":"cases",
"fields":["hasVitalStatus", "hasPriorDiagnosis"],
"hasPrimarySite": "Liver"
}
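The same query can be built in Python using only the standard library. This is a sketch under stated assumptions: the HTTPS base URL and the Content-Type header are assumptions not stated on this page, and query_request is a hypothetical helper name. The request is constructed but not sent.

```python
import json
import urllib.request

# Assumption: the API is served over HTTPS at the host from the example request.
BASE_URL = "https://cgc-datasets-api.sbgenomics.com"


def query_request(dataset_path: str, body: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that filters a dataset's entities."""
    return urllib.request.Request(
        f"{BASE_URL}/datasets/{dataset_path}/query",
        data=json.dumps(body).encode("utf-8"),  # the query travels as a JSON body
        headers={"X-SBG-Auth-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


body = {
    "entity": "cases",
    "fields": ["hasVitalStatus", "hasPriorDiagnosis"],  # displayed only, not a filter
    "hasPrimarySite": "Liver",  # the filter
}
req = query_request("tcga/v0", body, "<your token>")
# Sending it requires network access and a valid token:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```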
The request above returns cases whose primary cancer site is the liver. 377 cases match this query; because each response displays at most 100 results, this response contains 100 of them ("count": 100), and part of the response body has been omitted below.
{
"count": 100,
"_embedded": {
"cases": [
<snip>
{
"hasPriorDiagnosis": "No",
"hasVitalStatus": "Alive",
"id": "11823A8A-12A3-45E6-789F-EF5A3FC7D2B9",
"label": "11823A8A-12A3-45E6-789F-EF5A3FC7D2B9",
"_links": {
"self": {
"href": "cgc-datasets-api.sbgenomics.com/datasets/tcga/v0/cases/11823A8A-12A3-45E6-789F-EF5A3FC7D2B9"
}
}
},
{
"hasPriorDiagnosis": "Yes",
"hasVitalStatus": "Dead",
"id": "17833039-45E6-47F0-789F-8CABAE94C124",
"label": "17833039-45E6-47F0-789F-8CABAE94C124",
"_links": {
"self": {
"href": "cgc-datasets-api.sbgenomics.com/datasets/tcga/v0/cases/17833039-45E6-47F0-789F-8CABAE94C124"
}
}
},
{
"hasPriorDiagnosis": "No",
"hasVitalStatus": "Dead",
"id": "0965D871-12A3-789F-456E-B4FEB3AEEF5D",
"label": "0965D871-12A3-789F-456E-B4FEB3AEEF5D",
"_links": {
"self": {
"href": "cgc-datasets-api.sbgenomics.com/datasets/tcga/v0/cases/0965D871-12A3-789F-456E-B4FEB3AEEF5D"
}
}
}
<snip>
]
}
}
For each TCGA case satisfying the query, the response shows:
- Its id, such as 0965D871-12A3-789F-456E-B4FEB3AEEF5D.
- Its label, such as 0965D871-12A3-789F-456E-B4FEB3AEEF5D.
- A path, specified under _links, to which you can issue a GET request for more information about the case.
- The fields that were requested, along with their values, such as "hasVitalStatus": "Alive". If no value is available, Not Available is returned for metadata fields whose value type is enum, and null is returned for metadata fields with any other value type.
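To work with the response programmatically, you can walk the list under _embedded. The Python sketch below is a hypothetical helper that assumes the key under _embedded matches the entity name (cases in this example); it returns each entity's id, its self link and the requested fields. Missing fields come back as None, which is also how a JSON null is parsed.

```python
from typing import Any, Dict, List


def summarize_entities(response: Dict[str, Any], entity: str, fields: List[str]) -> List[Dict[str, Any]]:
    """Collect id, self link and requested fields for each entity in a query response.

    Assumption: the entities are listed under response["_embedded"][entity].
    """
    rows = []
    for item in response.get("_embedded", {}).get(entity, []):
        row = {"id": item["id"], "href": item["_links"]["self"]["href"]}
        for field in fields:
            row[field] = item.get(field)  # None if absent or JSON null
        rows.append(row)
    return rows
```

The href values in the example response have no scheme, so if you follow them from a script you may need to prepend https:// (an assumption, not stated on this page).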
Similar responses are returned for other entities in TCGA and in other datasets.
Step 2b: Count entities
Use an entity's metadata fields, obtained above, to write a query that counts the number of entities satisfying your query criteria. This works exactly like filtering entities, except that /total is appended to the path.
To count TCGA resources meeting your query criteria, send the following request. The same request can be issued for any dataset by substituting the appropriate dataset path for the TCGA path.
POST /datasets/tcga/v0/query/total HTTP/1.1
Host: cgc-datasets-api.sbgenomics.com
X-SBG-Auth-Token: 3210a98c1db9304ea9d9273156740f74
You should include your query as the body of the request. The query is constructed in the same way as for the API request to filter entities.
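Since only the path changes, a count request can be sketched in Python much like the filter request. As before, the HTTPS base URL and the Content-Type header are assumptions, and count_request is a hypothetical helper name. The shape of the count response is not shown on this page, so the sketch stops at building the request.

```python
import json
import urllib.request

# Assumption: the API is served over HTTPS at the host from the example request.
BASE_URL = "https://cgc-datasets-api.sbgenomics.com"


def count_request(dataset_path: str, body: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /query/total using the same body as a filter query."""
    return urllib.request.Request(
        f"{BASE_URL}/datasets/{dataset_path}/query/total",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-SBG-Auth-Token": token, "Content-Type": "application/json"},
        method="POST",
    )
```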
Next step
The following pages in this section contain examples of queries that you can use to filter or count entities. The following example requests use TCGA data but can be issued for any dataset by substituting the TCGA path with the appropriate dataset path.
- Example query 1: Find samples connected to a case
- Example query 2: Count samples connected to a case
- Example query 3: Find cases with given age at diagnosis
- Example query 4: Find all cases with a given age at diagnosis and a particular disease
- Example query 5: Complex example for filtering TCGA data
- Example query 6: Find TCGA cases with or without a prior diagnosis and related samples from a particular tissue source site and return the sample type code for each of these samples
Updated over 2 years ago