
↳ SPARQL basics


[block:callout]
{
  "type": "warning",
  "title": "On this page:",
  "body": "* [Overview](#section-overview)\n* [SPARQL semantics](#section-sparql-semantics)\n* [Uniform Resource Identifiers](#section-uniform-resource-identifiers)\n* [Abbreviating URIs](#section-abbreviating-uris)\n* [SPARQL syntax](#section-sparql-syntax)\n * [The SELECT clause](#section-the-select-clause)\n * [The WHERE clause](#section-the-where-clause)\n* [Anatomy of a SPARQL query](#section-anatomy-of-a-sparql-query)"
}
[/block]
##Overview
This page provides the basic information you need to start using SPARQL queries to investigate CGC datasets. If you'd like to learn more, see the <a href="https://www.w3.org/TR/sparql11-overview/" target="_blank">W3C's Overview</a>.

SPARQL is a query language for RDF (Resource Description Framework), a data model that describes resources and the relationships between them. Before approaching SPARQL, you should be familiar with the notion of an **RDF triple**.

An **RDF triple** is a statement with a subject, a predicate (relation), and an object. For example, consider the following RDF triple: `tcga:NewTumorEvent tcga:hasNewTumorEventType tcga:NewNeoplasmEventType`.

In this RDF triple, the subject is `tcga:NewTumorEvent`, the predicate is `tcga:hasNewTumorEventType` (having a new tumor event type), and the object is the tumor event type `tcga:NewNeoplasmEventType`. For any RDF triple, the subject is always listed in the first place, the predicate in the second place, and the object in the third place.

RDF can be used to describe resources in graph databases. The example above comes from a graph database used by Seven Bridges to store TCGA metadata. Part of the graph is depicted below. The RDF triple

<div align="center">`tcga:NewTumorEvent tcga:hasNewTumorEventType tcga:NewNeoplasmEventType`</div>

is represented as a directed edge from the node `tcga:NewTumorEvent` to the node `tcga:NewNeoplasmEventType` within the graph. We'll return to the graph representation later.
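
As a rough illustration of the edge view, here is a sketch in SPARQL, the query language introduced in the following sections. It asks for every edge that leaves the `tcga:NewTumorEvent` node by putting variables in the predicate and object positions. The variable names `?predicate` and `?object` are arbitrary, and whether the dataset stores this schema-level edge as a literal triple is an assumption, so the rows returned may differ from what the picture suggests.

```sparql
PREFIX tcga: <https://www.sbgenomics.com/ontologies/2014/11/tcga#>

# List the outgoing edges of one node: each result row is one
# (predicate, object) pair attached to tcga:NewTumorEvent.
SELECT ?predicate ?object
WHERE
{
  tcga:NewTumorEvent ?predicate ?object
}
```
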
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/86764d5-Screen_Shot_2016-04-26_at_14.04.48.png",
        "Screen Shot 2016-04-26 at 14.04.48.png",
        3268,
        1638,
        "#092f4d"
      ]
    }
  ]
}
[/block]
<div align="right"><a href="#top">top</a></div>

##SPARQL semantics

Here's a sample SPARQL query into TCGA metadata. We'll see how to interpret it [below](#section-anatomy-of-a-sparql-query). This example uses TCGA metadata, but it can be adapted to any other dataset that has an RDF representation by substituting the correct dataset prefix in the `PREFIX` clause and, if necessary, alternate entities and properties in the `WHERE` clause.
[block:code]
{
  "codes": [
    {
      "code": "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\nPREFIX tcga: <https://www.sbgenomics.com/ontologies/2014/11/tcga#>\nSELECT ?inv_label\nWHERE\n{\n ?case a tcga:Case .\n ?case tcga:hasInvestigation ?investigation .\n ?investigation rdfs:label ?inv_label\n}",
      "language": "text",
      "name": "Sample SPARQL query"
    }
  ]
}
[/block]
A SPARQL query into an RDF graph inquires whether there are certain triples in the graph. It does so using **triple patterns**. These are like RDF triples, except that the subject, predicate, or object (or any combination of them) may be replaced by a variable. Variables in SPARQL are strings prefixed by a question mark.

For example, this is an **RDF triple**:
<div align="center">`tcga:case/00A2D166-09C9-4283-A195-3F6345C27574 tcga:hasInvestigation tcga:investigation/tcga-brca`</div>
And this is an **RDF triple pattern**:
<div align="center">`?case tcga:hasInvestigation tcga:investigation/tcga-brca`</div>

Note that because triples always list subject, predicate, and object in a fixed order, the position of the variable in the **triple pattern** is meaningful. In the example above, the values of the `?case` variable are the resources that appear as the *subject* of a triple whose predicate is `tcga:hasInvestigation` and whose object is the `tcga:investigation/tcga-brca` resource.
In other words, they are the cases that have TCGA-BRCA as their investigation. So, you could use this **triple pattern** in a query to find all the cases in the TCGA-BRCA investigation.

<div align="right"><a href="#top">top</a></div>

##Uniform Resource Identifiers

We can think of the ordering of terms in a triple as part of the grammar of SPARQL: it gives us some insight into the meaning of an **RDF triple**, but it doesn't tell us what each term in the triple refers to. The reference is determined by the **namespace** that is used for the ontology.

Each term in an **RDF triple**, such as `tcga:NewTumorEvent`, is a **Uniform Resource Identifier (URI)**. This is a kind of name that picks out a single resource. The assignment of URIs to resources in an ontology (i.e., to data structured into classes, instances, and properties) is called a **namespace**. Note that URIs often look like URLs (Uniform Resource Locators), but they do not have to point to a web page: their job is to identify a resource, not to locate it.

Seven Bridges defined a **namespace** over specific datasets by assigning each term of each dataset's metadata schema its own URI. We did this according to the following convention:

* URIs assigned to classes and properties have the following format: https://www.sbgenomics.com/ontologies/{year}/{month}/{dataset}#{class_or_property_id}, where the date segment is specific to each dataset's ontology (2014/11 for TCGA and 2016/5 for CCLE in the examples below). For example:
  * https://www.sbgenomics.com/ontologies/2014/11/tcga#Case
  * https://www.sbgenomics.com/ontologies/2014/11/tcga#hasFollowUp
  * https://www.sbgenomics.com/ontologies/2014/11/tcga#hasDiseaseType
  * https://www.sbgenomics.com/ontologies/2016/5/ccle#CCLECellLine
  * https://www.sbgenomics.com/ontologies/2016/5/ccle#hasAliquot
  * https://www.sbgenomics.com/ontologies/2016/5/ccle#hasDiseaseType

* URIs assigned to individuals have the following format: https://www.sbgenomics.com/{dataset}/{entity_type}/{entity_id}.
For example:
  * https://www.sbgenomics.com/tcga/case/3B21B982-DBA2-45F4-AD8D-21DC86FCAAA7
  * https://www.sbgenomics.com/tcga/follow-up/828E9149-B556-4073-8F6B-55F1F065F37C
  * https://www.sbgenomics.com/tcga/disease_type/luad
  * https://www.sbgenomics.com/ccle/ccle_cell_line/CCLE-253J
  * https://www.sbgenomics.com/ccle/aliquot/2a664cc4-8a3d-4f3f-b6ba-a7bb7995b26d
  * https://www.sbgenomics.com/ccle/disease_type/09
[block:callout]
{
  "type": "info",
  "body": "Seven Bridges also uses URIs from other namespaces to describe metadata, specifically from the RDF Schema namespace defined by W3C. The RDF Schema (RDFS) is a namespace of terms that describe the structure of an ontology. It includes URIs to refer to the concept of a class, an instance, a property, and the ways that these interrelate. For example, all relations are instances of the class `rdf:Property`, which is defined in the core RDF namespace that RDFS builds on. Terms like these allow us, for example, to find all instances of a specific class, such as all files.\n \nFor the full vocabulary of the RDF Schema see [https://www.w3.org/TR/rdf-schema/#ch_summary](https://www.w3.org/TR/rdf-schema/#ch_summary)."
}
[/block]
<div align="right"><a href="#top">top</a></div>

##Abbreviating URIs

Since URIs are lengthy, we often abbreviate them in SPARQL queries. This is done by replacing their initial part with a shortened prefix, and then stating at the start of the query what the prefix is short for, using the `PREFIX` clause. For example: `PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>`.

This tells us that the prefix `rdfs:` is short for `http://www.w3.org/2000/01/rdf-schema#`. It allows us to write, simply, `rdfs:label` instead of the full URI `http://www.w3.org/2000/01/rdf-schema#label`. Note that the URI that follows the prefix name in a `PREFIX` clause is enclosed in angle brackets `< >`.
[block:callout]
{
  "type": "info",
  "title": "The keyword `a`",
  "body": "In our example queries, we use the special predicate `rdf:type`, which describes the relationship of instantiating a class. It is defined in the core RDF namespace, `http://www.w3.org/1999/02/22-rdf-syntax-ns#`, on which the RDF Schema builds.
The predicate `rdf:type` can be abbreviated by the keyword `a`."
}
[/block]
<div align="right"><a href="#top">top</a></div>

##SPARQL syntax

Now we'll see how **RDF triple patterns** function in SPARQL queries.

SPARQL queries have two main components: a `SELECT` clause and a `WHERE` clause. They may also use a `PREFIX` clause, as our examples do below. The `PREFIX` clause is discussed [above](#section-abbreviating-uris).

<div align="right"><a href="#top">top</a></div>

###The `SELECT` clause

The `SELECT` clause in a SPARQL query names the variable(s) whose values you want to retrieve. Note that the `SELECT` clause does not need to list all the variables that are used in the query. The query can use more variables than those listed in the `SELECT` clause, for instance intermediate variables whose values are used to obtain the values of further variables, but the values of these additional variables will not be included in the query results.

<div align="right"><a href="#top">top</a></div>

###The `WHERE` clause

The `WHERE` clause in a SPARQL query states the RDF triple pattern or patterns that indicate what kind of values we are looking for.

<div align="right"><a href="#top">top</a></div>

##Anatomy of a SPARQL query
[block:callout]
{
  "type": "success",
  "body": "The example query below is for TCGA but can be used to model queries for any of the available datasets by making the following adjustments:\n\n * Substitute the appropriate [prefix](#section-abbreviating-uris). For instance, use the following prefix for CCLE: `PREFIX ccle: <https://www.sbgenomics.com/ontologies/2016/5/ccle#>`.\n * Modify the `WHERE` clause of the query to reflect the metadata ontology for the queried dataset. For instance, `case` is a TCGA entity but is not a CCLE entity. Using `case` in a CCLE query results in an error. Learn more about the [metadata ontology for each dataset](about-metadata-for-datasets) available on the CGC."
}
[/block]
We're now ready to dissect a SPARQL query to see how the clauses are used. Consider the example query for TCGA we saw above:
[block:code]
{
  "codes": [
    {
      "code": "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\nPREFIX tcga: <https://www.sbgenomics.com/ontologies/2014/11/tcga#>\n \nSELECT ?inv_label\nWHERE\n{\n ?case a tcga:Case .\n ?case tcga:hasInvestigation ?investigation .\n ?investigation rdfs:label ?inv_label\n}",
      "language": "text",
      "name": "Example SPARQL query"
    }
  ]
}
[/block]
The first `PREFIX` clause states that, in this query, `rdfs:` is short for `http://www.w3.org/2000/01/rdf-schema#`, thus indicating that a resource named with this prefix belongs to the RDF Schema (RDFS) namespace. Similarly, the second `PREFIX` clause states that `tcga:` is short for `https://www.sbgenomics.com/ontologies/2014/11/tcga#`.

The `SELECT` clause tells us that we are looking for values for the variable `?inv_label` that match the triple patterns set out in the `WHERE` clause.

The `WHERE` clause lists three RDF **triple patterns**.

The first triple pattern is:

<div align="center">`?case a tcga:Case`</div>

This picks out all the resources that have the `rdf:type` of `tcga:Case`, that is, all instances of the class `tcga:Case`. Recall from the info box above that `a` is short for the relation `rdf:type`.

The second triple pattern is:

<div align="center">`?case tcga:hasInvestigation ?investigation`</div>

This binds `?investigation` to every resource that is the object of the `tcga:hasInvestigation` predicate for some value of the `?case` variable. In other words, these are all the resources that are the investigation of some case in the TCGA ontology.

The third triple pattern is:

<div align="center">`?investigation rdfs:label ?inv_label`</div>

This binds `?inv_label` to the value of the `rdfs:label` predicate for each value of the `?investigation` variable, whose values were bound by the second triple pattern in the clause. We can see from its full URI that `rdfs:label` is a term from the RDF Schema defined by W3C.
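
To watch the variables being bound step by step, one option is to list the intermediate variables in the `SELECT` clause as well. The following is a sketch of that variation on the example query, not a different technique; the `LIMIT 10` modifier is standard SPARQL and is only there to keep the output short.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tcga: <https://www.sbgenomics.com/ontologies/2014/11/tcga#>

# Same WHERE clause as the example query, but every variable is returned,
# so each row shows one case, its investigation, and that investigation's label.
SELECT ?case ?investigation ?inv_label
WHERE
{
  ?case a tcga:Case .
  ?case tcga:hasInvestigation ?investigation .
  ?investigation rdfs:label ?inv_label
}
LIMIT 10
```

Each row pairs one case with its investigation, so the same label can appear on many rows.
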

The dot (`.`) at the end of the first two triple patterns separates each pattern from the one that follows and indicates conjunction: a result must satisfy all three triple patterns at once.

Putting all these pieces together, we can see that the three RDF triple patterns constrain the values of `?inv_label` to be the labels of any investigation of any case in TCGA, i.e., we are finding the names of all the investigations in TCGA. These are the values that will be returned by the query.
[block:callout]
{
  "type": "info",
  "body": "Try running this query in the [query console](https://opensparql.sbgenomics.com/#/console) to see what resources are returned."
}
[/block]
Check out [example SPARQL queries](doc:example-sparql-queries) to get started. While these example queries are TCGA-specific, the syntax can be modified for other datasets.

<div align="right"><a href="#top">top</a></div>
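
As a sketch of such a modification, the query below mirrors the TCGA example for CCLE. It assumes the `2016/5` ontology prefix that appears in the CCLE URI examples on this page. It also assumes that `ccle:hasDiseaseType` links a `ccle:CCLECellLine` directly to its disease type and that disease types carry an `rdfs:label`; check the CCLE metadata ontology before relying on either. `DISTINCT` is standard SPARQL and removes repeated rows from the results.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ccle: <https://www.sbgenomics.com/ontologies/2016/5/ccle#>

# List each disease type label once, however many cell lines share it.
SELECT DISTINCT ?disease_label
WHERE
{
  ?cell_line a ccle:CCLECellLine .
  ?cell_line ccle:hasDiseaseType ?disease_type .
  ?disease_type rdfs:label ?disease_label
}
```

Without `DISTINCT`, a query of this shape returns one row per matching cell line, so a label shared by many cell lines would be repeated.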