Tech: A Crash Course in SPARQL

Description: A SPARQL tutorial for beginners.

Prerequisites

Familiarity with RDF

Introduction

Disclaimer: This is a very simple and incomplete crash course on SPARQL. There is a lot more to learn, but this is what I see as the fastest path to understanding its basic principles.
SPARQL is a query language for the Semantic Web. It was designed to be similar to SQL, a query language for relational databases, so it is relatively easy for people to learn. An example of a query is
 SELECT ?node ?title
 WHERE{
  ?node <http://purl.org/dc/elements/1.1/title> ?title .
 }
 LIMIT 1
You can go to http://logd.tw.rpi.edu/sparql, copy and paste the previous code into the box and click on "run query". It will show the following results
node title
<http://xmlns.com/foaf/0.1/> "Friend of a Friend (FOAF) vocabulary"
The results are displayed in HTML form (there are other formats we will look at later). In order to understand what it all means, we need to understand the concept of a triple.

What is a Triple?

A triple is the smallest unit of information expressible in the Semantic Web. It is composed of 3 elements:
  1. A subject, which is a URI (e.g., a "web address") that represents something.
  2. A predicate, which is another URI that represents a certain property of the subject.
  3. An object, which can be a URI or a literal (such as a string) and is related to the subject through the predicate.
Thus, two example triples could be:
 <http://graves.cl/foaf.rdf#me> <http://xmlns.com/foaf/0.1/givenname> "Alvaro"
 <http://graves.cl/foaf.rdf#me> <http://xmlns.com/foaf/0.1/schoolHomepage> <http://www.rpi.edu>
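To make the structure concrete, here is a minimal Python sketch that stores the two example triples as plain tuples. This is only an illustration, not how a triple store works internally. The helper name is_uri is hypothetical, and the object of the second triple is assumed to be the complete URI http://www.rpi.edu.

```python
# Each triple is a (subject, predicate, object) tuple.
# URIs are written in angle brackets, literals in double quotes.
ME = "<http://graves.cl/foaf.rdf#me>"

triples = [
    (ME, "<http://xmlns.com/foaf/0.1/givenname>", '"Alvaro"'),
    # Assumption: the object URI is taken to be exactly http://www.rpi.edu
    (ME, "<http://xmlns.com/foaf/0.1/schoolHomepage>", "<http://www.rpi.edu>"),
]


def is_uri(term):
    """Return True if the term is written as a URI (in angle brackets)."""
    return term.startswith("<") and term.endswith(">")


for subject, predicate, obj in triples:
    kind = "URI" if is_uri(obj) else "literal"
    print(f"{subject} {predicate} {obj}  (object is a {kind})")
```

The first triple has a literal as its object and the second one has a URI, which are the two cases described in the list above.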

Understanding basic SPARQL

Back to our first example, we can now see what it does. We request two variables that we call ?node and ?title (variables start with a question mark). In the second to fourth lines we write the graph pattern that states how these variables should relate to each other. In this case we say that ?node should be related to ?title through the predicate <http://purl.org/dc/elements/1.1/title>, which is a property defined to state that something has a title. Finally, the LIMIT 1 line restricts the system to return at most 1 result (there may be many matches).
 SELECT ?node ?title
 WHERE{
   ?node <http://purl.org/dc/elements/1.1/title> ?title .
 }
 LIMIT 1
In English, we are asking "Give me some resource (node) that has a title (related through the predicate http://purl.org/dc/elements/1.1/title)".
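The matching idea behind this query can be sketched in a few lines of Python. This is a toy matcher written for this tutorial, not the algorithm a real SPARQL engine uses. The function name match_pattern is hypothetical, and the second triple in the data is made up so that there is something that does not match.

```python
def match_pattern(pattern, triples):
    """Yield one dict of variable bindings per triple that matches the pattern."""
    for triple in triples:
        bindings = {}
        for pattern_term, term in zip(pattern, triple):
            if pattern_term.startswith("?"):
                # A variable matches anything, but it must keep the same
                # value if it appears more than once in the pattern.
                if bindings.setdefault(pattern_term, term) != term:
                    break
            elif pattern_term != term:
                break  # a constant (URI or literal) must match exactly
        else:
            yield bindings  # reached only if no term failed to match


DC_TITLE = "<http://purl.org/dc/elements/1.1/title>"

# Toy data: the first triple comes from the result shown earlier,
# the second one is invented for illustration.
triples = [
    ("<http://xmlns.com/foaf/0.1/>", DC_TITLE, '"Friend of a Friend (FOAF) vocabulary"'),
    ("<http://example.org/doc>", "<http://example.org/author>", '"Somebody"'),
]

# Equivalent in spirit to: SELECT ?node ?title WHERE { ?node dc:title ?title . } LIMIT 1
results = list(match_pattern(("?node", DC_TITLE, "?title"), triples))[:1]
print(results)
```

Each result is a set of bindings for ?node and ?title, which is what the results table of a SELECT query shows row by row.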

Prefixes and shortcuts

One of the problems with managing URIs is that they are very long. For example, if we are looking for the given names of different people, we can ask
 SELECT ?node ?name
 WHERE{
   ?node <http://xmlns.com/foaf/0.1/givenname> ?name .
   ?node <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
 }
 LIMIT 10
In this case we are asking "Give me up to 10 resources that have a given name and are of type Person". The results of this query will be
node name
<http://dbpedia.org/resource/David_L._Boren> "David L."@en
<http://dbpedia.org/resource/James_L._Jones> "James L."@en
<http://dbpedia.org/resource/Joe_Biden> "Joe"@en
<http://dbpedia.org/resource/Lawrence_Summers> "Lawrence"@en
<http://dbpedia.org/resource/Michelle_Obama> "Michelle"@en
<http://dbpedia.org/resource/Nancy-Ann_DeParle> "Nancy-Ann"@en
<http://dbpedia.org/resource/Rahm_Emanuel> "Rahm"@en
<http://dbpedia.org/resource/Valerie_Jarrett> "Valerie B."@en
<http://dbpedia.org/resource/David_L._Boren> "David L."@en
<http://dbpedia.org/resource/James_L._Jones> "James L."@en
It is easy to see that adding more and more restrictions makes these queries really long and hard to manage. More importantly, it is very likely that we will make typos and syntactic mistakes, which would affect the results.
To solve this we can use PREFIX at the beginning of the query, which allows us to specify a namespace for the URIs. For example, the same query from above (using prefixes) would look like
 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 SELECT ?node ?name
 WHERE{
   ?node foaf:givenname ?name .
   ?node rdf:type foaf:Person .
 }
 LIMIT 10
Now, instead of having to write entire URIs, we use the prefix (in this case foaf or rdf) and append only the last part (local name) of the URI to it.
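What a PREFIX declaration does can be sketched as a simple string substitution. The following Python snippet is a rough illustration under that assumption. The function name expand and the PREFIXES dictionary are hypothetical and only cover the two prefixes used above.

```python
# Prefix table mirroring the two PREFIX lines of the query above.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(term, prefixes=PREFIXES):
    """Expand a prefixed name such as foaf:Person into a full URI."""
    if term.startswith(("?", "<", '"')):
        return term  # variables, full URIs and literals are left untouched
    prefix, _, local_name = term.partition(":")
    return "<" + prefixes[prefix] + local_name + ">"


print(expand("foaf:givenname"))  # the namespace followed by the local name
print(expand("rdf:type"))
```

Expanding foaf:givenname and rdf:type this way gives back the long URIs used in the query without prefixes, so both versions of the query ask for the same thing.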

Shortcuts

As we have seen, sometimes we define the graph we want to retrieve by describing several properties of the same node. A way to simplify this is to end a triple with a semicolon instead of a period and omit the subject in the triple that follows. Thus, our previous example would look like
 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 SELECT ?node ?name
 WHERE{
   ?node foaf:givenname ?name ;
            rdf:type foaf:Person .
 }
 LIMIT 10
We changed the final period of the first restriction to a semicolon, so we don't need to write the subject ?node again.
Another common need is to specify a certain type (using rdf:type). For this predicate only, we can write "a" instead (rdf:type and "a" are interchangeable)
 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 SELECT ?node ?name
 WHERE{
   ?node foaf:givenname ?name ;
            a foaf:Person .
 }
 LIMIT 10
Given that rdf:type was the only time we used the rdf prefix, we can get rid of it as well.
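Both shortcuts are purely notational. As a small sketch, the following Python function rewrites the shortened form back into full triple patterns. The name expand_shortcuts is hypothetical, and the patterns are kept as prefixed names for readability.

```python
RDF_TYPE = "rdf:type"


def expand_shortcuts(subject, predicate_object_pairs):
    """Turn '?s p1 o1 ; p2 o2 .' into one full triple pattern per pair."""
    patterns = []
    for predicate, obj in predicate_object_pairs:
        if predicate == "a":  # "a" is shorthand for rdf:type
            predicate = RDF_TYPE
        # The subject omitted after the semicolon is the same one as before.
        patterns.append((subject, predicate, obj))
    return patterns


patterns = expand_shortcuts(
    "?node",
    [("foaf:givenname", "?name"), ("a", "foaf:Person")],
)
for pattern in patterns:
    print(" ".join(pattern), ".")
```

The output is the same pair of triple patterns as in the query that spells out ?node and rdf:type in full.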

Graphs

Triple stores allow us to use named graphs, which let people keep multiple graphs in the same database. Each named graph is identified by a URI (which may also be used to describe other things). For example, using our first example but now limiting the number of results returned to 3
 PREFIX dc: <http://purl.org/dc/elements/1.1/>
 SELECT ?node ?title
 WHERE{
   ?node dc:title ?title .
 }
 LIMIT 3
we obtain three identical results as can be seen in the next table.
node title
<http://xmlns.com/foaf/0.1/> "Friend of a Friend (FOAF) vocabulary"
<http://xmlns.com/foaf/0.1/> "Friend of a Friend (FOAF) vocabulary"
<http://xmlns.com/foaf/0.1/> "Friend of a Friend (FOAF) vocabulary"
Actually what is happening is that there are three identical triples contained in different graphs. We can see this using the following query.
 PREFIX dc: <http://purl.org/dc/elements/1.1/>
 SELECT ?graph ?node ?title
 WHERE{
   GRAPH ?graph{
     ?node dc:title ?title .
   }
 }
 LIMIT 3
graph node title
<http://xmlns.com/foaf/0.1/Document> <http://xmlns.com/foaf/0.1/> "Friend of a Friend (FOAF) vocabulary"
<http://xmlns.com/foaf/0.1/homepage> <http://xmlns.com/foaf/0.1/> "Friend of a Friend (FOAF) vocabulary"
<http://xmlns.com/foaf/0.1/> <http://xmlns.com/foaf/0.1/> "Friend of a Friend (FOAF) vocabulary"
In this case we added the variable ?graph to show in which graph each of the results is located. Finally, we can include several graphs in a SPARQL query:
 PREFIX dc: <http://purl.org/dc/elements/1.1/>
 SELECT  ?node8 ?desc8 ?node401 ?desc401
 WHERE{
   GRAPH <http://data-gov.tw.rpi.edu/vocab/Dataset_401>{
     ?node401 dc:description ?desc401 .
   }
   GRAPH <http://data-gov.tw.rpi.edu/vocab/Dataset_8>{
     ?node8 dc:description ?desc8 .
   }
 }
 LIMIT 3
node8 desc8 node401 desc401
<http://data-gov.tw.rpi.edu/raw/8/data-8-00003.rdf> "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. " <http://data-gov.tw.rpi.edu/raw/401/data-401.rdf> "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. "
<http://data-gov.tw.rpi.edu/raw/8/data-8-00003.rdf> "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. " <http://data-gov.tw.rpi.edu/raw/401/index.rdf> "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. "
<http://data-gov.tw.rpi.edu/raw/8/data-8-00005.rdf> "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. " <http://data-gov.tw.rpi.edu/raw/401/data-401.rdf> "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. "
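One way to picture named graphs is to store each triple together with the URI of the graph it belongs to, giving four elements instead of three. The Python sketch below uses the three FOAF results shown earlier in this section. The function names are hypothetical, and it assumes, as the results above suggest for this endpoint, that a query without GRAPH looks at the triples of all graphs (other stores may be configured differently).

```python
DC_TITLE = "<http://purl.org/dc/elements/1.1/title>"
FOAF = "<http://xmlns.com/foaf/0.1/>"
TITLE = '"Friend of a Friend (FOAF) vocabulary"'

# Each entry is (graph, subject, predicate, object).
quads = [
    ("<http://xmlns.com/foaf/0.1/Document>", FOAF, DC_TITLE, TITLE),
    ("<http://xmlns.com/foaf/0.1/homepage>", FOAF, DC_TITLE, TITLE),
    ("<http://xmlns.com/foaf/0.1/>", FOAF, DC_TITLE, TITLE),
]


def titles(quads):
    """Like '?node dc:title ?title .': the graph name is ignored."""
    return [(s, o) for (_graph, s, p, o) in quads if p == DC_TITLE]


def titles_by_graph(quads):
    """Like 'GRAPH ?graph { ?node dc:title ?title . }': the graph name is kept."""
    return [(g, s, o) for (g, s, p, o) in quads if p == DC_TITLE]


print(titles(quads))           # three rows that look identical
print(titles_by_graph(quads))  # three rows that differ only in the graph
```

Without the graph column the three rows cannot be told apart, which is why the first query of this section appears to return duplicates.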

Union

It is also possible to obtain the UNION of different graph patterns, for example
 PREFIX dc: <http://purl.org/dc/elements/1.1/>
 SELECT  ?node8 ?desc8 ?node401 ?desc401
 WHERE{
   {
     GRAPH <http://data-gov.tw.rpi.edu/vocab/Dataset_401>{
       ?node401 dc:description ?desc401 .
     }
   }UNION{
     GRAPH <http://data-gov.tw.rpi.edu/vocab/Dataset_8>{
       ?node8 dc:description ?desc8 .
     }
   }
 }
 LIMIT 3
node8 desc8 node401 desc401
  <http://data-gov.tw.rpi.edu/raw/401/data-401.rdf> "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. "
  <http://data-gov.tw.rpi.edu/raw/401/index.rdf> "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. "
<http://data-gov.tw.rpi.edu/raw/8/data-8-00003.rdf> "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. "
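Compare it with the result of the previous example: there, every row combined a match from each graph, whereas with UNION each row comes from only one of the two patterns, so the columns belonging to the other pattern are left empty.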
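The difference between putting two patterns side by side and joining them with UNION can be sketched in Python by treating each result row as a dictionary of bindings. This is a simplified model, not an implementation of a SPARQL engine. The function names join and union are hypothetical, and only the node variables are kept to keep the rows short.

```python
from itertools import product

# Simplified solutions for each GRAPH pattern (descriptions omitted).
solutions_401 = [
    {"?node401": "<http://data-gov.tw.rpi.edu/raw/401/data-401.rdf>"},
    {"?node401": "<http://data-gov.tw.rpi.edu/raw/401/index.rdf>"},
]
solutions_8 = [
    {"?node8": "<http://data-gov.tw.rpi.edu/raw/8/data-8-00003.rdf>"},
    {"?node8": "<http://data-gov.tw.rpi.edu/raw/8/data-8-00005.rdf>"},
]


def join(left, right):
    """Two patterns in the same group: combine every compatible pair of rows."""
    combined = []
    for a, b in product(left, right):
        shared = a.keys() & b.keys()
        # Rows are compatible when the variables they share have equal values.
        if all(a[v] == b[v] for v in shared):
            combined.append({**a, **b})
    return combined


def union(left, right):
    """UNION: keep the rows of both patterns, each with its own variables only."""
    return list(left) + list(right)


joined = join(solutions_401, solutions_8)
united = union(solutions_401, solutions_8)
print(len(joined), "joined rows, each binding", len(joined[0]), "variables")
print(len(united), "united rows, each binding", len(united[0]), "variable")
```

Since the two patterns share no variables, the join pairs every row of one graph with every row of the other, while the union simply lists the rows of both.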

Optional

Sometimes we find that certain patterns are desired but not mandatory. For that case, we can use OPTIONAL in our query. The query then matches the required patterns and, whenever possible, also the ones inside the OPTIONAL braces
 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 
 SELECT   ?node ?name ?givenname
 WHERE{
     ?node foaf:name ?name .
     OPTIONAL{
       ?node foaf:givenname ?givenname .
     }
 }
node name givenname
<http://data-gov.tw.rpi.edu/vocab/Department_of_Commerce> "Department of Commerce"
<http://data-gov.tw.rpi.edu/vocab/Environmental_Protection_Agency> "Environmental Protection Agency"
<http://dbpedia.org/resource/David_L._Boren> "David L. Boren"@en "David L."@en
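In the results above, the first two rows have no value for givenname, but they are still returned. The behaviour of OPTIONAL can be sketched in Python as "extend the row if you can, otherwise keep it as it is". The function name optional is hypothetical and the rows are a shortened version of the results above.

```python
COMMERCE = "<http://data-gov.tw.rpi.edu/vocab/Department_of_Commerce>"
BOREN = "<http://dbpedia.org/resource/David_L._Boren>"

# Rows matching the required pattern: ?node foaf:name ?name .
required = [
    {"?node": COMMERCE, "?name": '"Department of Commerce"'},
    {"?node": BOREN, "?name": '"David L. Boren"@en'},
]
# Rows matching the optional pattern: ?node foaf:givenname ?givenname .
extra = [
    {"?node": BOREN, "?givenname": '"David L."@en'},
]


def optional(required, extra):
    """Keep every required row, extended with optional bindings when they exist."""
    results = []
    for row in required:
        extensions = [
            {**row, **candidate}
            for candidate in extra
            # Compatible means: shared variables (here ?node) have equal values.
            if all(candidate[v] == row[v] for v in row.keys() & candidate.keys())
        ]
        # If nothing matched, the row is kept without the optional variables.
        results.extend(extensions or [row])
    return results


rows = optional(required, extra)
for row in rows:
    print(row)
```

Had the second pattern been written without OPTIONAL, the row for the Department of Commerce would have been dropped from the results.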

Filters

A very useful component in SPARQL is the FILTER operator, which allows users to express specific restrictions based on arithmetic operators, regular expressions, etc. In the following query, we restrict the results to those whose ?name value contains the string "Biden"
 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 
 SELECT   ?node ?name ?givenname
 WHERE{
     ?node foaf:name ?name .
     ?node foaf:givenname ?givenname .
     FILTER regex(?name, "Biden") .
 }
nodenamegivenname
<http://dbpedia.org/resource/Joe_Biden>"Joe Biden"@en"Joe"@en
<http://dbpedia.org/resource/Joe_Biden>"Joseph Biden"@en"Joe"@en
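A filter is applied to the rows that the patterns have already matched. The Python sketch below imitates FILTER regex(?name, "Biden") with re.search, which, like the SPARQL regex function, looks for the pattern anywhere in the string. The function name filter_regex is hypothetical, the language tags are omitted, and the third row is added only to have something to filter out.

```python
import re

# Rows produced by the triple patterns, before the filter is applied.
solutions = [
    {"?node": "<http://dbpedia.org/resource/Joe_Biden>", "?name": "Joe Biden"},
    {"?node": "<http://dbpedia.org/resource/Joe_Biden>", "?name": "Joseph Biden"},
    {"?node": "<http://dbpedia.org/resource/David_L._Boren>", "?name": "David L. Boren"},
]


def filter_regex(solutions, variable, pattern):
    """Keep only the rows whose value for `variable` matches the pattern somewhere."""
    return [row for row in solutions if re.search(pattern, row[variable])]


kept = filter_regex(solutions, "?name", "Biden")
for row in kept:
    print(row["?node"], row["?name"])
```

Only the two rows whose name contains "Biden" survive, which corresponds to the two result rows above.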

References

For more information on SPARQL, please check the following links