
SPARQL

Retrieving SPARQL Results

Level: 
LOGD 101
Contributor: 
Description: 
This tutorial describes how to access a SPARQL endpoint, and how to retrieve and format query results using SparqlProxy in JavaScript, PHP and Python.
Prerequisites: 

What to Expect

By the end of this tutorial you should be able to retrieve SPARQL query results in your applications, use TWC's SparqlProxy to format those results, and retrieve them from your favorite programming language (e.g. PHP, Python, JavaScript).

What You Need to Know

This tutorial assumes you are familiar with concepts found in the following resources:
  • Resource Description Framework (RDF) is a standard model for data interchange on the Web. See [1]
  • SPARQL Protocol and RDF Query Language (SPARQL) is an RDF query language. See [2]
  • A RESTful web service is a simple style of web service built on HTTP. See [3]

Talking to a Standard SPARQL Endpoint

There are many ways to get results from a SPARQL endpoint. Most endpoints provide a web form in which you can enter a query and get back the results in HTML or some other format. In this tutorial we'll focus on the SPARQL endpoint hosted by Data.gov, found at http://services.data.gov/sparql .
tutorial-datagov-sparql-query.png
In the above screenshot, we're presented with a form and a drop-down box of formats in which the results can be returned. The query text area contains the following SPARQL query, which discovers the different government datasets that have been loaded into the triple store, along with the number of triples in each dataset.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
SELECT ?g ?number_of_triples
WHERE {
 GRAPH ?g {
   ?s a <http://data-gov.tw.rpi.edu/2009/data-gov-twc.rdf#Dataset> .
   ?s <http://data-gov.tw.rpi.edu/2009/data-gov-twc.rdf#number_of_triples> ?number_of_triples.
 }
}
ORDER BY ?g
The SPARQL query is also available at this URL: http://logd.tw.rpi.edu/demo/retrieving-sparql-results/datagov-list-loaded-dataset.sparql

Retrieving SPARQL results in HTML: You can see the results in HTML by running the query from the form.

Retrieving SPARQL results in XML: As we can see in the drop-down menu, we can get the results in a number of different formats, some for human readability and others that are more machine consumable. When you run the query, you can see your SPARQL query being used as a parameter in your address bar. This means we can also use the SPARQL endpoint as a RESTful web service. Here is the link that returns the results of the same query in "XML":
http://services.data.gov/sparql?default-graph-uri=&query=+PREFIX+rdf%3A+<http%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23>+%0D%0A+SELECT+%3Fg+%3Fnumber_of_triples%0D%0A+WHERE+{%0D%0A++GRAPH+%3Fg+{%0D%0A++++%3Fs+a+<http%3A%2F%2Fdata-gov.tw.rpi.edu%2F2009%2Fdata-gov-twc.rdf%23Dataset>+.%0D%0A++++%3Fs+<http%3A%2F%2Fdata-gov.tw.rpi.edu%2F2009%2Fdata-gov-twc.rdf%23number_of_triples>+%3Fnumber_of_triples.%0D%0A++}%0D%0A+}%0D%0A+ORDER+BY+%3Fg&format=application%2Fxml&debug=on&timeout=
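The URL above can also be composed and consumed from code. The following is a minimal Python sketch, not part of the original tutorial: it assumes the endpoint accepts the `query` and `format` parameters visible in the URL, and that the XML it returns follows the standard W3C SPARQL Query Results XML Format. The function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def build_endpoint_url(endpoint, query, result_format="application/xml"):
    """Compose a GET URL using the 'query' and 'format' parameters (assumed names)."""
    return endpoint + "?" + urlencode({"query": query, "format": result_format})


def parse_sparql_xml(xml_text):
    """Turn a SPARQL XML result document into a list of {variable: value} dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", SPARQL_NS):
        row = {}
        for binding in result.findall("sr:binding", SPARQL_NS):
            # Each binding wraps one child element: <uri>, <literal> or <bnode>
            row[binding.get("name")] = binding[0].text
        rows.append(row)
    return rows


def run_query(endpoint, query):
    """Send the query over HTTP and return the parsed rows (needs network access)."""
    with urlopen(build_endpoint_url(endpoint, query)) as response:
        return parse_sparql_xml(response.read())
```

Calling `run_query("http://services.data.gov/sparql", query_text)` would then return one dictionary per result row, provided the endpoint is reachable.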

Using SparqlProxy to Format SPARQL Query Results

TWC's SparqlProxy can be used to query a SPARQL endpoint (i.e. the web service interface of a triple store) and to post-process the SPARQL query results into other formats. The results can be converted to many different formats such as CSV, Google Visualization JSON and Simile Exhibit JSON. This makes it easier to develop mashups and visualizations from SPARQL results. Let's try the example from the previous section, using SparqlProxy at http://logd.tw.rpi.edu/ws/sparqlproxy.php
tutorial-sparqlproxy-sparql-query.png
Retrieving SPARQL results in HTML: In the screenshot above, the interface looks very similar to the previous one. If we want to query the SPARQL endpoint at data.gov using SparqlProxy, we can set the SPARQL End Point URL option to http://services.data.gov/sparql . Leaving this blank will query the LOGD SPARQL endpoint instead. Running the query from the previous section returns the same results as before.

Retrieving SPARQL results in Google Visualization Compatible JSON: SparqlProxy gives us more output options than the original SPARQL endpoint. Selecting the "GoogleViz/JSON" option returns the results encoded in Google Visualization compatible JSON.

SparqlProxy Parameters: As before, we can access this service as a RESTful web service. The frequently used parameters are listed in the following table. More details about the parameters can be found at http://logd.tw.rpi.edu/technology/sparqlproxy.
Parameter    Status  Description
service-uri  stable  URI of SPARQL service, e.g. http://dbpedia.org/sparql, http://logd.tw.rpi.edu/sparql
query        stable  SPARQL query string
query-uri    stable  URI of SPARQL query. "query-uri" and "query" are mutually exclusive, i.e., use only one.
output       stable  (optional) The output format. Values: xml, sparql, exhibit, gvds, csv, html. Default is xml.
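The parameter rules in the table can be captured in a small helper. This is a hedged Python sketch rather than anything shipped with SparqlProxy: the function name and its validation behaviour are assumptions made for illustration, while the parameter names, the allowed output values and the service URL come from this tutorial.

```python
from urllib.parse import urlencode

SPARQLPROXY = "http://logd.tw.rpi.edu/ws/sparqlproxy.php"
# Output values listed in the parameter table
OUTPUTS = {"xml", "sparql", "exhibit", "gvds", "csv", "html"}


def build_sparqlproxy_url(query=None, query_uri=None, service_uri=None, output="xml"):
    """Build a SparqlProxy request URL (hypothetical helper, not an official API)."""
    # "query" and "query-uri" are mutually exclusive: exactly one must be given
    if (query is None) == (query_uri is None):
        raise ValueError("pass exactly one of query or query_uri")
    if output not in OUTPUTS:
        raise ValueError("unsupported output format: " + output)
    params = {"output": output}
    # When service-uri is omitted, the tutorial says the LOGD endpoint is queried
    if service_uri:
        params["service-uri"] = service_uri
    if query is not None:
        params["query"] = query
    else:
        params["query-uri"] = query_uri
    return SPARQLPROXY + "?" + urlencode(params)
```

Rejecting the ambiguous case up front keeps a malformed request from ever reaching the service; whether the service itself reports such errors is not stated in the tutorial.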

Accessing SPARQL Query Results using SparqlProxy

The following example code fragments, written in different programming languages, will help you get started.

JavaScript

google.load("visualization", "1", {packages:["geomap","table","columnchart"]});
google.setOnLoadCallback(google_callback);
function google_callback(){ 
 var sparqlproxy = "http://logd.tw.rpi.edu/ws/sparqlproxy.php";
 var queryloc = "http://logd.tw.rpi.edu/demo/retrieving-sparql-results/datagov-list-loaded-dataset.sparql";    
 var service = "http://services.data.gov/sparql";
 var queryurl = sparqlproxy 
                + "?" + "output=gvds"
                + "&service-uri=" + encodeURIComponent(service)
                + "&query-uri=" + encodeURIComponent(queryloc) ;
 var query = new google.visualization.Query(queryurl); // Send the query.
 query.send(handleQueryResponse);
}

function handleQueryResponse(response){
  // Check for query response errors.
 if (response.isError()) {
   alert('Error in query: ' + response.getMessage() + ' ' + response.getDetailedMessage());
   return;
  }
  var data = response.getDataTable();
  ...
}

PHP

 // compose query
 $sparqlproxy_uri = "http://logd.tw.rpi.edu/ws/sparqlproxy.php";
 $params = array();
 $params["query-uri"] = "http://logd.tw.rpi.edu/demo/retrieving-sparql-results/datagov-list-loaded-dataset.sparql";
 $params["service-uri"] = "http://services.data.gov/sparql";
 $params["output"] = "gvds";
 // pass '&' explicitly as the separator (specific for Drupal)
 $query = $sparqlproxy_uri . "?" . http_build_query($params, '', '&');

 // show query result
 echo file_get_contents($query);

Python

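# Python 3 version; the original example used Python 2's urllib2
from urllib.parse import urlencode
from urllib.request import urlopen

sparqlproxy = "http://logd.tw.rpi.edu/ws/sparqlproxy.php"
queryloc = "http://logd.tw.rpi.edu/demo/retrieving-sparql-results/datagov-list-loaded-dataset.sparql"
service = "http://services.data.gov/sparql"
# URL-encode the parameter values, as the JavaScript example does
params = {"output": "gvds", "service-uri": service, "query-uri": queryloc}
queryurl = sparqlproxy + "?" + urlencode(params)

response = urlopen(queryurl)
result = response.read()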
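Once the response has been read, it still has to be turned into rows. The sketch below handles the `csv` output format listed in the parameter table using Python's standard `csv` module. It assumes the CSV starts with a header row naming the query variables, which this tutorial does not confirm; the function name and the sample text in the usage note are made up for illustration.

```python
import csv
import io


def parse_csv_results(csv_text):
    """Return one dict per result row, keyed by the header row (assumed to exist)."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```

For example, `parse_csv_results("g,number_of_triples\nhttp://example.org/graph/a,120\n")` yields a single dictionary with the keys `g` and `number_of_triples`; all values arrive as strings and need explicit conversion if numbers are required.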

White House Visitor Search

Description: 
This demo lets users search visitors and visitees of the White House. The 100 most frequently visited people in the White House are listed.

Exploring LOGD Metadata with SPARQL Queries

Level: 
TWC LOGD Portal
Contributor: 
Description: 
This tutorial shows how to use SPARQL queries to explore LOGD metadata to answer questions on the TWC LOGD Portal.
The following queries can be used at http://logd.tw.rpi.edu/sparql to find and describe datasets.

How up to date are the dataset descriptions?

prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix logd:       <http://logd.tw.rpi.edu/vocab/>
 
SELECT *
WHERE {    
  graph logd:Dataset {      
    logd:Dataset dcterms:modified ?modified .
  }  
}
How do the datasets fit into the void:subset hierarchy?

prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:          <http://rdfs.org/ns/void#>
 
SELECT DISTINCT ?dataset ?subdataset ?size ?dump 
WHERE { 
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {  
                 ?dataset a void:Dataset . 
      optional { ?dataset void:subset ?subdataset } 
      optional { ?subdataset conversion:num_triples ?size } 
      optional { ?subdataset void:dataDump          ?dump } 
  } 
} ORDER BY ?dataset ?subdataset 
What (unversioned) datasets are at the roots of the void:subset hierarchies?

prefix foaf:       <http://xmlns.com/foaf/0.1/>
prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?source_id ?source_homepage ?dataset_id ?dataset_homepage ?dataset max(?modified) AS ?lastModified
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
    ?dataset a conversion:Dataset;
             void:subset [ a conversion:VersionedDataset ] ;
             conversion:dataset_identifier ?dataset_id;
             dcterms:modified              ?modified ;
             dcterms:source                ?organization .
    ?organization a foaf:Agent;
                  dcterms:identifier ?source_id .
  }
  graph ?meta {
    ?meta a conversion:MetaDataset .
    optional{ ?organization foaf:homepage ?source_homepage  }
    #exceeds execution time threshold: optional{ ?dataset      foaf:homepage ?dataset_homepage }
  }  
} ORDER BY ?dataset
How many verbatim conversions are there?

prefix foaf:       <http://xmlns.com/foaf/0.1/>
prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT count(?dataset)
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
    ?dataset a conversion:LayerDataset; 
             conversion:conversion_identifier "raw" .
  }
}
How many first-level enhancements are there?

prefix foaf:       <http://xmlns.com/foaf/0.1/>
prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT count(?dataset)
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
    ?dataset a conversion:LayerDataset; 
             conversion:conversion_identifier "enhancement/1" .
  }
}
What datasets are part of the LOD cloud?

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?dataset ?dump  
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {  
    ?dataset a conversion:Dataset;
             void:subset [ a conversion:SameAsDataset; 
                           void:dataDump ?dump ] 
  }  
}   
All datasets and their dump files

PREFIX foaf:       <http://xmlns.com/foaf/0.1/>
PREFIX dcterms:    <http://purl.org/dc/terms/>
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?dataset ?dump_file
WHERE {
  
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
        ?dataset
             a conversion:Dataset;
             void:subset ?version .
        ?version a conversion:VersionedDataset .
 
   optional {
    ?version void:subset  ?layer .
    {
     {
      ?layer 
             void:dataDump ?dump_file ;
             dcterms:modified ?modifiedtime .
     }
     UNION
     {
      ?layer  
              void:dataDump ?dump_file ;
              dcterms:modified ?modifiedtime .
     }
    }
   }
  }
}
ORDER BY DESC(?modifiedtime)
What dataset samples are there, and which are loaded in the triple store?

prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?source_id ?dataset_id ?version_id ?layer_id ?sample_uri ?dump_file ?created_date ?loaded_boolean
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?dataset
       a conversion:Dataset;
       conversion:source_identifier  ?source_id;
       conversion:dataset_identifier ?dataset_id;
 
       void:subset [ a conversion:VersionedDataset;
                     conversion:version_identifier ?version_id;
 
                     void:subset [ a conversion:LayerDataset;
                                   conversion:conversion_identifier ?layer_id;
                                   dcterms:created                  ?created_date;
                                   void:subset ?sample_uri ]
                   ] .
    ?sample_uri a conversion:DatasetSample;
                void:dataDump ?dump_file .
  }
  optional {
    graph ?sample_uri {
       ?sample_uri a ?loaded_boolean .
       filter(?loaded_boolean = void:Dataset)
    }
  }
} ORDER BY ?source_id ?dataset_id ?version_id ?layer_id
What datasets are from "data-gov"?

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?dataset
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?dataset a conversion:Dataset;
    conversion:source_identifier "data-gov" .
  }
}
What Datasets are at the root of the void:subset hierarchy?

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT *
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?dataset a conversion:Dataset;
             void:subset [ a conversion:VersionedDataset ] .
  }
}
What VoID data subsets are within data-gov's dataset 1008?

prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:       <http://rdfs.org/ns/void#>
 
SELECT DISTINCT ?dataset ?subdataset ?size ?dump  
WHERE {   
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {    
    ?dataset a void:Dataset ; 
             conversion:source_identifier "data-gov"; 
             conversion:dataset_identifier "1008" .
    optional { ?dataset    void:subset            ?subdataset }    
    optional { ?subdataset conversion:num_triples ?size }    
    optional { ?subdataset void:dataDump          ?dump }  
  }  
} ORDER BY ?dataset ?subdataset 
Is data.gov's dataset 8 loaded in the SPARQL endpoint?

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
ASK
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
               ?dataset a void:Dataset;
                        conversion:source_identifier  "data-gov";
                        conversion:dataset_identifier "8" .
    optional { ?dataset void:subset ?subdataset }
 
    optional { ?NOPARENT void:subset ?dataset }
    filter(!bound(?NOPARENT))
  }
  graph ?dataset {
     [] a []
  }
} 
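An ASK query returns a single boolean rather than rows. As a hedged Python sketch, the helper below reads that boolean from a response in the standard W3C SPARQL Query Results XML Format; it assumes the endpoint was asked for XML output, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format, in ElementTree notation
NS = "{http://www.w3.org/2005/sparql-results#}"


def parse_ask_xml(xml_text):
    """Return True or False from the <boolean> element of an ASK result document."""
    root = ET.fromstring(xml_text)
    node = root.find(NS + "boolean")
    if node is None:
        raise ValueError("document does not contain an ASK result")
    return node.text.strip().lower() == "true"
```

A SELECT result passed to this helper raises `ValueError`, which makes it harder to mistake an empty result set for a negative answer.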
Is the raw sample loaded?

prefix ov:         <http://open.vocab.org/terms/>
 
ask
WHERE {
  graph <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/1st-anniversary/conversion/raw/subset/sample> {
     [] ov:csvRow ?row
  }
}
Is the first enhancement sample loaded?

prefix ov:         <http://open.vocab.org/terms/>
 
ask
WHERE {
  graph <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/1st-anniversary/conversion/e1/subset/sample> {
     [] ov:csvRow ?row
  }
}
What predicates do the datasets use?

prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?dataset ?predicate
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?dataset conversion:uses_predicate ?predicate
  }
}
Datasets with wgs:lat

prefix wgs:        <http://www.w3.org/2003/01/geo/wgs84_pos#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:       <http://rdfs.org/ns/void#>
 
SELECT DISTINCT ?g 
WHERE { 
  graph ?g { 
    ?s wgs:lat ?lat 
  } 
}
3-level vs. 4-level void:subset hierarchy (cf. single vs. multiple CSVs)

Datasets comprising only one CSV create a 3-level hierarchy, while datasets comprising more than one CSV create a 4-level hierarchy.

Query for all unversioned datasets:

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?unversioned
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    {
      # Unversioned datasets WITH single CSV
      ?unversioned void:subset            ?versioned .
      ?versioned   void:subset            ?layer     .
      ?layer       conversion:num_triples ?triples ;
                   void:dataDump          ?dump      .
    }
    UNION
    {
      # Unversioned datasets WITH multiple CSVs
      ?unversioned     void:subset            ?versioned       .
      ?versioned       void:subset            ?layer           .
      ?layer           void:dataDump          ?dump ;
                       void:subset            ?multi_component .
      ?multi_component conversion:num_triples ?triples         .
    }
  }
} ORDER BY ?unversioned 
Same as above, but counting the datasets and using nested blank nodes:

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT count(DISTINCT ?unversioned)
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    { # Unversioned datasets WITH single CSV
      ?unversioned void:subset [ 
                        void:subset [ 
                             conversion:num_triples ?triples ;
                             void:dataDump          ?dump     
                        ]
                   ]
    }
    UNION
    { # Unversioned datasets WITH multiple CSVs
      ?unversioned void:subset [
                        void:subset [ 
                             void:dataDump ?dump ;
                             void:subset [
                                  conversion:num_triples ?triples 
                             ]
                        ]
                   ]
    }
  }
} 
(See http://data-gov.tw.rpi.edu/wiki/URI_design_for_RDF_conversion_of_CSV-based_data#VoID_descriptions for a diagram illustrating the different VoID hierarchies between single- and multi-CSV datasets.)

A 3-level example with explicit names

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
SELECT ?p ?o
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1008>
        void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1008/version/2010-Jul-21> .
 
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1008/version/2010-Jul-21>
        void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1008/version/2010-Jul-21/conversion/raw> .
 
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1008/version/2010-Jul-21/conversion/raw> ?p ?o .
  }
}
A 4-level example with explicit names

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
SELECT ?p ?o
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1033>
        void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/version/1st-anniversary> .
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/version/1st-anniversary>
        void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/version/1st-anniversary/conversion/raw> .
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/version/1st-anniversary/conversion/raw>
        void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/FM_FACILITY_FILE/version/1st-anniversary/conversion/raw> .
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/FM_FACILITY_FILE/version/1st-anniversary/conversion/raw> ?p ?o
  }
}   
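The explicit names in the 3-level and 4-level examples follow a regular pattern. The Python sketch below rebuilds those URIs from their identifiers; the pattern is inferred only from the example URIs in this tutorial, and the function names are hypothetical.

```python
BASE = "http://logd.tw.rpi.edu/source"


def dataset_uri(source_id, dataset_id):
    """URI of an (unversioned) dataset."""
    return f"{BASE}/{source_id}/dataset/{dataset_id}"


def version_uri(source_id, dataset_id, version_id):
    """URI of a versioned dataset."""
    return f"{dataset_uri(source_id, dataset_id)}/version/{version_id}"


def layer_uri(source_id, dataset_id, version_id, layer_id, table=None):
    """URI of a layer; 'table' names one CSV of a multi-CSV dataset (4th level)."""
    base = dataset_uri(source_id, dataset_id)
    if table:
        # In the 4-level example the table name sits between the dataset id and /version
        base = f"{base}/{table}"
    return f"{base}/version/{version_id}/conversion/{layer_id}"
```

For instance, `layer_uri("data-gov", "1033", "1st-anniversary", "raw", table="FM_FACILITY_FILE")` reproduces the fourth-level URI used in the 4-level query.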
All dump files and their triple counts

PREFIX void: <http://rdfs.org/ns/void#> 
PREFIX conversion: <http://purl.org/twc/vocab/conversion/> 
 
SELECT ?dataDump sum(?num_triples) AS ?triples 
WHERE { 
    graph <http://logd.tw.rpi.edu/vocab/Dataset> { 
    ?dataset void:subset [ a conversion:VersionedDataset; void:subset ?layer ] . 
    { ?layer conversion:num_triples ?num_triples; void:dataDump ?dataDump. } 
    UNION 
    { ?layer void:dataDump ?dataDump; void:subset ?multiple_table .
    ?multiple_table conversion:num_triples ?num_triples . }
    } 
}
All dump files and their triple counts of an (unversioned) Dataset

PREFIX void:       <http://rdfs.org/ns/void#>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT ?dataDump sum(?num_triples) AS ?triples
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
 
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1008> 
      void:subset [
        a conversion:VersionedDataset;
        void:subset ?layer ] .
 
    {
      ?layer conversion:num_triples ?num_triples;
             void:dataDump          ?dataDump.
    }
    UNION
    {
      ?layer void:dataDump ?dataDump;
             void:subset   ?multiple_table .
 
      ?multiple_table conversion:num_triples ?num_triples .
    }
  }
}
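The per-dump totals can also be computed on the client. The Python sketch below assumes a variant of the query that selects `?dataDump ?num_triples` without the `sum`, and that each row has already been parsed into a dictionary with those two keys; the function name is hypothetical.

```python
from collections import defaultdict


def triples_per_dump(rows):
    """Sum num_triples for each dataDump across the result rows."""
    totals = defaultdict(int)
    for row in rows:
        # Values parsed from XML or CSV results arrive as strings
        totals[row["dataDump"]] += int(row["num_triples"])
    return dict(totals)
```

This is mainly useful with endpoints or output formats where aggregate results are awkward to consume; letting the endpoint do the sum transfers less data.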
Getting a dump file of a sample subset of a dataset

prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:       <http://rdfs.org/ns/void#>
SELECT DISTINCT ?dataset ?versionedDataset ?layerDataset ?sample ?dump
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?dataset          void:subset ?versionedDataset .
    ?versionedDataset a conversion:VersionedDataset;
                      void:subset ?layerDataset .
    ?layerDataset     a conversion:LayerDataset;
                      void:subset ?sample .
    ?sample           a conversion:DatasetSample;
                      void:dataDump ?dump .
  }
} ORDER BY ?dataset ?versionedDataset ?layerDataset ?sample
Getting a dump file of a sample subset of a dataset (#2)

prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:       <http://rdfs.org/ns/void#>
SELECT DISTINCT ?source_id ?dataset_id ?sample ?dump
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?sample a conversion:DatasetSample;
            conversion:source_identifier   ?source_id;
            conversion:dataset_identifier  ?dataset_id;
            conversion:version_identifier "1st-anniversary";
            void:dataDump ?dump .
  }
} ORDER BY ?sample
Attributes on Datasets with void:dataDumps

prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:          <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT *
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
    {?dataset a conversion:LayerDataset; void:dataDump ?dump }
   optional { ?dataset conversion:source_identifier ?source_id }
   optional { ?dataset conversion:dataset_identifier ?dataset_id }
   optional { ?dataset conversion:dataset_version ?version_id }
  }  
}
Counts of datasets with different sets of attributes

prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:          <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT count(DISTINCT ?dataset1) AS ?dumps 
           count(DISTINCT ?dataset2) AS ?to_source 
           count(DISTINCT ?dataset3) AS ?to_dataset 
           count(DISTINCT ?dataset4) AS ?to_version
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
    {?dataset1 void:dataDump ?dumpfile}
    UNION
 
    {?dataset2 void:dataDump ?dumpfile; conversion:source_identifier ?source_id}
    UNION
           
     {?dataset3 void:dataDump ?dumpfile; conversion:source_identifier ?source_id; conversion:dataset_identifier ?dataset_id}
   UNION
 
    {?dataset4 void:dataDump ?dumpfile; conversion:source_identifier ?source_id; conversion:dataset_identifier ?dataset_id; conversion:dataset_version ?version_id}
  }  
}
Datasets (intentionally) without a version

prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:          <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?dataset3
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
     ?dataset3 void:dataDump ?dumpfile; conversion:source_identifier ?source_id; conversion:dataset_identifier ?dataset_id .
    optional{?dataset3 conversion:dataset_version ?version_id}
    filter(!bound(?version_id))
  }  
} ORDER BY ?dataset3
What types are instances of void:Dataset?

prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:          <http://rdfs.org/ns/void#>
 
SELECT DISTINCT ?type
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {  
     ?dataset a void:Dataset ; a ?type .
  }
} ORDER BY ?type
How many triples?

prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:           <http://rdfs.org/ns/void#>
    
SELECT ?dataset ?size  
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
    ?dataset conversion:num_triples ?size    
  }
} ORDER BY ?dataset
What prefixes does a dataset use?

prefix void:       <http://rdfs.org/ns/void#>
prefix vann:       <http://purl.org/vocab/vann/>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?prefix ?namespace
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1008>
       a conversion:Dataset;
       conversion:source_identifier  ?source_id;
       conversion:dataset_identifier ?dataset_id;
 
       void:subset [ a conversion:VersionedDataset; 
                     conversion:version_identifier ?version_id;
 
                     void:subset [ a conversion:LayerDataset;
                                   conversion:conversion_identifier ?layer_id;
                                   void:subset ?sample_uri ] 
                   ] .
    ?sample_uri a conversion:DatasetSample;
                void:dataDump ?dump_file .
  }
  graph ?sample_uri {
    [] vann:preferredNamespacePrefix ?prefix;
       vann:preferredNamespaceUri    ?namespace .
  }
} ORDER BY ?prefix ?namespace
void:exampleResources in a Versioned Dataset

prefix rdfs:       <http://www.w3.org/2000/01/rdf-schema#>
prefix void:       <http://rdfs.org/ns/void#>
prefix ov:         <http://open.vocab.org/terms/>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?eg ?col ?p ?pLabel ?o
WHERE { 
  graph <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/version/2010-Aug-30> { 
    [] void:exampleResource ?eg .
    ?eg ?p ?o .
    optional{ ?p ov:csvCol                       ?col }
    optional{ ?p rdfs:label                      ?pLabel }
    optional{ ?p conversion:subjectDiscriminator ?discrim }
  } 
} ORDER BY ?eg ?col
How many outlinks to Geonames [GovTrack, DBPedia]?

prefix owl: <http://www.w3.org/2002/07/owl#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:       <http://rdfs.org/ns/void#>
 
SELECT DISTINCT ?s ?o
WHERE {
  graph <http://purl.org/twc/vocab/conversion/SameAsDataset> {
    ?s owl:sameAs ?o
  }
  filter(regex(str(?o),"^http://sws.geonames.org*"))
}
GovTrack?

prefix owl: <http://www.w3.org/2002/07/owl#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:       <http://rdfs.org/ns/void#>
 
SELECT DISTINCT ?s ?o
WHERE {
  graph <http://purl.org/twc/vocab/conversion/SameAsDataset> {
    ?s owl:sameAs ?o
  }
  filter(regex(str(?o),"^http://www.rdfabout.com/rdf/usgov*"))
}
DBPedia?

prefix owl: <http://www.w3.org/2002/07/owl#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:       <http://rdfs.org/ns/void#>
 
SELECT DISTINCT ?s ?o
WHERE {
  graph <http://purl.org/twc/vocab/conversion/SameAsDataset> {
    ?s owl:sameAs ?o
  }
  filter(regex(str(?o),"^http://dbpedia.org/resource*"))
}
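The three queries above differ only in the URI prefix they filter on. As a hedged Python sketch, the same classification can be done on the client after fetching all owl:sameAs pairs once; the prefixes are taken from the filters above, while the function name and the input shape (a list of subject/object pairs) are assumptions.

```python
from collections import Counter

# URI prefixes used in the three FILTER clauses
TARGETS = {
    "Geonames": "http://sws.geonames.org",
    "GovTrack": "http://www.rdfabout.com/rdf/usgov",
    "DBPedia": "http://dbpedia.org/resource",
}


def count_outlinks(pairs):
    """Count distinct (subject, object) owl:sameAs pairs per target dataset."""
    counts = Counter()
    for _subject, obj in set(pairs):  # set() mirrors SELECT DISTINCT
        for name, prefix in TARGETS.items():
            if obj.startswith(prefix):
                counts[name] += 1
    return counts
```

A plain prefix test with `startswith` sidesteps the regular expression entirely, so characters such as `.` in the URIs need no escaping.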
What enhanced predicates correspond to their raw counterparts?

This query satisfies a use case from our August data.gov mashathon in DC, where EPA converted a CSV without any header information and didn't have time to add the headers. The conversion created predicates named after their first value (e.g., raw:p_20090301 instead of e1:utc_date). When we reviewed the resulting query a month later, it was VERY difficult to follow -- just as difficult as when we were writing it! After naming the predicates in the enhancement 1 conversion parameters, we were ready to replace the old query. Finding the correspondences would have been difficult and error prone without the results from the following query. This is possible because csv2rdf4lod asserts the conversion:enhances descriptions during all enhanced conversions. Because of this, the GRAPH name can be changed to any dataset for which you want to trace predicate creation lineage.
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?enhanced_predicate ?raw_predicate
WHERE {
  GRAPH <http://logd.tw.rpi.edu/source/epa-gov/dataset/air-quality-system/version/mashathon> {
    ?dataset conversion:uses_predicate ?enhanced_predicate .
    ?enhanced_predicate conversion:enhances ?raw_predicate .
  }
}
Datasets enhanced with xsd:dates

PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT *
WHERE {
  GRAPH <http://purl.org/twc/vocab/conversion/ConversionProcess> {
    [] 
       conversion:source_identifier  ?source_id;
       conversion:dataset_identifier ?dataset_id;
       conversion:dataset_version    ?version_id;
       conversion:conversion_process [
         conversion:enhancement_identifier ?enhancement_id ;
         conversion:enhance [
           conversion:date_pattern ?pattern
         ]
      ]
    .
  }
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?dataset
       a conversion:LayerDataset;
       conversion:source_identifier  ?source_id;
       conversion:dataset_identifier ?dataset_id;
       conversion:version_identifier ?version_id
  }
} 

How to find datasets using the LOGD SPARQL endpoint

Level: 
TWC LOGD Portal
Contributor: 
Description: 
SPARQL queries to describe datasets in the LOGD triple store.
After checking out the live LOGD Data Description and Statistics, we encourage you to start exploring TWC's Linking Open Government Data using the LOGD SPARQL endpoint. For more specific types of queries, see Exploring LOGD Metadata with SPARQL Queries. Note that the list on GitHub is being maintained instead of this page.

Understanding the void:subset hierarchy

The LOGD SPARQL endpoint has three special named graphs:
  • http://logd.tw.rpi.edu/vocab/Dataset contains information about the LOGD datasets that was asserted during conversion to RDF. This includes the VoID subset hierarchy and dataDumps, SCOVO triple counts, references to (and definitions of) the predicates and classes used, and some PML justifications tracing the provenance of the tabular conversions to RDF.
  • http://purl.org/twc/vocab/conversion/MetaDataset contains information about datasets obtained from other sources. For example, it includes data.gov's Dataset 92 because it describes the rest of data.gov's offerings. A second dataset is TWC's own data catalog, which describes similar aspects for datasets from other sources.
  • http://purl.org/twc/vocab/conversion/SameAsDataset contains owl:sameAs links among entities within the LOGD datasets as well as into DBPedia, Geonames, and GovTrack. All of the links are co-located in a single graph to help explore the interconnectivity of the LOGD datasets.
In addition to these special named graphs, there are many named graphs that fall into four categories. These categories are listed in order of size, largest first, and correspond to their level within the void:subset hierarchy:
  • (unversioned) Dataset named graphs contain all of the data triples and all of the metadata for an (unversioned) Dataset. An (unversioned) Dataset incorporates all Versioned Datasets that have been created for it. An example instance of an (unversioned) Dataset is http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records. The (unversioned) Dataset named graph is populated with zero or more of its Versioned Datasets as needed. We accept requests to populate the (unversioned) Dataset named graphs in the LOGD triple store.
  • Versioned Dataset named graphs contain all of the data triples and all of the metadata for a Versioned Dataset. A Versioned Dataset incorporates all data triples and metadata from the layers (e.g. "raw", "e1") that have been created for it. Versioned Datasets exist for each (unversioned) Dataset. Two example instances of a Versioned Dataset are http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/0510 and http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/0810, corresponding to the May and August releases of the White House Visitor Access Records. The LOGD triple store is populated with Versioned Datasets as needed. Requests to do so are accepted.
  • Layer Dataset named graphs contain all data triples and all of the metadata for a Layer Dataset. The two most popular Layer Datasets are the "raw" and "e1" layers, while additional enhancements would provide layers "e2", "e3", etc. The term layer reflects the parallel predicates that layer additional descriptions on top of the same entities within the dataset: each layer provides a new set of predicates that enables backward compatibility and incremental adoption. Layer Datasets exist for each Version of a Dataset. Three example instances of a Layer Dataset are http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/0510/conversion/raw, http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/0510/conversion/enhancement/1, and http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/0810/conversion/raw. The LOGD triple store is populated with Layer Datasets as needed. Requests to do so are accepted.
  • Dataset Sample named graphs are the smallest type of named graph. They contain a subset of the data triples and all of the metadata for a Layer Dataset. This subset is intended to provide quick access for overview and/or survey analysis applications. Dataset Samples exist for each Layer of each Version of a Dataset. Three example instances of a Dataset Sample are http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/0510/conversion/raw/subset/sample, http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/0510/conversion/enhancement/1/subset/sample, and http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/0810/conversion/raw/subset/sample. The LOGD triple store is populated with all available Dataset Samples.

The following queries can be used at http://logd.tw.rpi.edu/sparql to find and describe datasets.

How up to date are the dataset descriptions?

prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix logd:       <http://logd.tw.rpi.edu/vocab/>
 
SELECT *
WHERE {    
  graph logd:Dataset {      
    logd:Dataset dcterms:modified ?modified .
  }  
}
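A query such as the one above can also be sent to the endpoint from a script instead of the web form. The following is a minimal Python sketch, assuming the endpoint accepts the standard SPARQL Protocol `query` parameter over HTTP GET and honours an `Accept` header asking for JSON results. The helper names `build_query_url` and `run_query` are illustrative and not part of LOGD.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://logd.tw.rpi.edu/sparql"  # endpoint named in this tutorial

QUERY = """
prefix dcterms: <http://purl.org/dc/terms/>
prefix logd:    <http://logd.tw.rpi.edu/vocab/>

SELECT *
WHERE {
  graph logd:Dataset {
    logd:Dataset dcterms:modified ?modified .
  }
}
"""


def build_query_url(endpoint, query):
    """Return a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint, query, timeout=30):
    """Send the query and return the decoded JSON results (needs network access)."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_query(ENDPOINT, QUERY)` would return a dictionary whose `results` → `bindings` list holds one entry per row, provided the endpoint is reachable and supports JSON output.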
How do the datasets fit into the void:subset hierarchy?

prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:          <http://rdfs.org/ns/void#>
 
SELECT DISTINCT ?dataset ?subdataset ?size ?dump 
WHERE { 
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {  
                 ?dataset a void:Dataset . 
      optional { ?dataset void:subset ?subdataset } 
      optional { ?subdataset conversion:num_triples ?size } 
      optional { ?subdataset void:dataDump          ?dump } 
  } 
} ORDER BY ?dataset ?subdataset 
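The rows returned by the hierarchy query are easier to work with once they are grouped by parent dataset. The sketch below assumes the results were retrieved in the W3C SPARQL JSON results format. The embedded sample is hypothetical, hand-written data that only mimics the shape of a response, and `subsets_by_dataset` is an illustrative helper.

```python
import json
from collections import defaultdict

# Hypothetical response in SPARQL JSON results format (not real endpoint output).
SAMPLE = json.dumps({
    "head": {"vars": ["dataset", "subdataset", "size", "dump"]},
    "results": {"bindings": [
        {"dataset": {"type": "uri", "value": "http://example.org/dataset/a"},
         "subdataset": {"type": "uri", "value": "http://example.org/dataset/a/version/2"}},
        {"dataset": {"type": "uri", "value": "http://example.org/dataset/a"},
         "subdataset": {"type": "uri", "value": "http://example.org/dataset/a/version/1"}},
        {"dataset": {"type": "uri", "value": "http://example.org/dataset/b"}},
    ]},
})


def subsets_by_dataset(results_json):
    """Map each ?dataset to the sorted list of its ?subdataset values.

    Variables bound only inside OPTIONAL can be missing from a row,
    so a dataset without subsets maps to an empty list.
    """
    grouped = defaultdict(set)
    for binding in json.loads(results_json)["results"]["bindings"]:
        children = grouped[binding["dataset"]["value"]]
        if "subdataset" in binding:
            children.add(binding["subdataset"]["value"])
    return {dataset: sorted(children) for dataset, children in grouped.items()}
```

Using a set per dataset also removes the repeated rows that the three OPTIONAL clauses can produce for the same dataset and subdataset pair.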
What (unversioned) datasets are at the roots of the void:subset hierarchies?

prefix foaf:       <http://xmlns.com/foaf/0.1/>
prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?source_id ?source_homepage ?dataset_id ?dataset_homepage ?dataset max(?modified) AS ?lastModified
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
    ?dataset a conversion:Dataset;
             void:subset [ a conversion:VersionedDataset ] ;
             conversion:dataset_identifier ?dataset_id;
             dcterms:modified              ?modified ;
             dcterms:source                ?organization .
    ?organization a foaf:Agent;
                  dcterms:identifier ?source_id .
  }
  graph ?meta {
    ?meta a conversion:MetaDataset .
    optional{ ?organization foaf:homepage ?source_homepage  }
    #exceeds execution time threshold: optional{ ?dataset      foaf:homepage ?dataset_homepage }
  }  
} ORDER BY ?dataset
How many verbatim conversions are there?

prefix foaf:       <http://xmlns.com/foaf/0.1/>
prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT count(?dataset)
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
    ?dataset a conversion:LayerDataset; 
             conversion:conversion_identifier "raw" .
  }
}
What datasets are part of the LOD cloud?

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?dataset ?dump  
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {  
    ?dataset a conversion:Dataset;
             void:subset [ a conversion:SameAsDataset; 
                           void:dataDump ?dump ] 
  }  
}   
What dataset samples are there, and which are loaded in the triple store?

prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?source_id ?dataset_id ?version_id ?layer_id ?sample_uri ?dump_file ?created_date ?loaded_boolean
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?dataset
       a conversion:Dataset;
       conversion:source_identifier  ?source_id;
       conversion:dataset_identifier ?dataset_id;
 
       void:subset [ a conversion:VersionedDataset;
                     conversion:version_identifier ?version_id;
 
                     void:subset [ a conversion:LayerDataset;
                                   conversion:conversion_identifier ?layer_id;
                                   dcterms:created                  ?created_date;
                                   void:subset ?sample_uri ]
                   ] .
    ?sample_uri a conversion:DatasetSample;
                void:dataDump ?dump_file .
  }
  optional {
    graph ?sample_uri {
       ?sample_uri a ?loaded_boolean .
       filter(?loaded_boolean = void:Dataset)
    }
  }
} ORDER BY ?source_id ?dataset_id ?version_id ?layer_id
What datasets are from "data-gov"?

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?dataset
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?dataset a conversion:Dataset;
    conversion:source_identifier "data-gov" .
  }
}
What Datasets are at the root of the void:subset hierarchy?

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT *
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?dataset a conversion:Dataset;
             void:subset [ a conversion:VersionedDataset ] .
  }
}
What VoID data subsets are within data-gov's dataset 1008?

prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:       <http://rdfs.org/ns/void#>
 
SELECT DISTINCT ?dataset ?subdataset ?size ?dump  
WHERE {   
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {    
    ?dataset a void:Dataset ; 
             conversion:source_identifier "data-gov"; 
             conversion:dataset_identifier "1008" .
    optional { ?dataset    void:subset            ?subdataset }    
    optional { ?subdataset conversion:num_triples ?size }    
    optional { ?subdataset void:dataDump          ?dump }  
  }  
} ORDER BY ?dataset ?subdataset 
Is data.gov's dataset 8 loaded in the sparql endpoint?

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
ASK
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
               ?dataset a void:Dataset;
                        conversion:source_identifier  "data-gov";
                        conversion:dataset_identifier "8" .
    optional { ?dataset void:subset ?subdataset }
 
    optional { ?NOPARENT void:subset ?dataset }
    filter(!bound(?NOPARENT))
  }
  graph ?dataset {
     [] a []
  }
} 
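An ASK query returns a single boolean instead of rows. A minimal sketch of reading that answer in Python follows, assuming the response was requested in the SPARQL JSON results format, where the answer appears under the `boolean` key. The name `parse_ask` is illustrative.

```python
import json


def parse_ask(results_json):
    """Return the truth value carried by a SPARQL JSON response to an ASK query."""
    document = json.loads(results_json)
    if "boolean" not in document:
        raise ValueError("not an ASK response: no 'boolean' member")
    return bool(document["boolean"])
```

Raising an error for responses without a `boolean` member keeps a SELECT result from being mistaken for a negative answer.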
Is the raw sample loaded?

prefix ov:         <http://open.vocab.org/terms/>
 
ask
WHERE {
  graph <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/1st-anniversary/conversion/raw/subset/sample> {
     [] ov:csvRow ?row
  }
}
Is the first enhancement sample loaded?

prefix ov:         <http://open.vocab.org/terms/>
 
ask
WHERE {
  graph <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/1st-anniversary/conversion/e1/subset/sample> {
     [] ov:csvRow ?row
  }
}
What predicates do the datasets use?

prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?dataset ?predicate
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?dataset conversion:uses_predicate ?predicate
  }
}
Datasets with wgs:lat

prefix wgs:        <http://www.w3.org/2003/01/geo/wgs84_pos#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:       <http://rdfs.org/ns/void#>
 
SELECT DISTINCT ?g 
WHERE { 
  graph ?g { 
    ?s wgs:lat ?lat 
  } 
}
3-level vs. 4-level void:subset hierarchy (cf. single vs. multiple CSVs)

Datasets comprising only one CSV create a 3-level hierarchy, while datasets comprising more than one CSV create a 4-level hierarchy.
Query for all unversioned datasets:
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?unversioned
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    {
      # Unversioned datasets WITH single CSV
      ?unversioned void:subset            ?versioned .
      ?versioned   void:subset            ?layer     .
      ?layer       conversion:num_triples ?triples ;
                   void:dataDump          ?dump      .
    }
    UNION
    {
      # Unversioned datasets WITH multiple CSVs
      ?unversioned     void:subset            ?versioned       .
      ?versioned       void:subset            ?layer           .
      ?layer           void:dataDump          ?dump ;
                       void:subset            ?multi_component .
      ?multi_component conversion:num_triples ?triples         .
    }
  }
} ORDER BY ?unversioned 
Same as above, but returning a count:
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT count(DISTINCT ?unversioned)
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    { # Unversioned datasets WITH single CSV
      ?unversioned void:subset [ 
                        void:subset [ 
                             conversion:num_triples ?triples ;
                             void:dataDump          ?dump     
                        ]
                   ]
    }
    UNION
    { # Unversioned datasets WITH multiple CSVs
      ?unversioned void:subset [
                        void:subset [ 
                             void:dataDump ?dump ;
                             void:subset [
                                  conversion:num_triples ?triples 
                             ]
                        ]
                   ]
    }
  }
} 
(See http://data-gov.tw.rpi.edu/wiki/URI_design_for_RDF_conversion_of_CSV-based_data#VoID_descriptions for a diagram illustrating the different VoID hierarchies between single- and multi-CSV datasets.)

A 3-level example with explicit names

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
SELECT ?p ?o
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1008>
        void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1008/version/2010-Jul-21> .
 
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1008/version/2010-Jul-21>
        void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1008/version/2010-Jul-21/conversion/raw> .
 
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1008/version/2010-Jul-21/conversion/raw> ?p ?o .
  }
}
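The explicit names in the query above follow a regular pattern, so the URIs for each level can be assembled from a few identifiers. The sketch below is inferred only from the example URIs on this page and covers the single-CSV case. `hierarchy_uris` is a hypothetical helper, not part of the LOGD tooling.

```python
BASE = "http://logd.tw.rpi.edu"


def hierarchy_uris(source_id, dataset_id, version_id, layer="raw"):
    """Return the dataset, versioned dataset, layer and sample URIs, top level first.

    'layer' is the path segment that follows /conversion/, such as "raw".
    """
    dataset = f"{BASE}/source/{source_id}/dataset/{dataset_id}"
    versioned = f"{dataset}/version/{version_id}"
    layer_uri = f"{versioned}/conversion/{layer}"
    sample = f"{layer_uri}/subset/sample"
    return [dataset, versioned, layer_uri, sample]
```

For instance, `hierarchy_uris("data-gov", "1008", "2010-Jul-21")` yields the three URIs used in the query, followed by the URI of the corresponding sample graph.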
A 4-level example with explicit names

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
SELECT ?p ?o
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1033>
        void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/version/1st-anniversary> .
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/version/1st-anniversary>
        void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/version/1st-anniversary/conversion/raw> .
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/version/1st-anniversary/conversion/raw>
        void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/FM_FACILITY_FILE/version/1st-anniversary/conversion/raw> .
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/FM_FACILITY_FILE/version/1st-anniversary/conversion/raw> ?p ?o
  }
}   
All dump files and their triple counts of an (unversioned) Dataset

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT ?dataDump sum(?num_triples) AS ?triples
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
 
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1008> 
      void:subset [
        a conversion:VersionedDataset;
        void:subset ?layer ] .
 
    {
      ?layer conversion:num_triples ?num_triples;
             void:dataDump          ?dataDump.
    }
    UNION
    {
      ?layer void:dataDump ?dataDump;
             void:subset   ?multiple_table .
 
      ?multiple_table conversion:num_triples ?num_triples .
    }
  }
}
Getting a dump file of a sample subset of a dataset

prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:       <http://rdfs.org/ns/void#>
SELECT DISTINCT ?dataset ?versionedDataset ?layerDataset ?sample ?dump
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?dataset          void:subset ?versionedDataset .
    ?versionedDataset a conversion:VersionedDataset;
                      void:subset ?layerDataset .
    ?layerDataset     a conversion:LayerDataset;
                      void:subset ?sample .
    ?sample           a conversion:DatasetSample;
                      void:dataDump ?dump .
  }
} ORDER BY ?dataset ?versionedDataset ?layerDataset ?sample
Getting a dump file of a sample subset of a dataset (#2)

prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:       <http://rdfs.org/ns/void#>
SELECT DISTINCT ?source_id ?dataset_id ?sample ?dump
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?sample a conversion:DatasetSample;
            conversion:source_identifier   ?source_id;
            conversion:dataset_identifier  ?dataset_id;
            conversion:version_identifier "1st-anniversary";
            void:dataDump ?dump .
  }
} ORDER BY ?sample
Attributes on Datasets with void:dataDumps

prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:          <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT *
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
    {?dataset a conversion:LayerDataset; void:dataDump ?dump }
   optional { ?dataset conversion:source_identifier ?source_id }
   optional { ?dataset conversion:dataset_identifier ?dataset_id }
   optional { ?dataset conversion:dataset_version ?version_id }
  }  
}
Counts of datasets with different sets of attributes

prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:          <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT count(DISTINCT ?dataset1) AS ?dumps 
           count(DISTINCT ?dataset2) AS ?to_source 
           count(DISTINCT ?dataset3) AS ?to_dataset 
           count(DISTINCT ?dataset4) AS ?to_version
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
    {?dataset1 void:dataDump ?dumpfile}
    UNION
 
    {?dataset2 void:dataDump ?dumpfile; conversion:source_identifier ?source_id}
    UNION
           
     {?dataset3 void:dataDump ?dumpfile; conversion:source_identifier ?source_id; conversion:dataset_identifier ?dataset_id}
   UNION
 
    {?dataset4 void:dataDump ?dumpfile; conversion:source_identifier ?source_id; conversion:dataset_identifier ?dataset_id; conversion:dataset_version ?version_id}
  }  
}
Datasets (intentionally) without a version

prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:          <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?dataset3
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
     ?dataset3 void:dataDump ?dumpfile; conversion:source_identifier ?source_id; conversion:dataset_identifier ?dataset_id .
    optional{?dataset3 conversion:dataset_version ?version_id}
    filter(!bound(?version_id))
  }  
} ORDER BY ?dataset3
What types are instances of void:Dataset?

prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:          <http://rdfs.org/ns/void#>
 
SELECT DISTINCT ?type
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {  
     ?dataset a void:Dataset ; a ?type .
  }
} ORDER BY ?type

Tech: A Crash Course in SPARQL

Description: 
A SPARQL tutorial for beginners.

Prerequisites

Familiarity with RDF

Introduction

Disclaimer: This is a very simple and incomplete crash course on SPARQL. There is a lot more to learn, but I see this as the fastest path to understanding its basic principles.
SPARQL is a query language for the Semantic Web. It was designed to be similar to SQL, a query language for relational databases, so it is relatively easy for people to learn. An example of a query is

Installing and Managing Virtuoso SPARQL Endpoint

Level: 
LOGD Related Technologies
Description: 
Instructions for installing, configuring, and managing Virtuoso SPARQL endpoints (community edition)

Overview

This tutorial gives instructions for installing, configuring and managing Virtuoso Open Source Edition (VOSE) on 64-bit Linux servers. Some of its content comes from the VOSE documentation, tailored specifically to the machine setup of TWC@RPI. It also introduces shell scripts, developed by the LOGD team at RPI, for common administrative tasks (e.g., probing the status of a VOSE SPARQL endpoint, or starting, stopping, and restarting a VOSE endpoint).

Installation

Packages and source code can be downloaded at http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSDownload. For the purpose of this tutorial, we use the archived package available at http://sf.net/projects/virtuoso/files. Checking out the source code from Virtuoso's CVS server is also possible; refer to http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSDownload for details.
After downloading the archived package (virtuoso-opensource-6.1.1.tar.gz), copy it to the server on which you want Virtuoso installed. A detailed guide to compiling and installing Virtuoso is available online at http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSMake. The following is a step-by-step walk-through.
  • Make sure you have all the Package Dependencies.
  • Set the compiler flags according to the hardware processor and OS of your machine.
  • Unpack the downloaded VOSE package and navigate to that folder.
  • In the command prompt, enter
    ./autogen.sh
    • This checks that the required packages are present and at the right versions.
  • Enter
    ./configure
    • By default, the target installation directories are under /usr/local, but you can specify your desired directory using:
      ./configure --prefix=/path/to/dir
  • Enter
    make
  • Enter
    make install
    • If you use the default target directory /usr/local, you need root privileges.
    • You can also specify the desired target directory using
      make install prefix=/path/to/dir
      Installing to a directory to which the current user has write access does not require root privileges.
If none of the steps above reports an error, the installation is complete.
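The build steps above can be collected into a small script. This is a sketch only: it assumes it is pointed at the unpacked source directory, and the function names are illustrative.

```python
import subprocess


def build_steps(prefix=None):
    """Return the build commands from the walk-through, in order."""
    configure = ["./configure"]
    if prefix:
        # Install somewhere other than the default /usr/local.
        configure.append(f"--prefix={prefix}")
    return [["./autogen.sh"], configure, ["make"], ["make", "install"]]


def run_build(source_dir, prefix=None):
    """Run each step inside source_dir and stop at the first failure."""
    for command in build_steps(prefix):
        subprocess.run(command, cwd=source_dir, check=True)
```

With `check=True`, a failing step raises `subprocess.CalledProcessError`, so later steps never run against a broken build tree.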

Administrative Tasks

Manual Start-up

The Virtuoso server instance can be started by calling
/opt/virtuoso6/bin/virtuoso-t -f &
from the directory where virtuoso.ini is located. By default, virtuoso.ini is located in
/opt/virtuoso6/var/lib/virtuoso/db

Manual Shutdown

The Virtuoso server instance can be shut down using the following steps:
  • Log into the isql interactive SQL command line environment, substituting <password> accordingly. The initial password set by Virtuoso is 'dba'.
    /opt/virtuoso6/bin/isql 1111 dba <password>
  • Execute the shutdown() function.
    SQL> shutdown();
Alternatively, the following shell command can also shut down a running Virtuoso instance:
/opt/virtuoso6/bin/isql 1111 dba <password> -K

Start-up/Shutdown Scripts

We have written command-line scripts for 64-bit Linux (CentOS 5) that start, stop, or restart the Virtuoso server instance and SPARQL endpoint with a single command.
  • To check status of the Virtuoso instance:
    sudo /etc/init.d/virtuosod status
  • To start the Virtuoso instance:
    sudo /etc/init.d/virtuosod start
  • To stop the Virtuoso instance:
    sudo /etc/init.d/virtuosod stop
  • To restart the Virtuoso instance:
    sudo /etc/init.d/virtuosod restart
Please note that:
  • All commands require a user account with sudo privileges.
  • Once the Virtuoso server instance has started successfully, the SPARQL endpoint becomes accessible immediately at
    http://<host>:<port>/sparql
  • Before starting the Virtuoso instance, use the 'ps' command to make sure that no live Virtuoso instance is already running from the directory /opt/virtuoso6/var/lib/virtuoso/db. Otherwise, the startup command will fail because of the file locking mechanism that Virtuoso uses.
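The check for leftover instances can be scripted. The sketch below filters a `ps` listing for lines that mention the `virtuoso-t` binary used in the manual start-up command. `find_virtuoso_processes` and `running_instances` are hypothetical helper names.

```python
import subprocess


def find_virtuoso_processes(ps_output):
    """Return the lines of a ps listing that mention the virtuoso-t binary."""
    return [line for line in ps_output.splitlines() if "virtuoso-t" in line]


def running_instances():
    """List live virtuoso-t processes by calling 'ps -eo pid,args'."""
    listing = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    ).stdout
    return find_virtuoso_processes(listing)
```

An empty list from `running_instances()` suggests it is safe to start a new instance. The match is on the command line only, so it does not tell which database directory a process is using.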

Loading Triples

We have written command-line utility scripts for loading triples in different formats into a named graph in the Virtuoso triple store. The scripts are hosted on Google Code and are installed on LOGD at
/opt/virtuoso/scripts
Newer, forked versions of the scripts are available on GitHub. The supported formats are:
  • RDF/XML
  • Turtle
  • N-Triples
  • N-Quads
Follow these steps to load a data file (in any of the formats above) into a named graph:
  • Change directory to where the scripts are located.
    cd /opt/virtuoso/scripts
  • Run the script vload with exactly three arguments:
    • format: [rdf | ttl | nt | nq], corresponding to RDF/XML, Turtle, N-Triples, and N-Quads respectively.
    • data_file: path to the raw data file.
    • graph_uri: URI of the named graph into which the triples should be loaded.
    sudo ./vload nt /path/to/data/file/data-1554.nt http://data-gov.tw.rpi.edu/vocab/Dataset_1554
  • Wait until the loading finishes. Depending on the size of the dataset, this can take from several seconds to several hours.
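When many files need loading, the vload call can be wrapped so that a wrong format code is caught before the script runs. The following is a minimal sketch that relies only on the three arguments described above. The wrapper names are illustrative, and it assumes vload is run through sudo from the scripts directory as in the example command.

```python
import subprocess

SCRIPT_DIR = "/opt/virtuoso/scripts"  # location given in this tutorial
FORMATS = {"rdf", "ttl", "nt", "nq"}  # RDF/XML, Turtle, N-Triples, N-Quads


def vload_command(fmt, data_file, graph_uri):
    """Build the vload command line after validating the format code."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {sorted(FORMATS)}")
    return ["sudo", "./vload", fmt, data_file, graph_uri]


def load(fmt, data_file, graph_uri):
    """Run vload from the scripts directory and wait for it to finish."""
    subprocess.run(vload_command(fmt, data_file, graph_uri), cwd=SCRIPT_DIR, check=True)
```

Passing the arguments as a list avoids shell quoting problems with unusual file names or graph URIs.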

Deleting Named Graphs

There is a utility command for deleting a specific named graph from the triple store. It is located at
/opt/virtuoso/scripts
It takes only one argument, the URI of the named graph to be deleted. So, to delete all the triples in the named graph <http://data-gov.tw.rpi.edu/vocab/Dataset_1554>, you can use the following command.
sudo ./vdelete http://data-gov.tw.rpi.edu/vocab/Dataset_1554

Performance Tuning

Documentation on tuning VOSE for better performance is available online, for example at http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtRDFPerformanceTuning and http://plato.cs.rpi.edu:8890/doc/html/rdfperformancetuning.html. In general, setting some of the parameters in the virtuoso.ini file to suitable values improves performance both for loading big datasets and for query evaluation. The following parameters in virtuoso.ini deserve attention:
  • ServerThreads
    • Max number of threads used in the server, should be set close to the number of concurrent connections if heavy usage is expected. A value of 100 should work on most systems.
  • O_DIRECT
    • This may be useful if a large fraction of RAM is configured as database buffers. If this is on, the file system cache will not grow at the expense of the database process; for example, the system is less likely to swap out memory that Virtuoso uses for its own database buffers.
  • NumberOfBuffers
    • This controls the amount of RAM used by Virtuoso to cache database files. This has a critical performance impact and thus the value should be fairly high for large databases. Exceeding physical memory in this setting will have a significant negative impact. For a database-only server about 65% of available RAM could be configured for database buffers. Note also that each buffer takes about 8700 bytes (see http://docs.openlinksw.com/virtuoso/dbadm.html for details about the size of each buffer).
  • CompileProceduresOnStartup
    • Setting this to 0 will speed up virtuoso startup, because stored procedures will not be loaded until the first time they are called.
  • FDsPerFile
    • Number of file descriptors per file to be obtained from the OS. This parameter only affects databases that use striping. Having multiple FDs per file means that as many concurrent I/O operations may be pending per file. This gives the OS more flexibility to schedule the operations, potentially improving file I/O throughput.
  • ResultSetMaxRows
    • This setting limits the number of rows in a result. Adjusting its value can help to mitigate denial-of-service (DoS) attacks.
Our experience so far is that on a 64-bit Linux machine with 8 CPU cores (2 quad-core processors) and 32GB of memory, setting the NumberOfBuffers parameter to roughly 2,400,000 (32959832 × 0.6 / 8 ≈ 2,470,000) increases performance significantly.
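The arithmetic behind that figure can be reproduced in a few lines. The sketch below assumes that 32959832 is the machine's memory in kilobytes, that 0.6 is the share of RAM given to buffers, and that 8 is the approximate size of one buffer in kilobytes. These readings are inferred from the formula, and `number_of_buffers` is an illustrative name.

```python
def number_of_buffers(ram_kb, percent=60, buffer_kb=8):
    """Number of buffers that fit in the given share of RAM.

    Integer arithmetic is used, so the result is rounded down.
    """
    return ram_kb * percent // (100 * buffer_kb)


# number_of_buffers(32959832) → 2471987
```

The computed value is slightly above the 2,400,000 used on the machine described here, which leaves a little extra headroom below the 60% mark.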

See also

http://tw.rpi.edu/web/inside/endpoints/installing-virtuoso

Using TWC LOGD SparqlProxy

Level: 
TWC LOGD Portal
Description: 
How to use SparqlProxy to connect RDF files and SPARQL endpoints to visualization tools. Example usage of SparqlProxy is shown in a Web browser and in programming languages including JavaScript and PHP.
Overview
  • What is SparqlProxy?  SparqlProxy is a web service that wraps a SPARQL endpoint, rewriting SPARQL query results into formats (e.g. JSON) that are easy for Web applications such as the Google Visualization API and MIT SIMILE Exhibit to consume.
  • Where is SparqlProxy?  The hosted service is at http://logd.tw.rpi.edu/ws/sparqlproxy.php.
Basic RESTful Parameters
  • query: [required unless query-uri is given] URL-encoded string of the SPARQL query
  • query-uri: [required unless query is given] URI of the SPARQL query (an alternative to the "query" parameter; the two parameters are mutually exclusive)
  • service-uri: [required] URI of the SPARQL endpoint
  • output: output format
    • xml - SPARQL/XML (default)
    • exhibit - JSON for MIT Exhibit
    • gvds - JSON for Google Visualization
    • csv - CSV
    • html - HTML table
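These parameters can be assembled into a request URL from any language. The following minimal Python sketch uses only the parameter names listed above. `sparqlproxy_url` is an illustrative helper, and rejecting a call that gives both or neither of `query` and `query_uri` reflects the mutual exclusivity described in the list.

```python
import urllib.parse

SPARQLPROXY = "http://logd.tw.rpi.edu/ws/sparqlproxy.php"


def sparqlproxy_url(service_uri, query=None, query_uri=None, output="xml"):
    """Compose a SparqlProxy request URL from the RESTful parameters."""
    if (query is None) == (query_uri is None):
        raise ValueError("give exactly one of query or query_uri")
    params = {"service-uri": service_uri, "output": output}
    if query is not None:
        params["query"] = query
    else:
        params["query-uri"] = query_uri
    # urlencode percent-encodes the values, including full URIs.
    return SPARQLPROXY + "?" + urllib.parse.urlencode(params)
```

The resulting URL can then be fetched with `urllib.request.urlopen`, or pasted into a browser to inspect the output.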
Using SparqlProxy via Web Interface
There are two alternative ways to specify a query in SparqlProxy (http://logd.tw.rpi.edu/ws/sparqlproxy.php):
  • Option 1 - Specify the SPARQL query using "Query Text": write your SPARQL query in the text area after "Query Text" and execute it on a SPARQL endpoint. Clicking the query button in the Option 1 area issues a SPARQL query to a DBpedia SPARQL endpoint.
  • Option 2 - Specify the SPARQL query using "Query URI": the only difference from Option 1 is that you save your SPARQL query as a web page and provide the URI of the query in the field after "Query URI".
Using SparqlProxy in JavaScript
Example 1 [query triple store and generate google visualization data]. The following code composes a URI representing a query via SparqlProxy's RESTful service: send a SPARQL query to a specific triple store, and then render results in "gvds" format - Google Visualization Table encoded in JSON.
 
// Build a SparqlProxy request URL and hand it to the Google Visualization API.
function google_callback() {
  var sparqlproxy = "http://logd.tw.rpi.edu/ws/sparqlproxy.php";
  var queryloc = "http://logd.tw.rpi.edu/query/logd-stat-void-source.sparql";
  var service = "http://logd.tw.rpi.edu/sparql";
  var queryurl = sparqlproxy
                 + "?" + "output=gvds"
                 + "&service-uri=" + encodeURIComponent(service)
                 + "&query-uri=" + encodeURIComponent(queryloc);
  var query = new google.visualization.Query(queryurl);
  query.send(handleQueryResponse); // Send the query.
}

function handleQueryResponse(response) {
  // Check for query response errors.
  if (response.isError()) {
    alert('Error in query: ' + response.getMessage() + ' ' + response.getDetailedMessage());
    return;
  }
  var data = response.getDataTable();
  // ...
}
Using SparqlProxy in PHP (tested in Drupal's "PHP code" input format)
Example 2 [query triple store and generate html fragment]. The following code composes a URI representing a query via SparqlProxy's RESTful service: send a SPARQL query to a specific triple store, and then render results in "tablecol" format - return a fragment of HTML including a table where each SPARQL result corresponds to a column.
 
// Compose the request URL.
$sparqlproxy_uri = "http://logd.tw.rpi.edu/ws/sparqlproxy.php";
$params = array();
$params["query-uri"] = "http://logd.tw.rpi.edu/query/logd-stat-void-source.sparql";
$params["service-uri"] = "http://logd.tw.rpi.edu/sparql";
$params["output"] = "tablecol";
// The explicit '&' separator is specific to Drupal.
$query = $sparqlproxy_uri . "?" . http_build_query($params, '', '&');

// Show the query result.
echo file_get_contents($query);