How to find datasets using the LOGD SPARQL endpoint

Level: TWC LOGD Portal
Description: SPARQL queries to describe datasets in the LOGD triple store.
After checking out the live LOGD Data Description and Statistics, we encourage you to start exploring TWC's Linking Open Government Data using the LOGD SPARQL endpoint. For more specific types of queries, see Exploring LOGD Metadata with SPARQL Queries. Note that the list on GitHub is now maintained instead of this page.

Understanding the void:subset hierarchy

The LOGD SPARQL endpoint has three special named graphs:

- http://logd.tw.rpi.edu/vocab/Dataset contains information about the LOGD datasets that was asserted during conversion to RDF. This includes the VoID subset hierarchy and dataDumps, SCOVO triple counts, references to (and definitions of) the predicates and classes used, and some PML justifications tracing the provenance of the tabular conversions to RDF.
- http://purl.org/twc/vocab/conversion/MetaDataset contains information about datasets obtained from other sources. For example, it includes data.gov's Dataset 92 because it describes the rest of data.gov's offerings. A second dataset is TWC's own data catalog, which describes similar aspects for datasets from other sources.
- http://purl.org/twc/vocab/conversion/SameAsDataset contains owl:sameAs links among entities within the LOGD datasets as well as into DBpedia, Geonames, and GovTrack. All of the links are co-located in a single graph to help explore the interconnectivity of the LOGD datasets.

In addition to these special named graphs, there are many named graphs that fall into four categories. The categories are listed from largest to smallest and correspond to their level within the void:subset hierarchy:

- (unversioned) Dataset named graphs contain all of the data triples and all of the metadata for an (unversioned) Dataset. An (unversioned) Dataset incorporates all Versioned Datasets that have been created for it. An example instance of an (unversioned) Dataset is http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records. The (unversioned) Dataset named graph is populated with zero or more of its Versioned Datasets as needed. We accept requests to populate the (unversioned) Dataset named graphs in the LOGD triple store.
- Versioned Dataset named graphs contain all of the data triples and all of the metadata for a Versioned Dataset. A Versioned Dataset incorporates all data triples and metadata from the layers (e.g. "raw", "e1") that have been created for it. Versioned Datasets exist for each (unversioned) Dataset. Two example instances of a Versioned Dataset are http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/0510 and http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/0810, corresponding to the May and August releases of the White House Visitor Access Records. The LOGD triple store is populated with Versioned Datasets as needed. Requests to do so are accepted.
- Layer Dataset named graphs contain all of the data triples and all of the metadata for a Layer Dataset. The two most popular Layer Datasets are the "raw" and "e1" layers, while additional enhancements would provide layers "e2", "e3", etc. The term "layer" reflects the parallel predicates that layer additional descriptions on top of the same entities within the dataset: each layer provides a new set of predicates that enables backward compatibility and incremental adoption. Layer Datasets exist for each Version of a Dataset. Three example instances of a Layer Dataset are http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/0510/conversion/raw, http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/0510/conversion/enhancement/1, and http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/0810/conversion/raw. The LOGD triple store is populated with Layer Datasets as needed. Requests to do so are accepted.
- Dataset Sample named graphs are the smallest type of named graph. They contain a subset of the data triples and all of the metadata for a Layer Dataset. This subset is intended to provide quick access for overview and/or survey analysis applications. Dataset Samples exist for each Layer of each Version of a Dataset. Three example instances of a Dataset Sample are http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/0510/conversion/raw/subset/sample, http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/0510/conversion/enhancement/1/subset/sample, and http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/0810/conversion/raw/subset/sample. The LOGD triple store is populated with all available Dataset Samples.

----

The following queries can be used at http://logd.tw.rpi.edu/sparql to find and describe datasets.

How up to date are the dataset descriptions?

prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix logd:       <http://logd.tw.rpi.edu/vocab/>
 
SELECT *
WHERE {    
  graph logd:Dataset {      
    logd:Dataset dcterms:modified ?modified .
  }  
}
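
Any of the queries on this page can also be sent to the endpoint programmatically. The following is a minimal Python sketch that only builds the request URL, using the `query` parameter defined by the SPARQL Protocol. The helper name `build_request_url` is hypothetical, and the sketch makes no assumption about whether the endpoint is currently reachable or which result formats it returns.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint address as given on this page.
ENDPOINT = "http://logd.tw.rpi.edu/sparql"

# The "How up to date are the dataset descriptions?" query from above,
# trimmed to the two prefixes it actually uses.
QUERY = """\
prefix dcterms: <http://purl.org/dc/terms/>
prefix logd:    <http://logd.tw.rpi.edu/vocab/>

SELECT *
WHERE {
  graph logd:Dataset {
    logd:Dataset dcterms:modified ?modified .
  }
}
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the 'query' parameter.

    This helper is illustrative only; it performs no network access.
    """
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    url = build_request_url(ENDPOINT, QUERY)
    print(url)
    # Decoding the URL again gives back the original query text.
    print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```

The resulting URL can be pasted into a browser or passed to any HTTP client. Keeping URL construction separate from the HTTP call makes it easy to inspect exactly what is sent.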
How do the datasets fit into the void:subset hierarchy?

prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:          <http://rdfs.org/ns/void#>
 
SELECT DISTINCT ?dataset ?subdataset ?size ?dump 
WHERE { 
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {  
                 ?dataset a void:Dataset . 
      optional { ?dataset void:subset ?subdataset } 
      optional { ?subdataset conversion:num_triples ?size } 
      optional { ?subdataset void:dataDump          ?dump } 
  } 
} ORDER BY ?dataset ?subdataset 
What (unversioned) datasets are at the roots of the void:subset hierarchies?

prefix foaf:       <http://xmlns.com/foaf/0.1/>
prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
# The aggregate is parenthesized and paired with GROUP BY, as SPARQL 1.1 requires.
SELECT ?source_id ?source_homepage ?dataset_id ?dataset_homepage ?dataset (max(?modified) AS ?lastModified)
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
    ?dataset a conversion:Dataset;
             void:subset [ a conversion:VersionedDataset ] ;
             conversion:dataset_identifier ?dataset_id;
             dcterms:modified              ?modified ;
             dcterms:source                ?organization .
    ?organization a foaf:Agent;
                  dcterms:identifier ?source_id .
  }
  graph ?meta {
    ?meta a conversion:MetaDataset .
    optional{ ?organization foaf:homepage ?source_homepage  }
    #exceeds execution time threshold: optional{ ?dataset      foaf:homepage ?dataset_homepage }
  }  
} GROUP BY ?source_id ?source_homepage ?dataset_id ?dataset_homepage ?dataset
ORDER BY ?dataset
How many verbatim conversions are there?

prefix foaf:       <http://xmlns.com/foaf/0.1/>
prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT (count(?dataset) AS ?count)
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
    ?dataset a conversion:LayerDataset; 
             conversion:conversion_identifier "raw" .
  }
}
What datasets are part of the LOD cloud?

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?dataset ?dump  
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {  
    ?dataset a conversion:Dataset;
             void:subset [ a conversion:SameAsDataset; 
                           void:dataDump ?dump ] 
  }  
}   
What dataset samples are there, and which are loaded in the triple store?

prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?source_id ?dataset_id ?version_id ?layer_id ?sample_uri ?dump_file ?created_date ?loaded_boolean
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?dataset
       a conversion:Dataset;
       conversion:source_identifier  ?source_id;
       conversion:dataset_identifier ?dataset_id;
 
       void:subset [ a conversion:VersionedDataset;
                     conversion:version_identifier ?version_id;
 
                     void:subset [ a conversion:LayerDataset;
                                   conversion:conversion_identifier ?layer_id;
                                   dcterms:created                  ?created_date;
                                   void:subset ?sample_uri ]
                   ] .
    ?sample_uri a conversion:DatasetSample;
                void:dataDump ?dump_file .
  }
  optional {
    graph ?sample_uri {
       ?sample_uri a ?loaded_boolean .
       filter(?loaded_boolean = void:Dataset)
    }
  }
} ORDER BY ?source_id ?dataset_id ?version_id ?layer_id
What datasets are from "data-gov"?

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?dataset
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?dataset a conversion:Dataset;
             conversion:source_identifier "data-gov" .
  }
}
What Datasets are at the root of the void:subset hierarchy?

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT *
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?dataset a conversion:Dataset;
             void:subset [ a conversion:VersionedDataset ] .
  }
}
What VoID data subsets are within data-gov's dataset 1008?

prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:       <http://rdfs.org/ns/void#>
 
SELECT DISTINCT ?dataset ?subdataset ?size ?dump  
WHERE {   
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {    
    ?dataset a void:Dataset ; 
             conversion:source_identifier "data-gov"; 
             conversion:dataset_identifier "1008" .
    optional { ?dataset    void:subset            ?subdataset }    
    optional { ?subdataset conversion:num_triples ?size }    
    optional { ?subdataset void:dataDump          ?dump }  
  }  
} ORDER BY ?dataset ?subdataset 
Is data.gov's dataset 8 loaded in the SPARQL endpoint?

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
ASK
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
               ?dataset a void:Dataset;
                        conversion:source_identifier  "data-gov";
                        conversion:dataset_identifier "8" .
    optional { ?dataset void:subset ?subdataset }
 
    optional { ?NOPARENT void:subset ?dataset }
    filter(!bound(?NOPARENT))
  }
  graph ?dataset {
     [] a []
  }
} 
Is the raw sample loaded?

prefix ov:         <http://open.vocab.org/terms/>
 
ask
WHERE {
  graph <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/1st-anniversary/conversion/raw/subset/sample> {
     [] ov:csvRow ?row
  }
}
Is the first enhancement sample loaded?

prefix ov:         <http://open.vocab.org/terms/>
 
ask
WHERE {
  graph <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/1st-anniversary/conversion/e1/subset/sample> {
     [] ov:csvRow ?row
  }
}
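
The two ASK queries above differ only in the named graph they probe. As a rough sketch, a small Python helper can generate the same query for any sample graph URI. The function name `ask_sample_loaded` and its character check are assumptions made for illustration; the query text mirrors the ones above.

```python
# Characters that may not appear inside a SPARQL IRI reference (<...>).
_FORBIDDEN = set('<>"{}|^`\\ ')


def ask_sample_loaded(sample_graph: str) -> str:
    """Return an ASK query that tests whether a sample graph holds any ov:csvRow triple.

    Illustrative helper: it only builds query text and rejects URIs that
    would break out of the <...> IRI syntax.
    """
    if any(ch in _FORBIDDEN for ch in sample_graph):
        raise ValueError("not a usable graph URI: %r" % sample_graph)
    return (
        "prefix ov: <http://open.vocab.org/terms/>\n"
        "\n"
        "ASK\n"
        "WHERE {\n"
        f"  graph <{sample_graph}> {{\n"
        "    [] ov:csvRow ?row\n"
        "  }\n"
        "}\n"
    )


if __name__ == "__main__":
    # Sample graph URI copied from the "Is the raw sample loaded?" query.
    print(ask_sample_loaded(
        "http://logd.tw.rpi.edu/source/data-gov/dataset/1623"
        "/version/1st-anniversary/conversion/raw/subset/sample"
    ))
```

Rejecting unexpected characters up front avoids accidentally emitting a malformed or altered query when the graph URI comes from user input.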
What predicates do the datasets use?

prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?dataset ?predicate
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?dataset conversion:uses_predicate ?predicate
  }
}
Datasets with wgs:lat

prefix wgs:        <http://www.w3.org/2003/01/geo/wgs84_pos#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:       <http://rdfs.org/ns/void#>
 
SELECT DISTINCT ?g 
WHERE { 
  graph ?g { 
    ?s wgs:lat ?lat 
  } 
}
3-level vs. 4-level void:subset hierarchy (cf. single vs. multiple CSVs)

Datasets comprising only one CSV create a 3-level hierarchy, while datasets comprising more than one CSV create a 4-level hierarchy.

Query for all unversioned datasets

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?unversioned
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    {
      # Unversioned datasets WITH single CSV
      ?unversioned void:subset            ?versioned .
      ?versioned   void:subset            ?layer     .
      ?layer       conversion:num_triples ?triples ;
                   void:dataDump          ?dump      .
    }
    UNION
    {
      # Unversioned datasets WITH multiple CSVs
      ?unversioned     void:subset            ?versioned       .
      ?versioned       void:subset            ?layer           .
      ?layer           void:dataDump          ?dump ;
                       void:subset            ?multi_component .
      ?multi_component conversion:num_triples ?triples         .
    }
  }
} ORDER BY ?unversioned 
Same as above, written with nested blank nodes and returning a count

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT (count(DISTINCT ?unversioned) AS ?count)
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    { # Unversioned datasets WITH single CSV
      ?unversioned void:subset [ 
                        void:subset [ 
                             conversion:num_triples ?triples ;
                             void:dataDump          ?dump     
                        ]
                   ]
    }
    UNION
    { # Unversioned datasets WITH multiple CSVs
      ?unversioned void:subset [
                        void:subset [ 
                             void:dataDump ?dump ;
                             void:subset [
                                  conversion:num_triples ?triples 
                             ]
                        ]
                   ]
    }
  }
} 
(See http://data-gov.tw.rpi.edu/wiki/URI_design_for_RDF_conversion_of_CSV-based_data#VoID_descriptions for a diagram illustrating the different VoID hierarchies of single- and multi-CSV datasets.)

A 3-level example with explicit names

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
SELECT ?p ?o
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1008>
        void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1008/version/2010-Jul-21> .
 
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1008/version/2010-Jul-21>
        void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1008/version/2010-Jul-21/conversion/raw> .
 
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1008/version/2010-Jul-21/conversion/raw> ?p ?o .
  }
}
A 4-level example with explicit names

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
SELECT ?p ?o
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1033>
        void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/version/1st-anniversary> .
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/version/1st-anniversary>
        void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/version/1st-anniversary/conversion/raw> .
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/version/1st-anniversary/conversion/raw>
        void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/FM_FACILITY_FILE/version/1st-anniversary/conversion/raw> .
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1033/FM_FACILITY_FILE/version/1st-anniversary/conversion/raw> ?p ?o
  }
}   
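
The explicit URIs in the 3-level and 4-level examples above follow a regular pattern. The Python sketch below reproduces that pattern as inferred from those two examples only; the function names are hypothetical, and the conversion tooling may build URIs differently in cases not shown here.

```python
from typing import List, Optional

# Base address shared by the example URIs on this page.
BASE = "http://logd.tw.rpi.edu"


def dataset_uri(source: str, dataset: str) -> str:
    """URI of the (unversioned) Dataset at the root of the hierarchy."""
    return f"{BASE}/source/{source}/dataset/{dataset}"


def layer_uri(source: str, dataset: str, version: str,
              conversion: str = "raw",
              component: Optional[str] = None) -> str:
    """URI of a Layer Dataset, or of one component table when `component` is given."""
    root = dataset_uri(source, dataset)
    if component is not None:
        # In the 4-level example the component name follows the dataset identifier.
        root = f"{root}/{component}"
    return f"{root}/version/{version}/conversion/{conversion}"


def subset_chain(source: str, dataset: str, version: str,
                 conversion: str = "raw",
                 component: Optional[str] = None) -> List[str]:
    """Return the void:subset chain from root to leaf (3 or 4 URIs)."""
    root = dataset_uri(source, dataset)
    chain = [
        root,
        f"{root}/version/{version}",
        layer_uri(source, dataset, version, conversion),
    ]
    if component is not None:
        chain.append(layer_uri(source, dataset, version, conversion, component))
    return chain


if __name__ == "__main__":
    for uri in subset_chain("data-gov", "1033", "1st-anniversary",
                            component="FM_FACILITY_FILE"):
        print(uri)
```

Calling `subset_chain("data-gov", "1008", "2010-Jul-21")` yields the three URIs of the 3-level example; adding `component="FM_FACILITY_FILE"` for dataset 1033 yields the four URIs of the 4-level example.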
All dump files and their triple counts of an (unversioned) Dataset

prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
# The aggregate is parenthesized and paired with GROUP BY, as SPARQL 1.1 requires.
SELECT ?dataDump (sum(?num_triples) AS ?triples)
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
 
    <http://logd.tw.rpi.edu/source/data-gov/dataset/1008> 
      void:subset [
        a conversion:VersionedDataset;
        void:subset ?layer ] .
 
    {
      ?layer conversion:num_triples ?num_triples;
             void:dataDump          ?dataDump.
    }
    UNION
    {
      ?layer void:dataDump ?dataDump;
             void:subset   ?multiple_table .
 
      ?multiple_table conversion:num_triples ?num_triples .
    }
  }
} GROUP BY ?dataDump
Getting a dump file of a sample subset of a dataset

prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:       <http://rdfs.org/ns/void#>
SELECT DISTINCT ?dataset ?versionedDataset ?layerDataset ?sample ?dump
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?dataset          void:subset ?versionedDataset .
    ?versionedDataset a conversion:VersionedDataset;
                      void:subset ?layerDataset .
    ?layerDataset     a conversion:LayerDataset;
                      void:subset ?sample .
    ?sample           a conversion:DatasetSample;
                      void:dataDump ?dump .
  }
} ORDER BY ?dataset ?versionedDataset ?layerDataset ?sample
Getting a dump file of a sample subset of a dataset (#2)

prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:       <http://rdfs.org/ns/void#>
SELECT DISTINCT ?source_id ?dataset_id ?sample ?dump
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?sample a conversion:DatasetSample;
            conversion:source_identifier   ?source_id;
            conversion:dataset_identifier  ?dataset_id;
            conversion:version_identifier "1st-anniversary";
            void:dataDump ?dump .
  }
} ORDER BY ?sample
Attributes on Datasets with void:dataDumps

prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:          <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT *
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
    { ?dataset a conversion:LayerDataset; void:dataDump ?dump }
    optional { ?dataset conversion:source_identifier  ?source_id }
    optional { ?dataset conversion:dataset_identifier ?dataset_id }
    optional { ?dataset conversion:dataset_version    ?version_id }
  }  
}
Counts of datasets with different sets of attributes

prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:          <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT (count(DISTINCT ?dataset1) AS ?dumps)
       (count(DISTINCT ?dataset2) AS ?to_source)
       (count(DISTINCT ?dataset3) AS ?to_dataset)
       (count(DISTINCT ?dataset4) AS ?to_version)
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
    { ?dataset1 void:dataDump ?dumpfile }
    UNION
    { ?dataset2 void:dataDump ?dumpfile;
                conversion:source_identifier ?source_id }
    UNION
    { ?dataset3 void:dataDump ?dumpfile;
                conversion:source_identifier  ?source_id;
                conversion:dataset_identifier ?dataset_id }
    UNION
    { ?dataset4 void:dataDump ?dumpfile;
                conversion:source_identifier  ?source_id;
                conversion:dataset_identifier ?dataset_id;
                conversion:dataset_version    ?version_id }
  }  
}
Datasets (intentionally) without a version

prefix dcterms:    <http://purl.org/dc/terms/>
prefix void:          <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>
 
SELECT DISTINCT ?dataset3
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
    ?dataset3 void:dataDump ?dumpfile;
              conversion:source_identifier  ?source_id;
              conversion:dataset_identifier ?dataset_id .
    optional { ?dataset3 conversion:dataset_version ?version_id }
    filter(!bound(?version_id))
  }  
} ORDER BY ?dataset3
What types do instances of void:Dataset have?

prefix conversion: <http://purl.org/twc/vocab/conversion/>
prefix void:          <http://rdfs.org/ns/void#>
 
SELECT DISTINCT ?type
WHERE {
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {  
     ?dataset a void:Dataset ; a ?type .
  }
} ORDER BY ?type