Internal Technology

WebSci 2013 Senator Eric Adams Information Cloud

Description: 
A data mash-up that shows how Senator Adams allocates his project funds.


Senator Adams' Fund Allocation — Projects & Funding



Table - Projects & Funding

The following table outlines all of Senator Adams' projects, their purpose and funding:


Project Description:

This is a data mash-up that shows how Senator Adams allocates his project funds. The SPARQL files are linked directly: senator funding, agency funding, and a frequency-based word cloud.
For more information regarding the project itself, see the write-up at the following URL: Team 9 Project Write-up

By: Daniel Gilligan, Rob Levy, Ben Vreeland


Electric Power Generation and Fuel Consumption by Month and State, 2001 to the Present

Description: 
This demo shows the electric power generated by each state, along with the fuel required to produce that power. Lighter-colored states generate less than darker ones. Using the drop-down boxes, you can view the data by year and by fuel type. When you click on a state, two charts appear to the right that show, for every month in that year, the megawatt hours of electric power generated and the fuel used to produce it.
Contributor:

Electric Power from Coal during 2001




Creator: Dominic DiFranzo
Created: 2011/06/27
Datasets:
SPARQL query: energy-sparql.php
SPARQL endpoint: http://logd.tw.rpi.edu/sparql

LOD Data Quality Vocabulary: LODQ

Details

Namespace: http://logd.tw.rpi.edu/lodq#
Imports:
Version: 0.1
Created: 12 April 2011
Modified: 13 April 2011
Authors: olyerickson, alvarograves
Discussion:

Introduction

Insert more extensive intro to subject of linked data quality here. Include mention of Quality Indicators for Linked Data Datasets discussion, originated by Leigh Dodds in Jan 2010...

LODQ is a proposed vocabulary for describing the "quality" of datasets. It is inspired by the 15 metrics proposed by Glenn McDonald in his email to public-lod [1], later augmented with three additional metrics from Dave Reynolds [2].

LODQ is intended to be domain-extensible, enabling each of the quality metrics to be defined based on the requirements of a particular domain. Thus each LODQ metric is represented as a class which should be further defined by the domain. This is illustrated by the following figure:

Our objective is to enable assertions of the form...

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix lodq: <http://logd.tw.rpi.edu/lodq#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix olo: <http://purl.org/ontology/olo/core#> .
@prefix ex: <http://example.org/> .
 
ex:binaryMetric
    lodq:explicitValue [
        olo:slot [
            olo:index 1 ;
            olo:item "yes"
        ], [
            olo:index 2 ;
            olo:item "no"
        ] ;
        a olo:OrderedList
    ] ;
    dcterms:creator <http://alvaro.graves.cl> ;
    a lodq:Metric .
 
ex:myMetric
    lodq:maxValue "10"^^xsd:int ;
    lodq:minValue "1"^^xsd:int ;
    dcterms:creator <http://alvaro.graves.cl> ;
    a lodq:Metric .
 
<http://logd.tw.rpi.edu/source/data-gov/dataset/4383/version/2010-Oct-22/conversion/raw>
    a lodq:Dataset .
 
[]
    lodq:completeness <http://logd.tw.rpi.edu/source/data-gov/dataset/4383/version/2010-Oct-22/conversion/raw> ;
    lodq:usesMetric ex:myMetric ;
    lodq:value "6"^^xsd:int ;
    dcterms:creator <http://tw.rpi.edu/instances/JohnErickson> ;
    a lodq:Measure .
 
[]
    lodq:justifiedBy [
        dcterms:description "The license is http://...." ;
        a lodq:Justification
    ] ;
    lodq:licensed <http://logd.tw.rpi.edu/source/data-gov/dataset/4383/version/2010-Oct-22/conversion/raw> ;
    lodq:usesMetric ex:binaryMetric ;
    lodq:value "yes" ;
    dcterms:creator <http://tw.rpi.edu/instances/JohnErickson> ;
    a lodq:Measure .
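
To make the intent of these assertions more concrete, the following Python sketch models a metric either as a numeric range (lodq:minValue and lodq:maxValue) or as a list of allowed values (the items of lodq:explicitValue), and checks whether a measure's lodq:value is admissible. The names Metric and is_valid_value are hypothetical, and LODQ does not itself define a validation procedure, so this is only one possible reading of the example above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union

Value = Union[int, str]


@dataclass(frozen=True)
class Metric:
    """Hypothetical in-memory view of a lodq:Metric."""
    min_value: Optional[int] = None       # lodq:minValue
    max_value: Optional[int] = None       # lodq:maxValue
    explicit_values: Sequence[str] = ()   # olo:item entries of lodq:explicitValue


def is_valid_value(metric: Metric, value: Value) -> bool:
    """Return True if value is admissible for the metric (assumed semantics)."""
    if metric.explicit_values:
        return value in metric.explicit_values
    if metric.min_value is not None and metric.max_value is not None:
        return isinstance(value, int) and metric.min_value <= value <= metric.max_value
    # The metric declares no constraint this sketch knows how to check.
    return False


binary_metric = Metric(explicit_values=("yes", "no"))  # mirrors ex:binaryMetric
my_metric = Metric(min_value=1, max_value=10)          # mirrors ex:myMetric

print(is_valid_value(my_metric, 6))          # the completeness measure above
print(is_valid_value(binary_metric, "yes"))  # the licensed measure above
print(is_valid_value(my_metric, 11))         # outside the declared range
```

A real implementation would read the metric definitions from the RDF graph rather than hard-coding them as done here.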

Vocabulary Definitions

The following classes are inspired by Glenn's and Dave's descriptions of their metrics for describing data quality:

Classes

Class: lodq:Dataset
Subclass of:
Has Subclasses:
Has Properties:
Description: A URI-named dataset.
Class: lodq:Metric
Subclass of:
Has Subclasses: Defined by domain...
Description: A superclass for domain-specific metric definition. Each metric instantiates this.

Properties

Property: lodq:qualityMeasure
Is of type: owl:DatatypeProperty
Range: rdfs:Literal
Has Subproperties: accuracy, intelligibility, referentialCorrespondence, completeness, boundedness, typing, modelingCorrectness, modelingGranularity, connectedness, isomorphism, currency, modelConsistency, attribution, history, internalConsistency, licensed, sustainable, authoritative
Description: A superproperty to all of the quality metric properties. May be extended by domain-specific metrics not considered here.
Property: lodq:accuracy
Subproperty of: lodq:qualityMeasure
Description: Are the individual nodes that refer to factual information factually and lexically correct? For example, is Chicago spelled "Chigaco", or does the dataset say its population is 2.7?
Property: lodq:intelligibility
Subproperty of: lodq:qualityMeasure
Description: Are there human-readable labels on things, so you can tell what a thing is when you're looking at it? Is there a model, so you can tell what questions you can ask? If a thing has multiple labels (or a set of owl:sameAs things have multiple labels), do you know which one (if any) is canonical?
Property: lodq:referentialCorrespondence
Subproperty of: lodq:qualityMeasure
Description: If a set of data points represents some set of real-world referents, is there one and only one point per referent? If you have 9,780 data points representing cities, but five of them are "Chicago", "Chicago, IL", "Metro Chicago", "Metropolitain Chicago, Illinois" and "Chicagoland", that's bad.
Property: lodq:completeness
Subproperty of: lodq:qualityMeasure
Description: Where you have data representing a clear finite set of referents, do you have them all? All the countries, all the states, all the NHL teams, etc? And if you have things related to these sets, are those projections complete? Populations of every country? Addresses of arenas of all the hockey teams?
Property: lodq:boundedness
Subproperty of: lodq:qualityMeasure
Description: Where you have data representing a clear finite set of referents, is it unpolluted by other things? E.g., can you get a list of current real countries, not mixed with former states or fictional empires or administrative subdivisions?
Property: lodq:typing
Subproperty of: lodq:qualityMeasure
Description: Do you really have properly typed nodes for things, or do you just have literals? The first president of the US was not "George Washington"^^xsd:string, it was a person whose name-renderings include "George Washington". Your ability to ask questions will be constrained or crippled if your data doesn't know the difference.
Property: lodq:modelingCorrectness
Subproperty of: lodq:qualityMeasure
Description: Is the logical structure of the data properly represented? Graphs are relational databases without the crutch of "rows"; if you screw up the modeling, your queries will produce garbage.
Property: lodq:modelingGranularity
Subproperty of: lodq:qualityMeasure
Description: Did you capture enough of the data to actually make use of it? ":us :president :george_washington" isn't exactly wrong, but it's pretty limiting. Model presidencies, with their dates, and you've got much more powerful data.
Property: lodq:connectedness
Subproperty of: lodq:qualityMeasure
Description: If you're bringing together datasets that used to be separate, are the join points represented properly? Is the US from your country list the same as (or owl:sameAs) the US from your list of presidencies and the US from your list of world cities and their populations?
Property: lodq:isomorphism
Subproperty of: lodq:qualityMeasure
Description: If you're bringing together datasets that used to be separate, are their models reconciled? Does an album contain songs, or does it contain tracks which are publications of recordings of songs, or something else? If each data point answers this question differently, even simple-seeming queries may be intractable.
Property: lodq:currency
Subproperty of: lodq:qualityMeasure
Description: Is the data up-to-date?
Property: lodq:modelConsistency
Subproperty of: lodq:qualityMeasure
Description: Whichever way modelling decisions such as the direction of relations (from country to president, or from president to country) are made, they are made consistently, so you don't have to ask many permutations of the same query. Note: this metric was previously named "Directionality".
Property: lodq:attribution
Subproperty of: lodq:qualityMeasure
Description: If your data comes from multiple sources, or in multiple batches, can you tell which came from where?
Property: lodq:history
Subproperty of: lodq:qualityMeasure
Description: If your data has been edited, can you tell how and by whom?
Property: lodq:internalConsistency
Subproperty of: lodq:qualityMeasure
Description: Do the populations of your counties add up to the populations of your states? Do the substitutes going into your soccer matches balance the substitutes going out?
Property: lodq:licensed
Subproperty of: lodq:qualityMeasure
Description: The license under which the data can be used is clearly defined, ideally in a machine-checkable way.
Property: lodq:sustainable
Subproperty of: lodq:qualityMeasure
Description: There is some credible basis for believing the data will be maintained as current (e.g. backed by some appropriate organization or by a sufficiently large group of individuals, has been updated frequently in the past).
Property: lodq:authoritative
Subproperty of: lodq:qualityMeasure
Description: Is the provider of the data a credible authority on the subject? For example, in the UK, Companies House has the definitive information on registered UK companies, and no amount of crowdsourcing can change the fact that if a company is not registered with them, then it is not registered.

References & Resources

  1. Glenn McDonald, 15 Ways to Think About Data Quality (Just for a Start) (8 Apr 2011 21:10:05 -0400)
  2. Dave Reynolds, Re: 15 Ways to Think About Data Quality (Just for a Start) (12 Apr 2011 09:21:36 +0100)
  3. The Pedantic Web Group. Working with data publishers, tool builders, application developers and standards groups to create a more interoperable Web of Data...
  4. Semanticweb.com discussion on Quality Indicators for Linked Data Datasets. Question added Jan 2010; Glenn McDonald's "15 Ways" added to the discussion 12 April 2011.

lod-apps

Description: 
A software package offering RESTful web services, including phpCsv2Rdf, phpJson2Rdf, and phpSparqlExplorer.

csv2rdf4lod

Description: 
In its simplest form, csv2rdf4lod is a quick and easy way to produce an RDF encoding of data available in Comma-Separated-Values (CSV). In its advanced form, csv2rdf4lod is a custom reasoner tailored for some heavy-duty data integration.
See https://github.com/timrdf/csv2rdf4lod-automation/wiki for the source code, documentation, examples, and issue tracking.
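
As a rough illustration of what "an RDF encoding of CSV" can mean in the simplest case, the Python sketch below turns each row into a subject and each column into a predicate with a literal object, emitted as N-Triples. This is not how csv2rdf4lod itself is implemented or configured; the function name csv_to_ntriples, the http://example.org/ base URI, and the row/ and vocab/ URI patterns are all assumptions made for the example.

```python
import csv
import io
from typing import List
from urllib.parse import quote


def csv_to_ntriples(csv_text: str, base: str = "http://example.org/") -> List[str]:
    """Naive encoding: one subject per row, one predicate per column, plain literals."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{base}row/{row_number}>"
        for column, cell in row.items():
            # Skip cells beyond the header and empty or missing cells.
            if column is None or not cell:
                continue
            local_name = quote(column.strip().lower().replace(" ", "_"))
            predicate = f"<{base}vocab/{local_name}>"
            # Escape characters that are special inside an N-Triples literal.
            literal = (cell.replace("\\", "\\\\")
                           .replace('"', '\\"')
                           .replace("\n", "\\n")
                           .replace("\r", "\\r"))
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


sample = "State,Fuel\nNew York,Coal\n"
for line in csv_to_ntriples(sample):
    print(line)
```

Everything here is a plain string literal; typed values, links between rows, and reuse of existing vocabularies are the kind of refinement the "advanced form" described above is about.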

LOGD resources relating to csv2rdf4lod:

  • Please read the following tutorial to start using csv2rdf4lod.

  • The REST based conversion service?

  • The SPARQL endpoint http://logd.tw.rpi.edu/sparql hosts RDF data created by csv2rdf4lod

SparqlProxy

Description: 
A TWC LOGD service that helps users proxy SPARQL endpoint access, cache SPARQL query results, and convert SPARQL query results into many different formats.

Source Code and Resources

SparqlProxy is a TWC LOGD service:

RESTful Service Interface Description

Parameter Status Description
service-uri stable URI of SPARQL service.
query stable SPARQL query string
query-uri stable URI of SPARQL query. Note you can only use one of "query-uri" and "query" because they are mutually exclusive.
output stable (optional) The output format. The default is xml. The supported values are listed below:
  • xml => SPARQL/XML
  • sparqljson => SPARQL/JSON
  • exhibit => EXHIBIT/JSON
  • gvds => GoogleViz/JSON
  • csv => CSV
  • html => HTML
  • tablerow => Table(Row)
  • tablecol => Table(Col)
For backward compatibility, both "sparqljson" and "sparql" can be used to output the SPARQL/JSON format.
callback experimental (optional) Callback function name. This parameter applies only to two output formats: exhibit and sparqljson.
tqx experimental (optional) For the Google Visualization API only, e.g. version:0.6;reqId:1;responseHandler:myQueryHandler
refresh-cache experimental (optional) SparqlProxy uses a cache by default. Users may bypass it by setting "refresh-cache=on" in the service request.
textoutput experimental (optional) Sets "text/plain" as the content type in the HTTP response header, so users can view the result in a browser. To enable it, add "textoutput=yes" to the service request.
ui-option experimental (optional) SparqlProxy can show the query that produced a SPARQL query result. To enable it, add "ui-option=query" to the service request.
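
The constraints in the table can be checked on the client side before a request is sent. The Python sketch below is a hypothetical helper (check_params is not part of SparqlProxy) that encodes three rules stated above: "query" and "query-uri" are mutually exclusive, "output" must be one of the listed values (with "sparql" accepted as a legacy alias), and "callback" applies only to the exhibit and sparqljson outputs. Treating a request with neither "query" nor "query-uri" as a problem is an assumption of this sketch, not something the table states.

```python
from typing import Dict, List

OUTPUT_FORMATS = {"xml", "sparqljson", "exhibit", "gvds", "csv", "html", "tablerow", "tablecol"}
LEGACY_ALIASES = {"sparql": "sparqljson"}  # legacy alias mentioned in the table
CALLBACK_FORMATS = {"exhibit", "sparqljson"}


def check_params(params: Dict[str, str]) -> List[str]:
    """Return a list of problems found in a request's parameters (empty if none)."""
    problems = []
    has_query = "query" in params
    has_query_uri = "query-uri" in params
    if has_query and has_query_uri:
        problems.append('"query" and "query-uri" are mutually exclusive')
    if not has_query and not has_query_uri:
        # Assumption: a request that carries no query at all is a mistake.
        problems.append('one of "query" or "query-uri" is expected')

    output = params.get("output", "xml")           # xml is the documented default
    output = LEGACY_ALIASES.get(output, output)    # normalize the legacy alias
    if output not in OUTPUT_FORMATS:
        problems.append(f"unknown output format: {output}")
    elif "callback" in params and output not in CALLBACK_FORMATS:
        problems.append('"callback" only applies to the exhibit and sparqljson outputs')
    return problems


print(check_params({"query": "SELECT ?s WHERE {?s ?p ?o}", "output": "csv", "callback": "cb"}))
```

The service itself may be more or less strict than this; the helper only mirrors the documented parameter descriptions.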

Example Usage

The following example uses the DBpedia SPARQL endpoint and a SPARQL query that lists 10 triples (also published at http://logd.tw.rpi.edu/query/stat_ten_triples.sparql):
http://dbpedia.org/sparql
http://logd.tw.rpi.edu/query/stat_list_ten_triples.sparql
SELECT ?s ?p ?o WHERE {?s ?p ?o} limit 10
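
The values above can be combined into request URLs with Python's standard library, as sketched below. The sketch assumes the service accepts its parameters as an ordinary GET query string, and it uses http://logd.tw.rpi.edu/ws/sparqlproxy.php, the instance that this page describes as open to other endpoints, as the base URL. The function name proxy_url is hypothetical, and no network request is made.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed base URL: the SparqlProxy instance that accepts other endpoints.
PROXY = "http://logd.tw.rpi.edu/ws/sparqlproxy.php"


def proxy_url(service_uri: str, *, query: Optional[str] = None,
              query_uri: Optional[str] = None, output: str = "xml") -> str:
    """Build a SparqlProxy request URL from exactly one of query / query_uri."""
    if (query is None) == (query_uri is None):
        raise ValueError('pass exactly one of "query" or "query_uri"')
    params = {"service-uri": service_uri, "output": output}
    if query is not None:
        params["query"] = query
    else:
        params["query-uri"] = query_uri
    # urlencode percent-encodes the URIs and the query text.
    return PROXY + "?" + urlencode(params)


by_uri = proxy_url(
    "http://dbpedia.org/sparql",
    query_uri="http://logd.tw.rpi.edu/query/stat_list_ten_triples.sparql",
    output="sparqljson",
)
inline = proxy_url(
    "http://dbpedia.org/sparql",
    query="SELECT ?s ?p ?o WHERE {?s ?p ?o} limit 10",
    output="csv",
)
print(by_uri)
print(inline)
# Decoding the query string gives back the original SPARQL text.
print(parse_qs(urlsplit(inline).query)["query"][0])
```

Passing the query by URI keeps the request short, while the inline form avoids having to publish the query file first.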

Discussion

Q: Why does SparqlProxy use a cache?
A: The cache in the LOGD SPARQL endpoint exists for historical reasons. It is used specifically to reduce the load that repetitive queries place on the LOGD server (e.g. when loading dynamically generated web pages). Cached results expire automatically, or you may use the "refresh-cache=on" option in a RESTful request to bypass the cache.
Q: Why are there two SparqlProxy instances?
A: We maintain two SparqlProxy instances on the LOGD server: http://logd.tw.rpi.edu/sparql is dedicated to the LOGD SPARQL endpoint, and http://logd.tw.rpi.edu/ws/sparqlproxy.php allows users to access other SPARQL endpoints. Users may download the SparqlProxy code and install an instance on their own computer, allowing external users to access their internal triplestores.