
TWC LOGD 101 Tutorial Series

Open Government Data Video Collection

Level: 
LOGD 101
Contributor: 
Description: 
This collection of videos helps you understand the meaning and value of open government data.

"The next Web of open, linked data", by Tim Berners-Lee, Ted 2009

  • This talk is about linked open data. Pay attention to interesting mashups that used linking data.
  • at minute 11, Tim called for "Raw. Data. Now!"

"The year open data went worldwide", by Tim Berners-Lee, Ted 2010

  • This talk is about the deployment of linked open data. Pay attention to the open government section.

"Open, Linked Data for a Global Community", by Tim Berners-Lee, Gov 2.0 Expo 2010

  • This is a linked government data talk. Pay attention to TBL's "five star rating".
  • Here is my <140-character tweet summary:
    1. web downloadable
    2. structured data preserved
    3. use interoperable format
    4. linkable things with URI
    5. linked to other data

  • Also see the 5 stars of open linked data, by Ed Summers
  • Also see Linked Open Data star scheme by example, by Michael Hausenblas

Mashing up LOGD data with SPARQL

Level: 
LOGD 101
Contributor: 
Description: 
This tutorial describes SPARQL queries that mash up LOGD data from different datasets. It contains examples showing how the SPARQL queries are connected with LOGD data at different levels of granularity.
Prerequisites: 

What to Expect

By the end of this tutorial you should understand how to use SPARQL queries to mash up LOGD datasets, and how SPARQL queries are connected to LOGD data and metadata at different levels of granularity.

What You Need to Know

This tutorial assumes familiarity with the following areas:

The Basic Idea

We assume the required LOGD data are loaded into a triple store (an RDF data management system that is like a database but provides a SPARQL query interface). As shown in the following figure, the content of the dump file of a specific version of a LOGD dataset is first loaded into one RDF dataset named by the version's URI (we use "RDF Dataset 1" in the figure). By manually inspecting the dataset's metadata and sample data, users can identify how records from different RDF datasets can be linked by common objects that have the same value. This data-linking knowledge is then used to compose a SPARQL query over the two RDF datasets, as shown at the top of the triple store in the figure. In what follows, we provide real-world example SPARQL queries that mash up LOGD data.
[Figure: logd-design-mashup-sparql.png]
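To make the basic idea concrete, here is a minimal, self-contained sketch of the same pattern in Python using the rdflib library (rdflib is not part of the LOGD toolchain, and the example.org URIs and property names are invented for illustration): two named graphs are loaded into one store, and a single SPARQL query joins them on a shared FIPS-code value.

 # Toy version of the figure: two named graphs in one store, joined by SPARQL.
 from rdflib import Dataset, Literal, Namespace, URIRef

 EX = Namespace("http://example.org/")
 ds = Dataset()
 g1 = ds.graph(URIRef("http://example.org/rdf-dataset-1"))
 g2 = ds.graph(URIRef("http://example.org/rdf-dataset-2"))

 # RDF Dataset 1: a record keyed by a FIPS code, carrying a population figure.
 g1.add((EX.rec1, EX.fips, Literal("01")))
 g1.add((EX.rec1, EX.population, Literal("4599030")))
 # RDF Dataset 2: a different record keyed by the same FIPS code.
 g2.add((EX.rec2, EX.fips, Literal("01")))
 g2.add((EX.rec2, EX.agi, Literal("75020497")))

 # One SPARQL query spans both named graphs, joining on the shared code.
 q = """
 PREFIX ex: <http://example.org/>
 SELECT ?population ?agi WHERE {
   GRAPH <http://example.org/rdf-dataset-1> { ?r1 ex:fips ?code ; ex:population ?population . }
   GRAPH <http://example.org/rdf-dataset-2> { ?r2 ex:fips ?code ; ex:agi ?agi . }
 }
 """
 for row in ds.query(q):
     print(row.population, row.agi)   # -> 4599030 75020497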

Section 1. Mashing up LOGD data using String Matching

Datasets and Samples

In this section, we will be using datasets 1356 and 353, obtained from Data.gov. The specific RDF conversions we will be looking at are the 2009-Dec-03 version of Dataset 1356 and the 1st-anniversary version of Dataset 353. In producing these RDF versions of the datasets, one of the primary challenges involved establishing mappings between property values: for instance, establishing that the entity "Alabama" referenced in dataset 1356 is equivalent to the "Alabama" from dataset 353. In our RDF conversion, US states in both datasets are referenced by a common set of string-based FIPS codes. Consider the example below, comparing RDF from both datasets:
Dataset Example
Dataset 1356 RDF (Turtle)
<http://logd.tw.rpi.edu/source/data-gov/dataset/1356/version/2009-Dec-03/thing_2>
  <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/raw/county_code> "000" ;
  <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/raw/state_code> "01" ;
  <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/raw/agi> "75020497";
  <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/raw/state_abbrv> "AL" .
Properties Used:
  • "state_code": state's FIPS code
  • "county_code": county's code, "000" means the record is about the entire state, not a specific county of the state
  • "agi": state's adjusted gross income (AGI)
  • "state_abbrv": state name's abbreviation
Dataset 353 RDF (Turtle)
 <http://logd.tw.rpi.edu/source/data-gov/dataset/353/version/1st-anniversary/thing_3>
   <http://logd.tw.rpi.edu/source/data-gov/dataset/353/vocab/raw/pub_fips> "01" ;
   <http://logd.tw.rpi.edu/source/data-gov/dataset/353/vocab/raw/popu_st> "4599030";
   <http://logd.tw.rpi.edu/source/data-gov/dataset/353/vocab/raw/stlaname> "ALABAMA PUBLIC LIBRARY SERVICE" .
Properties Used:
  • "pub_fips" state's FIPS code
  • "popu_st" state's population
  • "stlaname" state library agency's name
Here, the properties "state_code" (Dataset 1356) and "pub_fips" (Dataset 353) both point to a standard FIPS code representing "Alabama".

SPARQL Query

The two RDF dataset conversions mentioned earlier are presently loaded into the TWC LOGD triple store (http://logd.tw.rpi.edu/sparql). Information from both datasets can be retrieved by the following SPARQL query:
SELECT distinct ?state_abbv ?agi ?population 
WHERE {
 GRAPH  <http://logd.tw.rpi.edu/source/data-gov/dataset/353/version/1st-anniversary>{
  
   ?s1  <http://logd.tw.rpi.edu/source/data-gov/dataset/353/vocab/raw/popu_st> ?population.
   ?s1  <http://logd.tw.rpi.edu/source/data-gov/dataset/353/vocab/raw/pub_fips> ?state_fipscode .   
 }
 GRAPH <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/version/2009-Dec-03> {
   ?s2 <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/raw/state_abbrv> ?state_abbv .
   ?s2 <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/raw/county_code> "000" .
   ?s2 <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/raw/agi> ?agi.
   ?s2 <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/raw/state_code> ?state_fipscode .
 }
} 
order by ?state_fipscode
Here, the line  ?s2 <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/raw/county_code> "000" . is used to constrain results to state-level information only (as opposed to both state and county level). Sending this query to the TWC LOGD triple store returns results of the form:
state_abbv agi population
"AL" "92162773" "4599030"

Important Points

  • Establishing property value mappings between datasets is a challenging problem, which makes it an important area for future research. Standardized representations of entities, such as FIPS codes for US States, can be used for this purpose.
  • Selective use of RDF properties in SPARQL queries can be used to narrow the results obtained. For instance, dataset 1356 has many rows related to a state, but the constraint "county_code=000" helps us focus on state-level data, as opposed to county-level data.

Section 2. Mashing up LOGD data using Enhanced LOGD data

Datasets

In this section, we will be using datasets 1356 and 1623, obtained from Data.gov. The specific RDF conversions we will be looking at are the 2009-Dec-03 version of Dataset 1356 and the 2010-Sept-17 version of Dataset 1623. Earlier, we mentioned that establishing property value mappings between datasets can be challenging, and proposed the use of standardized string-based representations for handling this (e.g. using FIPS codes for representing US states). However, doing this doesn't produce explicit semantic links between datasets. In addition, it forces datasets to adopt a common string-based representation for a given property. In practice, this is often not feasible: while one dataset may refer to US states by their full name, others may do so by 2-letter postal codes. Therefore, we attempt to provide RDF-based definitions for property values where possible. In the case of representing US states, two state representations are defined: one corresponding to a postal code, and another to the full name.
Property Example
Postal Code RDF (Turtle) for Dataset 1356 entry:
<http://logd.tw.rpi.edu/source/data-gov/dataset/1356/version/2009-Dec-03/thing_2>
  <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/raw/state_code> "01" ;
  <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/raw/agi> "75020497";
  <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/enhancement/1/state_abbrv> 
     <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/typed/state_abbreviation/AL> ;
  <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/raw/state_abbrv> "AL" .
RDF (Turtle) for state abbreviation entry:
<http://logd.tw.rpi.edu/source/data-gov/dataset/1356/typed/state_abbreviation/AL>
     owl:sameAs   <http://dbpedia.org/resource/Alabama>.
Properties Used:
  • "state_code": state's FIPS code
  • "agi": state's adjusted gross income (AGI)
  • "state_abbrv": state name's abbreviation
Full Name RDF (Turtle) for Dataset 1623 entry:
<http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17/thing_31>
  <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/vocab/raw/state> "Alabama" ;
  <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/vocab/enhancement/1/state> <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/value-of/state/Alabama> ;
  <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/vocab/enhancement/1/fiscal_year_07> 647 .
RDF (Turtle) for state entry:
<http://logd.tw.rpi.edu/source/data-gov/dataset/1623/value-of/state/Alabama>
     owl:sameAs   <http://dbpedia.org/resource/Alabama>.
Properties Used:
  • "state": state's name
  • "fiscal_year_07": medicare claims in fiscal year 2007
In both examples, the RDF-based representation for "Alabama" is asserted to be equivalent to the representation on DBpedia.

SPARQL Query

Below is a SPARQL query for retrieving data from both conversions. Among other things, this query returns URIs (?altStateURI) equivalent to the state representation used in each entry (?state_abbrv):
SELECT distinct ?state_abbrv ?agi ?claims ?altStateURI
WHERE {
 GRAPH  <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17>{  
  ?s1  <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/vocab/enhancement/1/fiscal_year_07> ?claims.
  ?s1  <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/vocab/enhancement/1/state> ?state1623.   
  ?state1623  owl:sameAs ?altStateURI .
 }
 GRAPH <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/version/2009-Dec-03> {
  ?s2 <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/raw/state_abbrv> ?state_abbrv .
  ?s2 <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/raw/county_code> "000" .
  ?s2 <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/raw/agi> ?agi.
  ?s2 <http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/enhancement/1/state_abbrv> ?state1356.
  ?state1356 owl:sameAs ?altStateURI .
 }
} 
order by ?state_abbrv
Below are sample results:
state_abbv agi claims altStateURI
"AK" "17312636" "30" ^^<http://www.w3.org/2001/XMLSchema#integer> <http://dbpedia.org/resource/Alaska>
"AL" "92162773" "647" ^^<http://www.w3.org/2001/XMLSchema#integer> <http://dbpedia.org/resource/Alabama>

Important Points

  • The two datasets cannot be connected by simple string matching, since variations on string-based property values may be used (e.g. one dataset may refer to US states by their full name, while another may use postal codes). Therefore, our RDF conversion promoted the use of RDF resource-based property values, linkable to other sections of the Linked Data Cloud (such as DBpedia). This data enhancement provided by our RDF conversion can help developers by saving them the hassle of writing ad hoc code for linking datasets in applications.
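The contrast with Section 1 can be sketched in a few lines of rdflib-based Python (again with invented example.org URIs, not the real LOGD data): each graph uses its own local resource for Alabama, and the join happens through owl:sameAs links to the shared DBpedia URI rather than through string matching.

 # Join two named graphs through owl:sameAs links instead of string values.
 from rdflib import Dataset, Namespace, URIRef
 from rdflib.namespace import OWL

 EX = Namespace("http://example.org/")
 DBP = Namespace("http://dbpedia.org/resource/")

 ds = Dataset()
 g1 = ds.graph(URIRef("http://example.org/graph-1356"))
 g2 = ds.graph(URIRef("http://example.org/graph-1623"))

 # Each dataset names Alabama with its own local resource...
 g1.add((EX.rec1, EX.state, EX.AL))
 g1.add((EX.AL, OWL.sameAs, DBP.Alabama))
 g2.add((EX.rec2, EX.state, EX.Alabama))
 g2.add((EX.Alabama, OWL.sameAs, DBP.Alabama))

 # ...so the join key is the shared DBpedia URI, not a literal string.
 q = """
 PREFIX ex: <http://example.org/>
 PREFIX owl: <http://www.w3.org/2002/07/owl#>
 SELECT ?r1 ?r2 ?common WHERE {
   GRAPH <http://example.org/graph-1356> { ?r1 ex:state ?s1 . ?s1 owl:sameAs ?common . }
   GRAPH <http://example.org/graph-1623> { ?r2 ex:state ?s2 . ?s2 owl:sameAs ?common . }
 }
 """
 for row in ds.query(q):
     print(row.r1, row.r2, row.common)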

Understanding LOGD Metadata

Level: 
LOGD 101
Contributor: 
Description: 
This tutorial describes the organization of RDF-based LOGD data published on this site and the corresponding metadata.
Prerequisites: 

What to Expect

By the end of this tutorial you should understand the structure of LOGD data: from which source a dataset was obtained, from which OGD dataset the RDF-based LOGD dataset was converted, the human-contributed registry metadata for LOGD datasets, and so on. This tutorial complements the tutorial Understanding LOGD Data.

What You Need to Know

This tutorial assumes you are familiar with concepts found in the following resources:
  • Resource Description Framework (RDF) is a standard model for data interchange on the Web. See [1]
  • Terse RDF Triple Language (Turtle) is a syntax language for serializing RDF. We use it throughout our tutorials to encode RDF data. See [2]

LOGD Data Organization Diagram

The LOGD data organization diagram (see figure below) provides an abstract model of the organization of LOGD data. The "Levels of structural data granularity" dimension explains the different levels of granularity of LOGD data.
  • "source" refers to a data publisher that maintains a catalog of OGD datasets for download. An example source is Data.gov (http://data.gov).
  • "dataset" refers to an OGD dataset. A dataset is typically determined by the data publisher; for example, "Dataset 1623 (OMH Claims Listed by State)" is a dataset entry in the Data.gov catalog (see http://www.data.gov/details/1623).
  • "table" refers to a data table (organized in a tabular structure) in an OGD dataset. Although an OGD dataset often contains one table, it may also contain multiple tables. In "Dataset 1623", there is only one table. Note that the data in an OGD dataset may also be stored in a non-tabular structure, e.g. an XML tree; such structures are out of scope for this tutorial.
  • "record" refers to a data row in a data table.
The "Data Publishing Stages" dimension explains the data publishing process of LOGD data.
  • at "dataset" stage, raw OGD data are available for download at certain Web locations. Note that raw OGD data are subject to change by the data publishers: users may download different versions of dataset from the same URL.
  • at "version" stage, snapshots of raw OGD data are created and versioned. This stage archives the content of the OGD data at a certain time point and provides persistent access to the capture version of the raw OGD data. Note that a dataset may contain multiple parts (e.g. data tables) each of which is stored in a static file.
  • at "conversion layer" stage, conversion configurations are used to convert the raw OGD data into the corresponding LOGD data. The basic conversion configuration is "raw", which is automatically generated with the minimal manual input. A number of manually crafted enhancement configurations are also allowed to generate monotonically incremental LOGD data.
[Figure: logd-design-metadata-dataset.png]
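The URI naming rules spelled out in the sections below compose mechanically from a few identifiers. As a quick illustration, here is a small Python sketch (the helper functions are ours, not part of the converter) that rebuilds the URIs used throughout this tutorial:

 # Compose LOGD URIs for the publishing stages, per the rules below.
 BASE = "http://logd.tw.rpi.edu"

 def source_uri(source):
     return "%s/source/%s" % (BASE, source)

 def dataset_uri(source, dataset):
     return "%s/dataset/%s" % (source_uri(source), dataset)

 def version_uri(source, dataset, version):
     return "%s/version/%s" % (dataset_uri(source, dataset), version)

 def conversion_uri(source, dataset, version, layer):
     return "%s/conversion/%s" % (version_uri(source, dataset, version), layer)

 print(conversion_uri("data-gov", "1623", "2010-Sept-17", "raw"))
 # -> http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17/conversion/raw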

LOGD Metadata by Example

All LOGD metadata are embedded in the corresponding LOGD data dumps. In what follows, we use "Dataset 1623" to explain the actual metadata used to describe and relate the concepts introduced in the previous section. The examples below are extracted from the data dump of Data.gov's Dataset 1623 (version 2010-Sept-17).
 Step 1. download http://logd.tw.rpi.edu/source/data-gov/file/1623/version/2010-Sept-17/conversion/data-gov-1623-2010-Sept-17
  * rename the saved file from "data-gov-1623-2010-Sept-17" to "data-gov-1623-2010-Sept-17.tar.gz"
  * run the Linux shell command "tar -zxf data-gov-1623-2010-Sept-17.tar.gz" to extract the file
Note that the dump file concatenates the LOGD data from all "conversion layers" of the "version". In this file, readers will see lines (lines 1, 1388 and 5563) starting with "# BEGIN", each of which should be read as a syntactic separator for the LOGD data produced by one "conversion layer". Such concatenation is possible because the "conversion layers" are monotonic increments. Please also note that the concatenation may retain redundant metadata.
  # BEGIN: publish/data-gov-1623-2010-Sept-17.e1.ttl:
  ...
  # BEGIN: publish/data-gov-1623-2010-Sept-17.e2.ttl:
  ...
  # BEGIN: publish/data-gov-1623-2010-Sept-17.raw.ttl:
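Because the layers are delimited only by these separator lines, splitting the dump back into its per-layer Turtle parts is straightforward. Here is a short Python sketch (the function name is ours, for illustration):

 # Split the concatenated dump into per-layer chunks keyed by "# BEGIN:" labels.
 def split_layers(path):
     layers, current, name = {}, [], None
     with open(path) as f:
         for line in f:
             if line.startswith("# BEGIN:"):
                 if name is not None:
                     layers[name] = "".join(current)
                 name, current = line[len("# BEGIN:"):].strip(), []
             elif name is not None:
                 current.append(line)
     if name is not None:
         layers[name] = "".join(current)
     return layers

 layers = split_layers("data-gov-1623-2010-Sept-17")
 print(list(layers))   # the .e1, .e2 and .raw parts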
 

Source Metadata

The metadata for a source (lines 1104-1106) shows the URI, webpage and identifier of the data source.
  • the URI of the source uniquely identifies it and is created via the following rule
<source_uri> ::= <base_uri>/source/<source_identifier>
  • the source is related to its HTML web page via "foaf:isPrimaryTopicOf"
<http://logd.tw.rpi.edu/source/data-gov> a foaf:Agent ;
	dcterms:identifier "data-gov" ;
	foaf:isPrimaryTopicOf <http://logd.tw.rpi.edu/source_page/data-gov> .

Dataset Metadata

The metadata for a dataset (lines 1048-1056) includes both description and relations to other entities.
  • its URI uniquely identifies itself and is created based on the following rule
<dataset_uri> ::= <base_uri>/source/<source_identifier>/dataset/<dataset_identifier>
  • it is related to its source using "dcterms:source" property
  • it is related to its subsets using "void:subset". Note the subsets include all versions of the dataset and the metadata of the dataset.
  • its web page is annotated using "foaf:isPrimaryTopicOf"
  • the modification date of a version of the dataset is annotated using "dcterms:modified". A dataset with multiple versions should have multiple values for "dcterms:modified".
 
<http://logd.tw.rpi.edu/source/data-gov/dataset/1623> a data-gov_vocab:Dataset , conversion:Dataset , void:Dataset , conversion:UnVersionedDataset ;
	conversion:base_uri "http://logd.tw.rpi.edu" ;
	conversion:source_identifier "data-gov" ;
	conversion:dataset_identifier "1623" ;
	dcterms:source <http://logd.tw.rpi.edu/source/data-gov> ;
	dcterms:identifier "data-gov 1623" ;
	foaf:isPrimaryTopicOf <http://logd.tw.rpi.edu/source/data-gov/dataset_page/1623> ;
	void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17> , <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/subset/meta> ;
	dcterms:modified "2010-09-09T12:32:49.632-00:05"^^xsd:dateTime .
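Since the metadata is ordinary RDF, it can also be queried back out of the dump programmatically. Below is a sketch using Python's rdflib, assuming the extracted Turtle part containing the dataset metadata is named as shown (adjust to whatever the tar command actually produced):

 # Read dataset-level metadata back out of an extracted Turtle file.
 from rdflib import Graph, Namespace, URIRef
 from rdflib.namespace import DCTERMS

 VOID = Namespace("http://rdfs.org/ns/void#")

 g = Graph()
 g.parse("publish/data-gov-1623-2010-Sept-17.e1.ttl", format="turtle")  # assumed filename

 dataset = URIRef("http://logd.tw.rpi.edu/source/data-gov/dataset/1623")
 print(g.value(dataset, DCTERMS.identifier))      # "data-gov 1623"
 for subset in g.objects(dataset, VOID.subset):   # the version and the meta dataset
     print(subset)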
Besides the machine-produced metadata, additional informational metadata about the dataset may be obtained from the data catalog portal. In our example, Data.gov also publishes manually contributed registry metadata (Dataset 92, the Data.gov catalog), which includes metadata for Dataset 1623 (also see the human-readable page of registry metadata). Following are some sample data extracted from Dataset 92 (version 2010-Sep-20). This metadata can be easily integrated with the above metadata as they describe the same dataset URI.
<http://logd.tw.rpi.edu/source/data-gov/dataset/1623> dcterms:isReferencedBy <http://logd.tw.rpi.edu/source/data-gov/dataset/92/version/2010-Sep-20> ;
	e1:id "1623" ;
	dcterms:identifier "1623" ;
	e1:url <http://www.data.gov/details/1623> ;
	foaf:homepage <http://www.data.gov/details/1623> ;
	e1:title "OMH Claims Listed by State" ;
	dcterms:title "OMH Claims Listed by State" ;
	e1:agency "Department of Health and Human Services" ;
       ...

Version Metadata

The metadata for a version (lines 1058-1069) includes both description and relations to other entities.
  • its URI uniquely identifies itself and is created based on the following rule
<version_uri> ::= <base_uri>/source/<source_identifier>/dataset/<dataset_identifier>/version/<version_identifier>
  • it is uniquely typed using "conversion:VersionedDataset"
  • it is related to its source using "dcterms:source" property
  • it is related to its HTML web page using "foaf:isPrimaryTopicOf"
  • it is related to its subsets using "void:subset". Note the subsets include the conversion layers of the version. As the example data dump is a concatenation of the LOGD data of each conversion layer, there is only one value of "void:subset" on line 1069. However, additional subsets are mentioned in other places (e.g. line 6382).
  • its modification date is annotated using "dcterms:modified"
  • its file dump is annotated using "void:dataDump".
<http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17> a data-gov_vocab:Dataset , conversion:Dataset , conversion:VersionedDataset , void:Dataset ;
	conversion:base_uri "http://logd.tw.rpi.edu" ;
	conversion:source_identifier "data-gov" ;
	conversion:dataset_identifier "1623" ;
	conversion:version_identifier "2010-Sept-17" ;
	conversion:dataset_version "2010-Sept-17" ;
	dcterms:source <http://logd.tw.rpi.edu/source/data-gov> ;
	dcterms:identifier "data-gov 1623 2010-Sept-17" ;
	dcterms:modified "2010-09-09T12:32:49.632-00:05"^^xsd:dateTime ;
	foaf:isPrimaryTopicOf <http://logd.tw.rpi.edu/source/data-gov/dataset_page/1623/version/2010-Sept-17> ;
	void:dataDump <http://logd.tw.rpi.edu/source/data-gov/file/1623/version/2010-Sept-17/conversion/data-gov-1623-2010-Sept-17> ;
	void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17/conversion/enhancement/1> .
(also see lines 6371 and 6382)
<http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17> a data-gov_vocab:Dataset , conversion:Dataset , conversion:VersionedDataset , void:Dataset ;
	void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17/conversion/raw> .

Conversion Layer Metadata

The metadata for a conversion layer of a version of a dataset (lines 6384-6398) includes both description and relations to other entities.
  • its URI uniquely identifies itself and is created based on the following rule
<conversion_uri> ::= <base_uri>/source/<source_identifier>/dataset/<dataset_identifier>/version/<version_identifier>/conversion/<conversion_identifier>
  • the conversion identifier is assigned according to the following rules:
"raw" for the raw conversion layer, i.e. the basic conversion with minimal human input
"enhancement/1", "enhancement/2", ... for the enhancement conversion layers
  • it is uniquely typed by "conversion:LayerDataset"
  • it is related to its source using "dcterms:source" property
  • it is related to its subsets using "void:subset". Note the subsets include its sample dataset
  • it is related to its used predicates using "conversion:uses_predicate"
  • it is related to several URIs of its sample records using "void:exampleResource"
  • its creation date is annotated using "dcterms:created"
  • its modification date is annotated using "dcterms:modified", indicating the last execution of the conversion.
  • its size (number of triples) is annotated using "conversion:num_triples"
  • its homepage (html web page) is annotated using "foaf:isPrimaryTopicOf"
  • its file dump is annotated via "void:dataDump".
<http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17/conversion/raw> a data-gov_vocab:Dataset , conversion:Dataset , conversion:LayerDataset , void:Dataset ;
	dcterms:modified "2010-09-09T12:20:34.769-00:05"^^xsd:dateTime ;
	conversion:base_uri "http://logd.tw.rpi.edu" ;
	conversion:source_identifier "data-gov" ;
	conversion:dataset_identifier "1623" ;
	conversion:version_identifier "2010-Sept-17" ;
	conversion:dataset_version "2010-Sept-17" ;
	conversion:conversion_identifier "raw" ;
	void:dataDump <http://logd.tw.rpi.edu/source/data-gov/file/1623/version/2010-Sept-17/conversion/data-gov-1623-2010-Sept-17.raw> ;
	void:subset <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17/conversion/raw/subset/sample> ;
	dcterms:created "2010-09-09T12:20:34.520-00:05"^^xsd:dateTime ;
	dcterms:source <http://logd.tw.rpi.edu/source/data-gov> ;
	dcterms:identifier "data-gov 1623 2010-Sept-17 raw" ;
	conversion:uses_predicate raw:column_8 , raw:total , raw:fiscal_year_09 , raw:fiscal_year_08 , raw:fiscal_year_07 , raw:fiscal_year_06 , raw:state , raw:region ;
	void:exampleResource ds1623:thing_8 , ds1623:thing_68 .
Also see additional statistics (line 6469):
  <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17/conversion/raw> conversion:num_triples "787"^^xsd:integer .

It is notable that the properties used by the LOGD data produced by the enhancement conversion layers can be linked to the properties used by the LOGD data produced by the raw conversion layer. The following example (lines 33-38) shows the connection (see the last line) via "conversion:enhances".
e1:region ov:csvCol "1"^^xsd:integer ;
	ov:csvHeader "Region" ;
	conversion:enhancement_layer "1" ;
	rdfs:label "Region" ;
	rdfs:range rdfs:Resource ;
	conversion:enhances raw:region .

Metadata for Special LOGD Datasets

The LOGD conversion process also produces two small subset datasets for convenience.

Meta Dataset

A "meta dataset" is a subset data of a dataset, and its includes all metadata of the dataset generated by the converter. The example below (lines 1088-1093) show the metadata about the "meta dataset".
  • its URI uniquely identifies itself and is created based on the following rule
<datasetsample_uri> ::= <base_uri>/source/<source_identifier>/dataset/<dataset_identifier>/subset/meta
  • it is uniquely typed using "conversion:MetaDataset"
  • its file dump is annotated using "void:dataDump".
<http://logd.tw.rpi.edu/source/data-gov/dataset/1623/subset/meta> a data-gov_vocab:Dataset , conversion:Dataset , conversion:MetaDataset , void:Dataset ;
	conversion:base_uri "http://logd.tw.rpi.edu" ;
	conversion:source_identifier "data-gov" ;
	conversion:dataset_identifier "1623" ;
	void:vocabulary <http://rdfs.org/ns/void#> , <http://inference-web.org/2.0/pml-provenance.owl#> , <http://inference-web.org/2.0/pml-justification.owl#> , <http://xmlns.com/foaf/0.1/> , <http://purl.org/dc/terms/> , <http://purl.org/NET/scovo#> ;
	void:dataDump <http://logd.tw.rpi.edu/source/data-gov/file/1623/version/2010-Sept-17/conversion/data-gov-1623-2010-Sept-17.void> .

Dataset Sample

A "dataset sample" is a subset dataset of the LOGD data produced by a conversion layer, and it includes 100 random records of the table of the dataset. The example below (lines 6407-6414) show the metadata about the "Sample dataset".
  • its URI uniquely identifies itself and is created based on the following rule
<datasetsample_uri> ::= <base_uri>/source/<source_identifier>/dataset/<dataset_identifier>/version/<version_identifier>/conversion/<conversion_identifier>/subset/sample
  • it is uniquely typed using "conversion:DatasetSample"
  • its file dump is annotated using "void:dataDump".
<http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17/conversion/raw/subset/sample> a data-gov_vocab:Dataset , conversion:Dataset , conversion:DatasetSample , void:Dataset ;
	conversion:base_uri "http://logd.tw.rpi.edu" ;
	conversion:source_identifier "data-gov" ;
	conversion:dataset_identifier "1623" ;
	conversion:version_identifier "2010-Sept-17" ;
	conversion:dataset_version "2010-Sept-17" ;
	conversion:conversion_identifier "raw" ;
	void:dataDump <http://logd.tw.rpi.edu/source/data-gov/file/1623/version/2010-Sept-17/conversion/data-gov-1623-2010-Sept-17.raw.sample> .

Where to Go Next

With this understanding of LOGD data, we suggest the following links for further reading and investigation.

Building LOGD Visualizations

Level: 
LOGD 101
Contributor: 
Description: 
This tutorial presents a simple example that visualizes the LOGD data mashups.
Prerequisites: 

What to Expect

By the end of this tutorial you should be able to generate a JavaScript-based visualization through use of the Google Visualization API.

What You Need to Know

This tutorial assumes you are familiar with concepts found in the following resources:
  • JavaScript is a Web programming language. See [1]

The LOGD Mashup

The code described in this tutorial uses Dataset 353: State Library Agency Survey: Fiscal Year 2006 and Dataset 1356: Tax Year 2007 County Income Data from Data.gov. The expected output is a map of "Adjusted Gross Income (AGI) per Capita": a US map where each state is colored according to the average AGI per person living in that state. We obtain a state's AGI data from Dataset 1356 and a state's population data from Dataset 353. We also assume that states' populations remain the same from fiscal year 2006 to fiscal year 2007. The LOGD data mashup is enabled by the following SPARQL query:
 http://logd.tw.rpi.edu/demo/building-logd-visualizations/mashup-353-population-1356-agi.sparql

Static Visualization

Defining the HTML Layout

To make a visualization web-accessible, an accompanying HTML layout is necessary. HTML layouts can be managed through use of division (div) elements, which define different sections of a page. The HTML layout for our LOGD demo is given below, with a div element for representing the demo description, and another for defining where the visualization will be placed (with id='map_canvas').
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  	<title>AGI per Capita Map</title>
  </head>
  <body>
      <div>AGI per Capita Map: average adjusted gross income per person in dollar amount in US states.</div>
      <div id='map_canvas'>Loading Map ...</div>
  </body>
</html>
 

Inserting Visualization Code

Following the creation of an HTML layout, JavaScript-based visualization code may be inserted in the HEAD section. This code has the following objectives in our LOGD demo:
  1. Load the appropriate Google Visualization API packages (in this case, the GeoMap package).
  2. Define a callback function for loading visualization code, which is called upon the loading of the HTML page.
  3. Obtain data from a given source to pass to our GeoMap instance. The Google Visualization API is designed to accept data in the form of specially-formatted JSON (represented by the URI http://logd.tw.rpi.edu/demo/building-logd-visualizations/mashup-353-population-1356-agi.js) , which can then be fed to a JSON processing function.
  4. Following a call to the JSON processor, verify that it successfully processed the passed file.
  5. Get back a response from the query processor, containing the data from the JSON file.
  6. Define a data table to store the response data in. This process starts by defining header entries of the form TABLE.addColumn(DATATYPE, NAME).
  7. For each entry in the response, create a new data table row for the corresponding data.
  8. Define a configuration for the GeoMap instance to be visualized, containing information such as resolution.
  9. Define the GeoMap instance in the HTML div with id='map_canvas', using the configuration from Step 8 and data table from Step 7.
    <!--   import Google visualization  API -->
    <script type="text/javascript" src="http://www.google.com/jsapi"></script>

    <!--   customize function -->
    <script type="text/javascript">
    /* <![CDATA[ */
      

      // load google visualization packages - STEP 1
      google.load('visualization', '1', {'packages': ['geomap']});
        
      // set callback function for drawing visualizations - STEP 2
      google.setOnLoadCallback(drawMap);
   
      function drawMap() {
      	//load static data - STEP 3
      	var queryurl = "http://logd.tw.rpi.edu/demo/building-logd-visualizations/mashup-353-population-1356-agi.js";
      	var query = new google.visualization.Query(queryurl); // Send the query.
      	query.send(handleQueryResponse);
      }
  
      function  handleQueryResponse(response){
      	// Check for query response errors. - STEP 4
      	if (response.isError()) {
           alert('Error in query: ' + response.getMessage() + ' ' + response.getDetailedMessage());
           return;
      	}

      	// read data  - STEP 5
      	var data = response.getDataTable();
      	
      	// create new data - STEP 6
      	var newdata = new google.visualization.DataTable();
      	newdata.addColumn('string', 'State');
      	newdata.addColumn('number', 'AGI per Capita');

        // populate each row - STEP 7
      	var rows = data.getNumberOfRows();
      	for (var i = 0; i < rows; i++ )
      	{
      	  var state = 'US-' + data.getValue(i, 0);
      	  var value =  Math.round(data.getValue(i, 1)*1000/data.getValue(i, 2)); // AGI figure uses thousand-dollar unit
      	  newdata.addRow([state, value]);
      	}
      	
      	// configure map options - STEP 8
      	var options = {};
      	options['region'] = 'US';	// show US map
      	options['dataMode'] = 'regions';
      	options['width'] = 900;
      	options['height'] = 550;

        // define geomap instance - STEP 9
        var viz = document.getElementById('map_canvas');
        new google.visualization.GeoMap(viz).draw(newdata, options );    
      }
    /* ]]> */
    </script>
Once this code is inserted in the HEAD section, the visualization will appear: http://logd.tw.rpi.edu/demo/building-logd-visualizations/agi-per-capita-v2.html . NOTE: View in Firefox!

Build a SPARQL-based Dynamic Visualization

Our SparqlProxy service (http://logd.tw.rpi.edu/ws/sparqlproxy.php) is capable of formatting results as JSON compatible with the Google Visualization API. In the above code, the section:
  var queryurl = "http://logd.tw.rpi.edu/demo/building-logd-visualizations/mashup-353-population-1356-agi.js";
corresponding to Step 3 above can be replaced with:
  //load data using SPARQL query
  var sparqlproxy = "http://logd.tw.rpi.edu/ws/sparqlproxy.php";
  var queryloc = "http://logd.tw.rpi.edu/demo/building-logd-visualizations/mashup-353-population-1356-agi.sparql";    
  var service = "http://logd.tw.rpi.edu/sparql";
  var queryurl = sparqlproxy 
                + "?" + "output=gvds"
                + "&service-uri=" + encodeURIComponent(service)
                + "&query-uri=" + encodeURIComponent(queryloc) ;
The above code passes a SPARQL query to our endpoint, returning the same information that is contained in the static JSON file. The modified code is located at: http://logd.tw.rpi.edu/demo/building-logd-visualizations/agi-per-capita-v3.html NOTE: View in Firefox!

Extending this demo

The above code may be used as a starting point for generating your own GeoMap-based visualization. Doing this will require the following steps:
  1. Specifying a different SPARQL query (queryloc) in Step 3
  2. Modifying the column definitions in Step 6, to correspond to the new SPARQL query.
  3. Modifying the response processing code in Step 7.

Retrieving SPARQL Results

Level: 
LOGD 101
Contributor: 
Description: 
This tutorial describes how to access a SPARQL endpoint, and how to retrieve and format query results using SparqlProxy in JavaScript, PHP and Python.
Prerequisites: 

What to Expect

By the end of this tutorial you should know how to retrieve SPARQL query results in your applications, how to use TWC's SparqlProxy to format SPARQL results, and how to use your favorite programming language (e.g. PHP, Python, JavaScript) to retrieve SPARQL query results.

What You Need to Know

This tutorial assumes you are familiar with concepts found in the following resources:
  • Resource Description Framework (RDF) is a standard model for data interchange on the Web. See [1]
  • SPARQL Protocol and RDF Query Language (SPARQL) is an RDF query language. See [2]
  • RESTful Web Service is a simple web service protocol. See [3]

Talking to a Standard SPARQL Endpoint

There are many ways to get results from a SPARQL endpoint. Most endpoints provide a web form in which you can enter a query and get back the results in HTML or some other format. In this tutorial we'll focus on the SPARQL endpoint hosted by Data.gov, found at http://services.data.gov/sparql .
[Figure: tutorial-datagov-sparql-query.png]
In the above screenshot, we're presented with a form and a drop-down box of formats from which we can select to have our results returned to us. The query text area contains the following SPARQL query, which is used to discover the different government datasets that have been loaded into the triple store, along with the number of triples in each dataset.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
SELECT ?g ?number_of_triples
WHERE {
 GRAPH ?g {
   ?s a <http://data-gov.tw.rpi.edu/2009/data-gov-twc.rdf#Dataset> .
   ?s <http://data-gov.tw.rpi.edu/2009/data-gov-twc.rdf#number_of_triples> ?number_of_triples.
 }
}
ORDER BY ?g
The SPARQL query is also available at this URL: http://logd.tw.rpi.edu/demo/retrieving-sparql-results/datagov-list-loaded-dataset.sparql

Retrieving SPARQL results in HTML: You can see the results in HTML. Here is the link to the query results.

Retrieving SPARQL results in XML: As we can see in the drop-down menu, we can get the results in a number of different formats, some for human readability and others that are more machine consumable. When you run the query, you can see your SPARQL query being used as a parameter in your address bar. We can use this as a RESTful Web service interface to the SPARQL endpoint as well. Here's an example of the same query above returning the result in XML. Here is the link to the query results:
http://services.data.gov/sparql?default-graph-uri=&query=+PREFIX+rdf%3A+<http%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23>+%0D%0A+SELECT+%3Fg+%3Fnumber_of_triples%0D%0A+WHERE+{%0D%0A++GRAPH+%3Fg+{%0D%0A++++%3Fs+a+<http%3A%2F%2Fdata-gov.tw.rpi.edu%2F2009%2Fdata-gov-twc.rdf%23Dataset>+.%0D%0A++++%3Fs+<http%3A%2F%2Fdata-gov.tw.rpi.edu%2F2009%2Fdata-gov-twc.rdf%23number_of_triples>+%3Fnumber_of_triples.%0D%0A++}%0D%0A+}%0D%0A+ORDER+BY+%3Fg&format=application%2Fxml&debug=on&timeout=

Using SparqlProxy to Format SPARQL Query Results

TWC's SparqlProxy can be used to query a SPARQL endpoint (i.e. the Web service interface of a triple store) and perform post-processing to format the SPARQL query results. The results can be converted to many different formats such as CSV, Google Visualization JSON and Simile Exhibit JSON. This makes it easier to develop mashups and visualizations from SPARQL results. Let's try our example from last time, using SparqlProxy at http://logd.tw.rpi.edu/ws/sparqlproxy.php
[Figure: tutorial-sparqlproxy-sparql-query.png]
Retrieving SPARQL results in HTML: In the screenshot above, the interface looks very similar to the previous one. If we want to query the SPARQL endpoint at Data.gov using SparqlProxy, we can set the SPARQL End Point URL option to http://services.data.gov/sparql . Leaving this blank will query the LOGD SPARQL endpoint instead. Using the query from last time returns the same results as before. Here is a link to the query result.

Retrieving SPARQL results in Google Visualization compatible JSON: SparqlProxy gives us more output options than the original SPARQL endpoint. By selecting the "GoogleViz/JSON" option, you will see the results encoded in Google Visualization compatible JSON. Here is the link to the query result.

SparqlProxy parameters: As before, we can access this service as a RESTful web service. The frequently used parameters are listed in the following table. More details about the parameters can be found at http://logd.tw.rpi.edu/technology/sparqlproxy.
Parameter Status Description
service-uri stable URI of SPARQL service, e.g. http://dbpedia.org/sparql, http://logd.tw.rpi.edu/sparql
query stable SPARQL query string
query-uri stable URI of a SPARQL query. "query-uri" and "query" are mutually exclusive; use only one.
output stable (optional) the output format. Values: xml, sparql, exhibit, gvds, csv, html. Default is xml.

Accessing SPARQL Query Results using SparqlProxy

Now we provide some example code fragments written in different programming languages to help you get started.

JavaScript

google.load("visualization", "1", {packages:["geomap","table","columnchart"]});
google.setOnLoadCallback(google_callback);
function google_callback(){ 
 var sparqlproxy = "http://logd.tw.rpi.edu/ws/sparqlproxy.php";
 var queryloc = "http://logd.tw.rpi.edu/demo/retrieving-sparql-results/datagov-list-loaded-dataset.sparql";    
 var service = "http://services.data.gov/sparql";
 var queryurl = sparqlproxy 
                + "?" + "output=gvds"
                + "&service-uri=" + encodeURIComponent(service)
                + "&query-uri=" + encodeURIComponent(queryloc) ;
 var query = new google.visualization.Query(queryurl); // Send the query.
 query.send(handleQueryResponse);
}

function handleQueryResponse(response){
  // Check for query response errors.
 if (response.isError()) {
   alert('Error in query: ' + response.getMessage() + ' ' + response.getDetailedMessage());
   return;
  }
  var data = response.getDataTable();
  ...
}

PHP

 // compose query 
 $sparqlproxy_uri = "http://logd.tw.rpi.edu/ws/sparqlproxy.php";
 $params = array();
 $params["query-uri"] = "http://logd.tw.rpi.edu/demo/retrieving-sparql-results/datagov-list-loaded-dataset.sparql";
 $params["service-uri"] = "http://services.data.gov/sparql";
 $params["output"] =  "gvds";
 $query = $sparqlproxy_uri . "?" . http_build_query($params, '', '&'); // pass '&' explicitly; Drupal overrides the default arg separator
 
 //show query result
 echo file_get_contents($query);

Python

 import urllib
 import urllib2

 sparqlproxy = "http://logd.tw.rpi.edu/ws/sparqlproxy.php"
 queryloc = "http://logd.tw.rpi.edu/demo/retrieving-sparql-results/datagov-list-loaded-dataset.sparql"
 service = "http://services.data.gov/sparql"

 # URL-encode the parameter values (the Python 2 equivalent of encodeURIComponent)
 params = urllib.urlencode({"output": "gvds", "service-uri": service, "query-uri": queryloc})
 queryurl = sparqlproxy + "?" + params

 response = urllib2.urlopen(queryurl)
 json = response.read()

Understanding LOGD Data

Level: 
LOGD 101
Contributor: 
Description: 
This tutorial describes the structure of Linking Open Government Data (LOGD) data and how it is associated with the original Open Government Data (OGD).
Prerequisites: 

What to Expect

By the end of this tutorial you should understand how tabular government data is mapped to its RDF-based LOGD representation, and the basic elements of LOGD data.

What You Need to Know

This tutorial assumes you are familiar with concepts found in the following resources:
  • Resource Description Framework (RDF) is a standard model for data interchange on the Web. See [1]
  • Comma-Separated Values (CSV) is a simple text format for a database table. See [2]
  • Terse RDF Triple Language (Turtle) is a syntax language for serializing RDF. We use it throughout our tutorials to encode RDF data. See [3]

Open Government Data (OGD)

Open Government Data (OGD) refers to publicly available government data. Recently, many countries, such as the US (http://data.gov) and the UK (http://data.gov.uk), have released central open government data portals. In this tutorial, we will use "Dataset 1623 (OMH Claims Listed by State)", which is cataloged at http://data.gov, as an example.

OGD Metadata

Each dataset published at Data.gov has a web page showing its metadata (e.g. who published it, a summary of its content, where to download it, where to find additional information for understanding the dataset, etc.). For example, we can collect some metadata (see below) for "Dataset 1623" from its Data.gov URL http://www.data.gov/details/1623.
 Step 1. go to http://www.data.gov/details/1623
Sample metadata about Dataset 1623 (source: http://www.data.gov/details/1623)
title OMH Claims Listed by State
description Total count of Claims received by Region, State and fiscal year.
agency Department of Health and Human Services

OGD Raw Data Files

An important mission of OGD portals is to support citizens in downloading raw data files. The web pages for Data.gov datasets have a dedicated "Download Information" section that lists available raw data in various formats, including XML, CSV and XLS (Excel format). For example, Dataset 1623 has a raw data file in XLS (which can be opened by Microsoft Excel or similar tools) downloadable at http://www.data.gov/download/1623/xls.
 Step 2. download a file from  http://www.data.gov/download/1623/xls
The following image shows a fragment of the raw data (accessed on Sep 17, 2010). It is easy to see that the raw data is essentially a table listing total OMHA claims received by region, state, and fiscal year.
[Figure: tutorial-Understanding-LOGD-Data-1623-full.png]
As shown in the figure above, the raw data is not normalized for machine consumption:
  • The table header is not on the first row.
  • Values in column 1 are missing for the sake of visual abbreviation (e.g., "Mid-West" should apply to everything from "Connecticut" through "West Virginia").
  • Values in column 2 reference entities that are very commonly recognized, and that are already mentioned in a slew of other datasets in a variety of slightly different ways (e.g., "MD" and "24", instead of "Maryland").
  • Columns 3 through 7 contain integers (_not_ the characters "1", ",", "0", "2", and "9"), but the integers are written in comma-separated format.

LOGD RDF Data

In this tutorial we're focusing on government data organized as tables. Although tabular data can be easily recognized by human users, clean-ups and format conversions are needed to ensure that government data can be consumed by machines. The TWC LOGD RDF data of a dataset is created by several automated (or semi-automated) conversion processes. Note that users need to assign a version identifier to the RDF data because the conversion is done on a snapshot of the raw data at a certain time. The RDF data of LOGD is available through "dump files" for downloading, or by dereferencing an HTTP URI. For example, the zipped RDF dump file for Dataset 1623 (version 2010-Sept-17) is available at this link.
 Step 3. download http://logd.tw.rpi.edu/source/data-gov/file/1623/version/2010-Sept-17/conversion/data-gov-1623-2010-Sept-17
   * rename the saved file from "data-gov-1623-2010-Sept-17" to "data-gov-1623-2010-Sept-17.tar.gz"
   * run the Linux shell command "tar -zxf data-gov-1623-2010-Sept-17.tar.gz" to extract the file
Note: to extract the file on Windows, see http://www.gzip.org/. The downloaded RDF data is encoded using Turtle syntax. The dump file concatenates the results of several conversion processes; we will only explain several essential fragments of the data file.

Namespace Declaration

The namespace declarations are used to support QName abbreviations of URIs. Below is the content of lines 2, 13, 23, 24 and 25 in the dump file. Each line declares a prefix for the corresponding namespace URI.
...
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
...
@prefix ov: <http://open.vocab.org/terms/> .
...
@prefix raw: <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/vocab/raw/> .
@prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/vocab/enhancement/1/> .
@prefix ds1623: <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17/> .
...

Property Definitions

The property definitions are used to (i) preserve the original text of each table header field name; and (ii) add additional descriptions contributed during the data conversion process. Below is the content of lines 5589 to 5592 in the dump file. The "rdfs:label" declares a human-readable label for the property "raw:region", which expands to the URI "http://logd.tw.rpi.edu/source/data-gov/dataset/1623/vocab/raw/region".
raw:region ov:csvCol "1"^^xsd:integer ;
  ov:csvHeader "Region" ;
  rdfs:label "Region" ;
  rdfs:range rdfs:Literal .
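The QName expansion itself is purely mechanical: the prefix is looked up in the @prefix declarations and concatenated with the local name. A tiny Python sketch:

 # Expand a QName against the @prefix declarations shown earlier.
 prefixes = {"raw": "http://logd.tw.rpi.edu/source/data-gov/dataset/1623/vocab/raw/"}

 def expand(qname):
     prefix, local = qname.split(":", 1)
     return prefixes[prefix] + local

 print(expand("raw:region"))
 # -> http://logd.tw.rpi.edu/source/data-gov/dataset/1623/vocab/raw/region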

RDF Data Generated from Raw Conversion

The table areas in the raw data are converted into an RDF representation following simple rules (a toy code sketch follows the list):
  • each row (except the header row) is identified by an RDF resource with a unique HTTP URI
  • each column is associated with an RDF property with a unique HTTP URI
  • each cell (in non-header rows) is recorded by an RDF triple "(s p o)", where s is the row's URI, p is the column's URI and o is the value of the cell.
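Here is that toy sketch: the three rules applied to a two-line CSV in Python, using invented example.org URIs (an illustration only, not the real converter).

 # Toy sketch of the three conversion rules above.
 import csv, io

 DATA = 'region,state,total\nMid-Atlantic,District of Columbia,"1,019"\n'
 VERSION = "http://example.org/dataset/1623/version/2010-Sept-17"
 VOCAB = "http://example.org/dataset/1623/vocab/raw/"

 reader = csv.reader(io.StringIO(DATA))
 header = next(reader)                           # each column -> a property URI
 for rownum, row in enumerate(reader, start=2):  # each data row -> a resource URI
     subject = "<%s/thing_%d>" % (VERSION, rownum)
     for col, cell in zip(header, row):          # each cell -> one triple
         print('%s <%s%s> "%s" .' % (subject, VOCAB, col, cell))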

A "raw" conversion simply translates the raw data (in CSV format) into RDF representation without minimal manual operation. Below is the content on lines 5624-5642 (generated by raw conversion) in the RDF dump file.
  • "ds1623:thing_8" on the first row is the wikipedia:URI of the record. It is a wikipedia:QName of the HTTP URI "http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17/thing_8"
  • "http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17" on the first row is the URI of the version of the dataset.
  • the entire first line corresponds to one RDF triple meaning "the record 'ds1623:thing_8' is referenced by the version of the dataset". This triple is automatically added by the TWC LOGD converter.
  • the second line corresponds to one RDF triple meaning "the record 'ds1623:thing_8' is related to a region named 'Mid-Atlantic'". This triple is associated with the 1st column ("region") of the raw data and the corresponding cell ("Mid-Atlantic") on the 8th row and the 1st column. It is notable that the "region" of "ds1623:thing_9" has empty string value.
  • the numbers after "raw:total" are still encoded in comma separated string.
ds1623:thing_8 dcterms:isReferencedBy <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17> ;
 	raw:region "Mid-Atlantic" ;
 	raw:state "District of Columbia" ;
	raw:fiscal_year_06 "12" ;
	raw:fiscal_year_07 "289" ;
	raw:fiscal_year_08 "342" ;
	raw:fiscal_year_09 "376" ;
	raw:total "1,019" ;
	ov:csvRow "8"^^xsd:integer .
 
ds1623:thing_9 dcterms:isReferencedBy <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17> ;
	raw:region "" ; 
	raw:state "Maryland" ; 
	raw:fiscal_year_06 "1,029" ;
	raw:fiscal_year_07 "3,565" ;
	raw:fiscal_year_08 "4,014" ;
	raw:fiscal_year_09 "2,403" ;
	raw:total "11,011" ;
	ov:csvRow "9"^^xsd:integer .

RDF Data Generated from Enhancement Conversion

An "enhancement" conversion converts the raw data (in CSV format) into an RDF representation based on a manually-generated configuration file. Below is the content on lines 82-100 (generated by an enhancement conversion) in the RDF dump file. The RDF data encodes the first two records of the data table, corresponding to 8th and 9th rows in the raw data (Excel file).
  • on the first line, the URI of record is the same as the one in raw conversion. This allows incrementally add enhanced descriptions to the existing descriptions.
  • on the second line, a new RDF property "e1:region" has been created in addition to the "raw:region". Note that the range of the two RDF properties are different.
  • on the second line, a new RDF resource "value_of_region:Mid-Atlantic" is promoted from the original literal string in raw data. By assigning the named entity (a region in this case) a unique URI, users can later add more descriptions or links to the entity, e.g. linking to wikipedia:Mid-Atlantic_states, in the future.
  • on the third line, the number "12" is now annotated with a datatype. This not only helps users to better underestand the meaning of data, but also supports triple stores' aggregation functions (e.g. sum) on such data which cannot be used on plain literals.
ds1623:thing_8 dcterms:isReferencedBy <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17> ;
	e1:region value_of_region:Mid-Atlantic ; 
	e1:state value_of_state:District_of_Columbia ;
	e1:fiscal_year_06 "12"^^xsd:integer ;
	e1:fiscal_year_07 "289"^^xsd:integer ;
	e1:fiscal_year_08 "342"^^xsd:integer ;
	e1:fiscal_year_09 "376"^^xsd:integer ;
	e1:total "1019"^^xsd:integer ;
	ov:csvRow "8"^^xsd:integer .

ds1623:thing_9 dcterms:isReferencedBy <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2010-Sept-17> ;
	e1:region value_of_region:Mid-Atlantic ;
	e1:state value_of_state:Maryland ;
	e1:fiscal_year_06 "1029"^^xsd:integer ;
	e1:fiscal_year_07 "3565"^^xsd:integer ;
	e1:fiscal_year_08 "4014"^^xsd:integer ;
	e1:fiscal_year_09 "2403"^^xsd:integer ;
	e1:total "11011"^^xsd:integer ;
	ov:csvRow "9"^^xsd:integer .
 
