Congratulations! TWC LOGD won the second prize (CNY 5,000) in the Semantic Web Challenge 2010!
1. Does the TWC LOGD Portal meet the SWC2010 “Minimum Requirements”?
1.1 “End User Application”
The TWC LOGD Portal is a comprehensive semantic web application dedicated to publishing Linked Data versions of open government data for end users and to the sharing of tools, services and expertise supporting an open government data ecosystem. The Portal serves a wide range of consumers and producers ranging from informed citizens and domain experts to developers of novel applications and web sites that will be enriched by government data. The TWC LOGD Portal remains an essential resource for government web site developers working to publish open data using Semantic Web technologies.
- The TWC LOGD Portal landing page combines data from multiple sources in order to demonstrate the use of Semantic Web principles to create a dynamic micro-app that is useful in its own right. How the landing page works:
- VoiD metadata from converted government datasets is stored in the TWC LOGD "Data" triple store
- Metadata from demos and tutorials is published via RDFa-annotated XHTML pages and synchronized with the "LOD Cache"
- Relevant news items are maintained in the Data-gov site
- Several RSS feeds are published at the Data-Gov and LOGD sites
- The content panels on the TWC LOGD Portal front page are based on live SPARQL queries across the site data using XSLT and the Google Ajax API
- As of late September 2010 more than 8.5 billion RDF triples have been made accessible to end users through the TWC LOGD Portal, from 436 RDFized datasets published by 11 different data sources; the majority of these are from Data.gov.
- The 40+ embedded, live mashups and visualizations featured on the TWC LOGD portal are themselves useful end-user applications, created to demonstrate replicable coding practices for consuming government-related data in the Web of Data.
- Developer-users of the TWC LOGD Portal demos can find summaries describing the datasets, technologies and queries used; comprehensive tutorials clarify the methods.
- The TWC LOGD Portal and its predecessor have seen more than 400K page visits from 134 countries and 4K cities since going live in 2009.
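The live-query pattern behind the front-page panels can be sketched in Python. The query, VoiD vocabulary usage and sample response below are illustrative assumptions, not the Portal's actual schema, but the parsing follows the standard SPARQL results-in-JSON format that such a panel would consume client-side:

```python
import json

# Illustrative SPARQL query over VoiD dataset metadata; the property
# names here are assumptions, not the Portal's actual schema.
DATASET_QUERY = """
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?dataset ?title ?triples WHERE {
  ?dataset a void:Dataset ;
           dcterms:title ?title ;
           void:triples ?triples .
} ORDER BY DESC(?triples) LIMIT 5
"""

def rows_from_sparql_json(payload: str):
    """Flatten a standard SPARQL results-in-JSON document into
    a list of plain dicts, one per result row."""
    doc = json.loads(payload)
    return [{var: binding[var]["value"] for var in binding}
            for binding in doc["results"]["bindings"]]

# A hand-written sample response in the standard results format.
sample = json.dumps({
    "head": {"vars": ["title", "triples"]},
    "results": {"bindings": [
        {"title": {"type": "literal", "value": "Dataset 92"},
         "triples": {"type": "literal", "value": "412000"}},
    ]},
})

print(rows_from_sparql_json(sample))
# [{'title': 'Dataset 92', 'triples': '412000'}]
```

On the actual front page this flattening happens in the browser (via XSLT and the Google Ajax API) before the rows are rendered into a content panel.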
1.2.a. “diverse ownership or control”
- Data published through the TWC LOGD Portal is aggregated from a variety of government and non-government sources, each with its own degree of authority and trustworthiness.
- The owners/controllers of the Open Government Data include numerous U.S. government agencies.
- We also integrate data from other social entities (e.g. DBpedia, New York Times, Twitter, Google Search).
1.2.b. “heterogeneous sources”
- Our demos are diverse mashups and visualizations that demonstrate the integration of data from multiple sources, including DBpedia, the New York Times API, and open government data produced by non-US sources. For example, the "CASTNET Ozone Map" mashes up data sources from Data.gov and epa.gov.
- The TWC LOGD landing page is itself a mashup of data from multiple sources. The content panels on the TWC LOGD front page are driven by live SPARQL queries across the site data using XSLT and the Google Ajax API. The sources include metadata of the TWC LOGD datasets stored in the TWC LOGD Data triple store; metadata of demos and tutorials published via RDFa-annotated XHTML pages that are synchronized with the LOD Cache; relevant news items maintained on the Data-Gov website; and RSS feeds from the TWC LOGD Portal, Data-Gov's SemDiff service and the Google News site.
- Our converter also RDFizes raw data in CSV, XML and fixed-width formats.
- See Section 3 for more details.
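The RDFizing of tabular data described above can be sketched in a few lines of Python; this is a minimal illustration in the spirit of the TWC LOGD converter, and the base URI and predicate naming scheme are assumptions, not the converter's actual conventions:

```python
import csv
import io

# Hypothetical base URI for an example dataset; not the Portal's real namespace.
BASE = "http://example.org/dataset/92/"

def csv_to_ntriples(text: str):
    """Turn each CSV row into one subject with one triple per non-empty cell."""
    reader = csv.DictReader(io.StringIO(text))
    triples = []
    for i, row in enumerate(reader, start=1):
        subject = f"<{BASE}row/{i}>"
        for column, value in row.items():
            if value:  # skip empty cells
                predicate = f"<{BASE}prop/{column.strip().lower().replace(' ', '_')}>"
                # Escape backslashes and quotes so the literal stays valid N-Triples.
                literal = value.replace("\\", "\\\\").replace('"', '\\"')
                triples.append(f'{subject} {predicate} "{literal}" .')
    return triples

raw = "State,Ozone Level\nNY,0.041\nCA,0.052\n"
for t in csv_to_ntriples(raw):
    print(t)
```

A real converter must also handle typed literals, multi-sheet inputs and cell-level conversion rules; this sketch only shows the basic row-and-column-to-triple mapping.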
1.2.c. “real world data”
- The data we integrate cover a broad range of real-world data, drawn from a variety of government agencies as well as from diverse non-government social entities covering news, locations, people, events and more.
- As a result, TWC LOGD has been included in the September 2010 version of the Linked Open Data Cloud.
1.3.a. “using Semantic Web technologies”
The TWC LOGD Portal enables a Semantic Web-based data lifecycle:
- Data Conversion/Creation: Most government datasets are released in “raw” or unstructured formats. The TWC LOGD Portal converts these “raw datasets” to RDF using the TWC LOGD converter. A dataset versioning mechanism is used with a semantic diff (SemDiff) service to compute changes in the Data.gov dataset catalog. Semantics-capable content management systems, such as Semantic MediaWiki and Drupal with RDF modules, have been used to preserve user-generated metadata in normal content publishing activities.
- Data Query/Access: The converted datasets may be accessed through the TWC LOGD Portal in many ways. Each dataset has a summary web page that aggregates manually-contributed metadata (e.g. title, description, agency) and automatically-generated metadata (e.g. number of triples, links to data dumps). The metadata of datasets can also be accessed by dereferencing URIs, following Linked Data principles. To support users in querying the datasets, a publicly-accessible SPARQL endpoint hosts a selection of datasets along with the automatically recorded metadata of the TWC LOGD datasets. SparqlProxy, a TWC-developed tool, enhances our SPARQL endpoint to return results in a richer set of formats, such as JSON and HTML tables. LOD Cache, another TWC-developed tool, synchronizes RDF data published via RDFa-annotated web pages throughout the Portal.
- Education and Community Portal:
RDFa on Drupal is used to publish both human- and
machine-readable metadata embedded in TWC LOGD demos and tutorial pages; this approach
hides the details of RDF creation from end users and enables a SPARQL-queryable
site with novel features including the dynamically-generated list of demos on
the TWC LOGD front page.
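The SemDiff service mentioned above can be sketched, under simplifying assumptions, as a set difference over two versions of a dataset catalog represented as triples; the catalog triples below are invented for illustration:

```python
# Hypothetical sketch in the spirit of SemDiff: compare two versions of
# a dataset catalog, modeled as sets of (subject, predicate, object) tuples.

def sem_diff(old, new):
    """Return (added, removed) triples between two catalog versions."""
    old, new = set(old), set(new)
    return new - old, old - new

# Invented example catalog entries, not real Data.gov records.
v1 = {
    ("catalog:ds92", "dcterms:modified", "2010-08-01"),
    ("catalog:ds92", "dcterms:title", "CASTNET Ozone"),
}
v2 = {
    ("catalog:ds92", "dcterms:modified", "2010-09-15"),
    ("catalog:ds92", "dcterms:title", "CASTNET Ozone"),
}

added, removed = sem_diff(v1, v2)
print(added)    # {('catalog:ds92', 'dcterms:modified', '2010-09-15')}
print(removed)  # {('catalog:ds92', 'dcterms:modified', '2010-08-01')}
```

The unchanged title triple drops out of both result sets, which is what makes a triple-level diff useful for spotting which catalog entries actually changed between crawls.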
1.3.b. “data manipulated/processed in interesting ways”
- We leverage RDFa to preserve structured metadata in a content management
system and then support a queryable Web using the LOD Cache.
- We also provide powerful data enhancement functions, e.g. promoting literal strings to DBpedia URIs, and enable cell-based conversion for multi-dimensional data tables.
- Our mashups support analysis across multiple datasets, domains and time periods, revealing hidden facts (or stimulating hypotheses) that would be impossible to find within a single dataset. A team of graduate and undergraduate students has created over 40 different mashups and visualizations on the TWC LOGD Portal. These mashups are diverse: demonstrating the integration of data from multiple sources including DBpedia, the New York Times API, and open government data produced by non-US sources (see a-g); deploying data via web and mobile interfaces (see a); supporting interactive analysis for specific domains including health, policy and financial data (see e-g); consuming integrated data using readily-available Web-based services (see h, i); and providing data access tools (see j) and semantic data integration tools (see k).
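The literal-promotion enhancement mentioned above can be illustrated with a naive sketch. Real promotion in the Portal relies on curated heuristics; this example only applies DBpedia's resource-naming convention (Title_Case labels with underscores for spaces), so treat it as an assumption-laden simplification:

```python
from urllib.parse import quote

def promote_to_dbpedia(literal: str) -> str:
    """Naively promote a plain literal to a DBpedia resource URI by
    applying DBpedia's label convention; no disambiguation is attempted."""
    label = "_".join(word.capitalize() for word in literal.split())
    return "http://dbpedia.org/resource/" + quote(label)

print(promote_to_dbpedia("new york"))
# http://dbpedia.org/resource/New_York
```

In practice such promotion needs validation (does the resource exist? is it the intended sense of the string?), which is where the Portal's statistical heuristics come in.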
1.3.c. “central role in achieving things that alternative technologies cannot do as well”
- Data Integration: Data integration is vital to the objectives of OGD.
Linking and integrating data in novel ways help consumers uncover new patterns
and correlations and create new knowledge. Linked Data principles and semantic
web technologies make it easy to connect heterogeneous datasets without advance
coordination or planning.
- Fast: The TWC LOGD Portal demonstrates how Linked Data principles and semantic web technologies may be applied to decrease development costs and increase the reuse of data models, links and visualization techniques. It advocates a bottom-up approach, encouraging developers to collaboratively model data, define terms, link terms and concepts to other heterogeneous datasets, and use generic visualization libraries and APIs to get useful applications running more quickly. For example, the “CASTNET Ozone Map” was created within two weeks and iteratively enhanced over the span of a month. Similarly, in September 2010 four unique demos were created to support a tobacco prevalence study in an NIH project.
- Low Cost: Developers don't need to be expert in semantic technologies or Linked Data principles to create semantically-enabled LOGD mashups. Undergraduate students in RPI’s Fall 2009 Web Science class created mashups using semantic technologies and datasets found on the TWC LOGD Portal. Given a two-hour introduction to basics like RDF and SPARQL and patterns for using visualization tools like the Google Visualization API, each group created visualizations mashing at least two converted datasets in less than two weeks. Similarly, in August 2010 the US Data.gov project hosted a Mash-a-thon workshop, organized in part by TWC to engage government developers and data curators in hands-on learning using TWC LOGD tools and datasets. In just two days four teams successfully built LOGD-based mashups, demonstrating the low cost of knowledge transfer and the rapid learning process inherent in the best practices embodied by the TWC LOGD Portal.
2. What “Additional Desirable Features” does TWC LOGD provide?
The TWC LOGD Portal extends Drupal 6 with semantic
technologies to create a flexible, scalable collaboration environment with an
attractive and functional Web interface. Portal content is dynamically
presented using a combination of Drupal, XSLT, SPARQL and RDFa. SPARQL
is also used to query external data and present results to users. Most Portal
pages have “Like” (enabled by the Open Graph Protocol and RDFa) and “Rate”
buttons, enabling end users to give feedback.
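The Open Graph Protocol markup that enables such a "Like" button can be sketched as follows; og:title, og:type and og:url are standard OGP properties, while the page title and URL here are illustrative, not taken from an actual Portal page:

```python
import html

def ogp_meta(title: str, url: str, og_type: str = "website") -> str:
    """Emit the minimal Open Graph Protocol <meta> tags for one page."""
    props = {"og:title": title, "og:type": og_type, "og:url": url}
    return "\n".join(
        f'<meta property="{p}" content="{html.escape(v)}" />'
        for p, v in props.items()
    )

print(ogp_meta("CASTNET Ozone Map", "http://example.org/demo/ozone"))
```

Because OGP properties are expressed as RDFa-style attributes, the same markup that powers the social "Like" button also contributes machine-readable metadata to the Portal's queryable site data.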
The scalability of the TWC LOGD
infrastructure is demonstrated by the large and diverse datasets being actively
converted and published on a daily basis. TWC LOGD mashups also exhibit how distributed
services and data sources may be integrated using semantic technologies. Our
open source code presents best practices for combining dynamic and static data
in scalable real-time visualizations.
In addition to interactive demos of data mashups, the TWC
LOGD Portal hosts multimedia documents including videos and
publications to help stakeholders understand demos and tutorials.
The unique work represented by the TWC LOGD Portal has been recognized within the US Government and discussed on the White House blog, which commended its use of Data.gov datasets in innovative ways to generate practical applications and mashups. Details of TWC’s role and the impact of applying semantic web technologies can be found in Sections 3.1 and 3.2.
TWC is actively engaged with organizations inside and
outside of government, and this enables us to receive and act on feedback
concerning our data conversion, tools, and applications. Section 3.3
details this interaction. The diagrams above illustrate the TWC LOGD Portal's data workflow. Section 3.3 also describes additional work on data and provenance that goes beyond pure conversion.
The TWC LOGD Portal uses several approaches to improve the quality and accuracy of converted data. Evolving heuristics based on statistical analysis are used to connect entities with the same meaning across massive datasets, such as linking datasets based on US states.
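Such a linking heuristic can be sketched as normalization-plus-join; the alias table below is a made-up fragment for illustration, not the Portal's actual heuristics, which are derived statistically:

```python
# Invented alias fragment: different spellings agencies use for US states.
STATE_ALIASES = {
    "ny": "New York", "n.y.": "New York", "new york": "New York",
    "ca": "California", "calif.": "California", "california": "California",
}

def normalize_state(raw: str):
    """Map a raw state string to its canonical name, or None if unknown."""
    return STATE_ALIASES.get(raw.strip().lower())

def link_on_state(rows_a, rows_b):
    """Join two record lists on their normalized state name."""
    index = {}
    for row in rows_b:
        key = normalize_state(row["state"])
        if key:
            index.setdefault(key, []).append(row)
    return [(a, b) for a in rows_a
            for b in index.get(normalize_state(a["state"]), [])]

ozone = [{"state": "N.Y.", "ozone": 0.041}]
budget = [{"state": "new york", "budget": 1.2}]
print(link_on_state(ozone, budget))
```

The point of the normalization step is that "N.Y." and "new york" land on the same canonical key, so records from independently published datasets can be joined without the agencies coordinating in advance.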
The TWC LOGD Portal is currently a research-oriented, US-based site. It provides diverse means of access, including the “White House Visitor” demo for the iPad, which has been submitted to the iTunes Store for distribution. No alternate language versions of our content or links to translation services are currently available.