Semantic Search of the Data-gov Catalog

Description: 
Search the data-gov catalog entries using RDFa and Yahoo! Boss.

Motivating Problem

Keyword-based search engines return results based on keyword matches, and each result description typically consists of a page abstract with the query keywords highlighted. However, these descriptions do not always give a user enough information to understand the content of the page.

Implementation

Before implementing this search service, we first wrote an RDFa extension for Semantic MediaWiki (SMW). The extension extracts the semantic data of an SMW page, converts it into RDFa, and embeds it within the page. We applied the extension to the Data.gov Catalog datasets to generate RDFa data about them, and the search service accesses this RDFa data through queries to our triple store.

When a user submits a query through the web interface, we send it to the Yahoo! Boss Search Application and receive the search results in XML format. We parse this XML to extract the URLs of the results, use these URLs to form SPARQL queries that retrieve the RDFa triples we want, and finally present these triples to the user.

Read more in How to Build Data-gov Semantic Search to Consume RDFa.
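The XML-parsing step can be illustrated with a minimal Python sketch. It is not the service's actual code: the sample response, its element names, and the example.org URLs are assumptions made for illustration, and the real Yahoo! Boss response may be structured differently. The helper matches elements by local name so that it keeps working if the response declares an XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical search response; the real schema may differ.
SAMPLE_RESPONSE = """
<searchresponse>
  <resultset>
    <result>
      <title>Dataset 92</title>
      <url>http://example.org/wiki/Dataset_92</url>
    </result>
    <result>
      <title>Dataset 10</title>
      <url>http://example.org/wiki/Dataset_10</url>
    </result>
  </resultset>
</searchresponse>
"""


def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def extract_result_urls(xml_text, url_tag="url"):
    """Return the text of every element named url_tag, in document order."""
    root = ET.fromstring(xml_text.strip())
    urls = []
    for element in root.iter():
        if local_name(element.tag) == url_tag and element.text:
            urls.append(element.text.strip())
    return urls


if __name__ == "__main__":
    for url in extract_result_urls(SAMPLE_RESPONSE):
        print(url)
```

Each extracted URL then serves as the key for looking up the RDFa triples of that page in the triple store.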

Benefit

Using this semantic search application on the Data.gov catalog, we can enhance a user's search experience by giving them convenient summary information about their search results, allowing them to find relevant pages more easily.

Technology Highlights

  • We use the RDFa extension of SMW to generate RDFa data for the data-gov catalog.
  • We use ARC2 to load the RDFa triples into an ARC2 triple store.
  • We use the Yahoo! Boss Application to search for related web content based on user input.
  • The Yahoo! Boss Application returns an XML document to our server.
  • We parse the XML results to get the URL information, and form SPARQL queries to query the RDFa data in the ARC2 triple store.
  • We parse the RDFa data and present the enhanced results to the users.
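The step of forming a SPARQL query from a result URL might look like the following Python sketch. It assumes that the triples of interest use the page URL itself as their subject, which the description above does not confirm, and the function name, the LIMIT default, and the example.org URL are hypothetical. The URL is validated before being placed inside angle brackets, so that a malformed search result cannot alter the query.

```python
import re

# Characters that SPARQL does not allow inside an <IRI> reference.
_FORBIDDEN_IN_IRI = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def build_triples_query(page_url, limit=50):
    """Build a SPARQL SELECT query for all triples whose subject is page_url.

    Assumption: the triple store uses the page URL as the subject of the
    RDFa triples extracted from that page.
    """
    if _FORBIDDEN_IN_IRI.search(page_url):
        raise ValueError(f"URL is not a valid SPARQL IRI: {page_url!r}")
    return (
        "SELECT ?property ?value WHERE {\n"
        f"  <{page_url}> ?property ?value .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(build_triples_query("http://example.org/wiki/Dataset_92"))
```

The resulting query string would be sent to the triple store's SPARQL endpoint, and the returned property and value pairs displayed alongside each search result.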