Semantic Search of the Data-gov Catalog

Search the data-gov catalog entries using RDFa and Yahoo! Boss.

Motivating Problem

Keyword-based search engines return results based on keyword matches, and each result description is typically a page abstract with the query keywords highlighted. However, these descriptions do not always give users enough information to understand what a page contains.


Before implementing this search service, we first wrote an RDFa extension for Semantic MediaWiki (SMW). The extension extracts the semantic data of an SMW page, converts it into RDFa, and embeds the RDFa in the page. We apply the extension to the catalog datasets to generate RDFa data describing each dataset, and the search service accesses that data by querying our triple store.

When a user submits a query through the web interface, we send it to the Yahoo! Boss search application, which returns the search results as XML. We parse this XML to extract the URLs of the results, use those URLs to form SPARQL queries against the RDFa triples, and finally present the retrieved triples to the user. Read more: How to Build Data-gov Semantic Search to Consume RDFa.
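The search steps (XML results in, URLs out, one SPARQL query per URL) can be sketched in Python. This is a minimal illustration, not the service's actual code. It assumes that each hit in the result XML carries its address in an element named `url` (namespaces are ignored), and that the RDFa triples in the store use the page URL as their subject. The function names, the `?property`/`?value` variable names and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Characters that may not appear inside a SPARQL IRI reference <...>.
_FORBIDDEN_IRI_CHARS = set('<>"{}|^`\\')


def local_name(tag: str) -> str:
    """Strip an ElementTree namespace prefix such as '{http://...}url'."""
    return tag.rsplit("}", 1)[-1]


def extract_result_urls(xml_text: str) -> list[str]:
    """Collect the text of every element whose local name is 'url'.

    Assumption: each search hit stores its target address in a 'url'
    element; namespaces are ignored so the code does not depend on them.
    """
    root = ET.fromstring(xml_text)
    urls = []
    for el in root.iter():
        if local_name(el.tag) == "url" and el.text and el.text.strip():
            urls.append(el.text.strip())
    return urls


def build_sparql_query(url: str) -> str:
    """Build a query for all triples whose subject is the page URL.

    Assumption: the RDFa triples loaded into the store use the page URL
    as their subject.
    """
    if any(c in _FORBIDDEN_IRI_CHARS or c.isspace() for c in url):
        raise ValueError(f"cannot embed {url!r} as a SPARQL IRI")
    return f"SELECT ?property ?value WHERE {{ <{url}> ?property ?value . }}"


# Hypothetical, simplified result document (not the exact Yahoo! Boss schema).
sample = """<ysearchresponse>
  <resultset_web>
    <result><title>Example dataset A</title><url>http://example.org/wiki/Dataset_92</url></result>
    <result><title>Example dataset B</title><url>http://example.org/wiki/Dataset_10</url></result>
  </resultset_web>
</ysearchresponse>"""

for u in extract_result_urls(sample):
    print(build_sparql_query(u))
```

Issuing one small query per URL keeps each query simple; a single query with a FILTER over all result URLs would trade that simplicity for fewer round trips to the triple store.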


Using this semantic search application on the catalog, we can enhance a user's search experience by giving them convenient summary information about each search result, allowing them to find relevant pages more easily.

Technology Highlights

  • We use the RDFa extension of SMW to generate RDFa data for the data-gov catalog.
  • We use ARC2 to load the RDFa triples into an ARC2 triple store.
  • We use the Yahoo! Boss application to search for related web content based on user input.
  • The Yahoo! Boss application returns an XML document to our server.
  • We parse the XML results to get the result URLs, and form SPARQL queries over the RDFa data in the ARC2 triple store.
  • We parse the returned RDFa data and present enhanced results to the users.
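For the last step, presenting the retrieved data, a minimal sketch of turning query results into short summary lines for one search hit follows. It assumes the triple store returns results in the standard SPARQL JSON results format with bindings for hypothetical `property` and `value` variables; the property URIs, labels and values in the sample are made up for illustration, and deriving labels from the last URI segment is a simplification rather than the application's actual display logic.

```python
import json


def short_name(uri: str) -> str:
    """Use the last '#' or '/' segment of a property URI as a display label."""
    uri = uri.rstrip("/#")
    for sep in ("#", "/"):
        if sep in uri:
            return uri.rsplit(sep, 1)[-1]
    return uri


def summarize(results_json: str, max_values: int = 3) -> list[str]:
    """Group SPARQL JSON result rows by property and render one line each.

    Assumption: every row binds the (hypothetical) variables 'property'
    and 'value'.
    """
    data = json.loads(results_json)
    grouped: dict[str, list[str]] = {}
    for row in data["results"]["bindings"]:
        label = short_name(row["property"]["value"])
        grouped.setdefault(label, []).append(row["value"]["value"])

    lines = []
    for label, values in grouped.items():  # dicts keep insertion order
        shown = ", ".join(values[:max_values])
        if len(values) > max_values:
            shown += f" (+{len(values) - max_values} more)"
        lines.append(f"{label}: {shown}")
    return lines


# Hypothetical result set for a single search hit.
sample = json.dumps({
    "head": {"vars": ["property", "value"]},
    "results": {"bindings": [
        {"property": {"type": "uri", "value": "http://purl.org/dc/terms/title"},
         "value": {"type": "literal", "value": "Sample Dataset"}},
        {"property": {"type": "uri", "value": "http://example.org/property#Agency"},
         "value": {"type": "literal", "value": "Example Agency"}},
    ]},
})

for line in summarize(sample):
    print(line)
```

Capping the number of values per property keeps each summary short enough to sit beside a search result.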
