Project Page: Instance Hub

Original draft based on Dominic's email, 7 Aug 2011.

This is an update on where Instance Hub currently is, what still needs to be done, and what the plans are for future versions. Also listed are tasks that need to be accomplished before we have a first version out the door.

Data:

What is done:

  • Raw datasets for 5 categories are done (US States, US Agencies, Fiscal Years, Crops, Toxic Chemicals), all in Google Docs.
  • Three of these (US States, US Agencies, Fiscal Years) have been converted in a first pass.

What needs to be done:

  • Need to create a URI design for Crops and Toxic Chemicals, then convert these two new categories.
  • We already have a small ontology to describe Instance Hub and the different categories; it must be used in the next version of the category conversion.
  • Continue adding metadata and instances to the categories we already have, and continue adding alternative names to instances to help with linking data.
  • Link Instance Hub to other Linked Open Datasets
  • Start using Instance Hub in all new LOGD datasets instead of our old LOD Link Files (Instance Hub is currently not directly referenced in any LOGD Dataset)
  • Convert implicit links (DBPedia links) in LOGD Datasets to Instance Hub URIs
  • Create new Instance Hub Categories
  • Create a 2nd version of the Instance Hub data once the Converter feature that helps with URI creation has been added and minor bugs are fixed.
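
As a starting point for the Crops and Toxic Chemicals URI design, here is a minimal sketch of how an instance URI could be built from a label. It only follows the pattern visible in the agency URI listed under Interface (spaces become underscores under http://logd.tw.rpi.edu/id). The function names and the "crop" category path are assumptions for illustration, not the agreed design.

```python
from urllib.parse import quote

# Base taken from the agency example on this page.
BASE = "http://logd.tw.rpi.edu/id"


def local_name(label):
    """Turn a human-readable label into a URI-safe local name.

    Assumption: runs of whitespace collapse to a single underscore,
    as in "Department_of_Defense"; other unsafe characters are
    percent-encoded.
    """
    return quote("_".join(label.split()), safe="")


def mint_instance_uri(category_path, label):
    """Build an instance URI from a category path and a label."""
    return f"{BASE}/{category_path.strip('/')}/{local_name(label)}"


# Reproduces the agency URI already in use.
agency_uri = mint_instance_uri("us/fed/agency", "Department of Defense")

# "crop" is a hypothetical category path, not a decided one.
crop_uri = mint_instance_uri("crop", "Sweet corn")
```

Whatever design is chosen, keeping the label-to-URI rule in one small function would make it easier to apply the same rule in the Converter later.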

Interface:

What is done:

  • All converted categories have a page that lists all instances in those categories
  • All instances in converted categories have a page that describes them. The Instance Hub URI redirects to this page (with content negotiation). For example, http://logd.tw.rpi.edu/id/us/fed/agency/Department_of_Defense redirects to http://logd.tw.rpi.edu/id/us/fed/agency_page/Department_of_Defense in a web browser. When HTML is requested, HTML is returned; when RDF is requested, RDF is returned. This allows the URIs to work with linked data browsers.
  • Basic skeleton of the Instance Hub main page at http://logd.tw.rpi.edu/id_page
  • First Draft of Instance Hub URI design page done.
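
The content negotiation described for instance URIs above can be sketched roughly as follows. This is not the code running on the server; it is a minimal illustration of choosing between HTML and RDF from an Accept header. The list of RDF media types, the HTML default, and both function names are assumptions. The page URL rule is inferred from the single Department_of_Defense example.

```python
# Assumed media types; which RDF serializations the server offers is not stated.
RDF_TYPES = {"application/rdf+xml", "text/turtle"}
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def parse_accept(header):
    """Return (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_representation(accept_header):
    """Return "rdf" or "html" for a request's Accept header."""
    for media, q in parse_accept(accept_header or ""):
        if q <= 0:
            continue
        if media in RDF_TYPES:
            return "rdf"
        if media in HTML_TYPES:
            return "html"
    return "html"  # assumption: fall back to the HTML page


def page_url_for(instance_uri):
    """Append "_page" to the category segment, as in the agency example."""
    head, _, name = instance_uri.rpartition("/")
    return f"{head}_page/{name}"
```

A browser sending "text/html" first would be sent to page_url_for(uri), while a linked data client asking for "application/rdf+xml" would get the RDF description.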

What needs to be done:

  • Instance Hub category pages need to be cleaned up and given some explanatory text about what each category covers.
  • The category page for Fiscal Years is in poor shape at the moment. Need to find a way to bring the country name into the titles to help clean it up. See http://logd.tw.rpi.edu/id/fiscal-year_page
  • Instance Hub main page needs to be filled out and cleaned up.
    • Need intro text explaining basic idea of Instance Hub
    • Need documentation page about Instance Hub and how it can/should be used
      • This documentation must explain how it can be used by our converter to link data together
      • This should also be the place where we explain the basic use cases of Instance Hub and explain why other developers may wish to use it
  • Provide an unzipped version of the RDF dump of Instance Hub categories so it can be used by our Converter as a LOD Link File
  • Provide redirection of http://logd.tw.rpi.edu/id to http://logd.tw.rpi.edu/id_page or move content to http://logd.tw.rpi.edu/id
  • Add Instance Hub to Main Links Menu in LOGD Portal. (Only to be done when we decide to deploy)
  • Create a search interface for Instance Hub to make it easier to find an instance or category in Instance Hub.

Process:

What is done:

  • The basic process is in place: use Google Docs to manage raw data for Instance Hub categories, use a script to download it from Google's servers to sam.tw.rpi.edu, use the Converter to convert the raw data into RDF using the enhancement files made for each Instance Hub category, then publish to the web and load into the triple store. All scripts and enhancement files are under SVN.

What needs to be done:

  • Create an internal document that outlines and completes this process better. Allow this to be a first step in getting other students to help with the task of creating and maintaining Instance Hub categories.
    • Need to figure out how this process will work when more people are helping with the task of creating and maintaining categories. Look at the international catalog group for process ideas from that project. Will students only be entering data in Google Docs, etc.?
    • Add a process by which new alternative names for instances (from LOGD Datasets) can be easily added. For example: we find a LOGD dataset that refers to New York State as (NY1). We add this to the New York State instance in Instance Hub, so if we ever come across that string literal for New York State again, we can instantly link it to Instance Hub.
    • Create process to help go through back catalog in LOGD, and link to Instance Hub.
    • Add Instance Hub to our current LOGD Conversion process. This way new datasets are linked to Instance Hub.
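
The alternative-name idea in the New York State example above could work along these lines. This is only a sketch of the lookup, not how the Converter does it. The AltNameIndex class, its normalization rule (trim, collapse whitespace, ignore case), and the example.org URI are all hypothetical.

```python
class AltNameIndex:
    """Maps alternative names (string literals) to instance URIs."""

    def __init__(self):
        self._by_name = {}

    @staticmethod
    def _normalize(name):
        # Assumption: names match regardless of case and extra whitespace.
        return " ".join(name.split()).casefold()

    def add(self, uri, *names):
        """Register one or more alternative names for an instance URI."""
        for name in names:
            key = self._normalize(name)
            existing = self._by_name.get(key)
            if existing is not None and existing != uri:
                # The same literal pointing at two instances needs a human.
                raise ValueError(f"{name!r} already maps to {existing}")
            self._by_name[key] = uri

    def lookup(self, literal):
        """Return the instance URI for a literal, or None if unknown."""
        return self._by_name.get(self._normalize(literal))


# Placeholder URI for illustration only; not the real Instance Hub URI.
NEW_YORK = "http://example.org/id/us/state/New_York"

index = AltNameIndex()
index.add(NEW_YORK, "New York", "New York State", "NY1")

match = index.lookup(" ny1 ")   # resolves to NEW_YORK
missing = index.lookup("Ohio")  # None, nothing registered yet
```

Raising an error on conflicting names is one possible choice; it forces ambiguous literals to be reviewed instead of silently linking to the wrong instance.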

These are the basic points I've come across so far; I'm sure this list will be added to over time. Any help in identifying what is missing and what is not important would be very helpful.

What I need is:

  • Comments on these plans and comments on what we already have.
  • A look through the current interface, with feedback on what should be added, what things should look like, etc.
  • Help in prioritizing these tasks and plans.
  • Help in deciding which of these plans are in scope for a first version of Instance Hub and which should be held back for future versions.

Thanks to everyone for their help with Instance Hub. We have gotten a lot done in getting things started, but we still have a lot of work to do.
