URI Design Principles: Creating Persistent URIs for Government Linked Data

Version: 23 October 2013

Table of Contents

  1. Goals for Persistent Open Government Data URIs
  2. URI Design Overview
  3. Example Persistent URIs for Government Linked Data
  4. References & Resources

1. Goals for Persistent Open Government Data URIs

As an increasing number of governments and government agencies begin publishing linked open government data, policies and best practices for Uniform Resource Identifier (URI) design [0] in open government data releases will emerge. RPI TWC is working extensively with key players in the United States and international linked open government data initiatives to develop URI schemes that are useful today and in the future.

The URI scheme demonstrated by the TWC Instance Hub provides rich descriptive information about entities for humans examining open government data, while encouraging intuitive navigation and exploration of this data. The implementation of this scheme in the instance hub shows its utility for data publishing, and we hope to see government support and adoption of this scheme in order to further test its viability.

The principles recommended in this document should produce URIs with the following characteristics:
  • URIs that are easily re-hosted. This means that a generated URI should be easily transformed from one BASE URI-space (e.g. logd) to another, allowing easier adoption by government agencies.
    • For example: the pattern http://logd.tw.rpi.edu/id/epa-gov/XXXXXX can easily (based on syntax) be transformed to http://epa.gov/id/XXXXXX should the agency "EPA" adopt this scheme
  • Concise URIs with as little "cruft" as possible
  • URIs that span many domains including:
    • National identifiers (e.g. governmental agencies, states, zip codes)
    • State-level identifiers (e.g. counties, congressional districts)
    • Agency-level identifiers (e.g. EPA facilities)

2. URI Design Overview

URI Template: ‘http://’ [base] / ‘id’ / ([category]/[token]*)+ /[token]+

For the case of the TWC RPI Instance Hub [V1, V2] the value of base will be: logd.tw.rpi.edu
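
As a rough illustration of the template, the sketch below joins a base and an ordered list of category and token segments under the id path. The helper name build_uri is hypothetical, and percent-encoding each segment is an added assumption rather than something the template prescribes. Categories and tokens cannot be told apart syntactically, so the sketch treats them uniformly.

```python
from urllib.parse import quote


def build_uri(base: str, *segments: str) -> str:
    """Compose 'http://' [base] '/id/' followed by category/token segments."""
    if not segments:
        raise ValueError("at least one path segment is required after 'id'")
    if any(not segment for segment in segments):
        raise ValueError("path segments must not be empty")
    # Encode each segment separately so a stray "/" cannot add a path level.
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"http://{base}/id/{path}"


print(build_uri("logd.tw.rpi.edu", "us", "state", "Vermont"))
# prints http://logd.tw.rpi.edu/id/us/state/Vermont
```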

Notes

  • id
    • The id path is recommended to avoid "polluting" the top namespace [base] with identifiers.
    • id is preferred over e.g. "instance-hub" in order to have as short a token as possible. id doesn't add any semantics, but is a syntactic way to distinguish these URIs from others.
    • This token is consistent with data.gov.uk URI design recommendations
  • category
    • This element indicates the type of an entity
    • Categories allow us to describe an entity and also aid in navigation. If desired, in an implementation, category pages can serve to list all entities falling within their category
    • Category elements should be short, but not at the expense of expressivity
    • URIs ending in a category should not be the URIs for the concept itself.

[Figure: a tree-style representation of URI “categories”, which present listings of the entities that are logically categorized beneath them.]
  • token
    • An element which, when qualified by categories and other elements, uniquely identifies an entity
    • All entity URIs should end with a token
    • A token can be followed by a category or another token, if they follow logically
    • Tokens are unique when fully qualified; although a given token might not be unique within an instance hub, it must be unique within the category or token that precedes it.
    • Consider the token “Office_of_Inspector_General”:
      • /id/us/fed/agency/Department_of_Justice/Office_of_Inspector_General
      • /id/us/fed/agency/General_Services_Administration/Office_of_Inspector_General
      • /id/us/fed/agency/Department_of_Defense/Office_of_Inspector_General
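
The uniqueness rule for tokens can be sketched as a small registry that only checks a token against its siblings under the same parent path. The class TokenRegistry is a hypothetical illustration, not a description of how the Instance Hub is implemented.

```python
class TokenRegistry:
    """Tracks which tokens are already used beneath each parent path."""

    def __init__(self) -> None:
        # Maps a parent path to the set of tokens registered directly under it.
        self._children = {}

    def register(self, parent: str, token: str) -> str:
        """Register a token under a parent and return the qualified path."""
        siblings = self._children.setdefault(parent, set())
        if token in siblings:
            raise ValueError(f"{token!r} already exists under {parent}")
        siblings.add(token)
        return f"{parent}/{token}"


registry = TokenRegistry()
token = "Office_of_Inspector_General"
print(registry.register("/id/us/fed/agency/Department_of_Justice", token))
print(registry.register("/id/us/fed/agency/Department_of_Defense", token))
```

Both registrations succeed because the parents differ. Registering the same token a second time under the same parent raises ValueError.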

3. Example Persistent URIs for Government Linked Data

Note: this section is still in review for consistency with the revised recommendations...

3.1. US Government Agencies

Owner
federal
Suggested
http://BASE/id/us/fed/agency/NAME/SUBNAME
Example
http://BASE/id/us/fed/agency/Commerce/National_Oceanic_and_Atmospheric_Administration
Example
http://BASE/id/us/fed/agency/Department_of_Health_and_Human_Services/Centers_for_Disease_Control

3.2. Canadian Federal Agencies [4]

Owner
federal
Suggested
http://BASE/id/ca/fed/agency/NAME/SUBNAME
Example
http://BASE/id/ca/fed/agency/Department_of_Agriculture_and_Agri-Food
http://BASE/id/ca/fed/agency/Agriculture_and_Agri-Food_Canada
Example
http://BASE/id/ca/fed/agency/Department_of_Industry
http://BASE/id/ca/fed/agency/Industry_Canada
Notes
Both the "Legal Title" and "Applied Title" are shown

3.3. US States and Territories

Owner
federal
Suggested
http://BASE/id/us/state/NAME
Example
http://logd.tw.rpi.edu/id/us/state/Vermont
Include
FIPS code, two-letter code, name, dbpedia/geonames/govtrack sameAs
Notes
States and territories may be identified by FIPS 5-2 codes, two-letter abbreviations, or full names in published datasets
Not all states/territories have two-letter abbreviations
FIPS 5-2 has been withdrawn as a FIPS standard (2008)
The use of full names is probably the most stable approach

3.4. Canadian Provinces and Territories

Owner
federal
Suggested
http://BASE/id/ca/province/NAME
Example
http://logd.tw.rpi.edu/id/ca/province/Alberta
Include
Two-letter province codes, name, dbpedia/geonames/govtrack sameAs
Notes
Canadian provinces and territories have not been implemented in RPI's Instance Hub (Sep 2013)

3.5. US Counties

Owner
federal
Suggested
http://BASE/id/us/state/STATE/COUNTY
Example
http://BASE/id/us/state/Alaska/Bethel_Census_Area
Notes
Just like US states, counties are identified by FIPS codes (FIPS 6-4), but these have been withdrawn (2008).
US county names seem stable, although two US states do not refer to them as "counties": Alaska (borough) and Louisiana (parish).
Hierarchy based on the state/territory URIs seems like the best design.
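
One practical consequence of building county URIs on top of state URIs is that the state can be recovered from the county URI by dropping the last path segment. The following is a minimal sketch of that idea. The helper names county_uri and parent_uri are illustrative, and the sketch does not guard against walking above the /id/ root.

```python
def county_uri(state_uri: str, county_token: str) -> str:
    """Append a county token to an existing state or territory URI."""
    return f"{state_uri.rstrip('/')}/{county_token}"


def parent_uri(uri: str) -> str:
    """Return the URI one level up, e.g. the state URI for a county URI."""
    head, _, tail = uri.rstrip("/").rpartition("/")
    if not head or not tail:
        raise ValueError(f"no parent segment in {uri!r}")
    return head


state = "http://BASE/id/us/state/Alaska"
county = county_uri(state, "Bethel_Census_Area")
print(county)              # prints http://BASE/id/us/state/Alaska/Bethel_Census_Area
print(parent_uri(county))  # prints http://BASE/id/us/state/Alaska
```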

3.6. US ZIP codes

Owner
USPS
Suggested
http://BASE/id/usps-com/zip/CODE
Example
http://BASE/id/usps-com/zip/09510
Include
code, link to state
Notes
The Census Bureau uses ZIP Code Tabulation Areas (ZCTA) based on ZIP codes.
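
A practical point when minting these URIs is that ZIP codes such as 09510 begin with a zero, so they need to be handled as strings rather than integers. The sketch below assumes five-digit codes only; the helper name zip_uri is hypothetical.

```python
import re

# Exactly five ASCII digits; ZIP+4 codes are outside the scope of this sketch.
_ZIP_PATTERN = re.compile(r"[0-9]{5}")


def zip_uri(base: str, code: str) -> str:
    """Build a ZIP code URI, keeping the code as a string."""
    if not isinstance(code, str) or not _ZIP_PATTERN.fullmatch(code):
        raise ValueError(f"expected a five-digit string, got {code!r}")
    return f"http://{base}/id/usps-com/zip/{code}"


print(zip_uri("BASE", "09510"))  # prints http://BASE/id/usps-com/zip/09510
print(str(int("09510")))         # prints 9510: the leading zero is lost
```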

3.7. US Congressional districts

Owner
state
Suggested
http://BASE/id/us/STATE/congressional-district/NUMBER
Example
http://BASE/id/us/ma/congressional-district/4
Include
link to state, dbpedia sameAs
Notes
STATE here can be a two-letter code because (at least for present-day districts and non-voting delegations) we only have data for places with two-letter codes: the 50 states, DC, AS, GU, MP, PR, and VI.
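
The two-letter convention can be sketched as a small builder that normalizes the code found in a dataset and checks it against the 50 states plus DC, AS, GU, MP, PR, and VI. The helper name district_uri is hypothetical, and accepting any non-negative integer as the district number is an assumption, since the text does not say how at-large districts are numbered.

```python
# Two-letter codes for the 50 states, followed by DC and the five territories.
VALID_CODES = frozenset(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY DC AS GU MP PR VI".split()
)


def district_uri(base: str, state_code: str, number: int) -> str:
    """Build a congressional district URI from a two-letter code and a number."""
    code = state_code.strip().upper()
    if code not in VALID_CODES:
        raise ValueError(f"unknown two-letter code: {state_code!r}")
    if isinstance(number, bool) or not isinstance(number, int) or number < 0:
        raise ValueError(f"district number must be a non-negative integer: {number!r}")
    # The example URI in this section uses a lowercase code.
    return f"http://{base}/id/us/{code.lower()}/congressional-district/{number}"


print(district_uri("BASE", "MA", 4))
# prints http://BASE/id/us/ma/congressional-district/4
```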

3.8. US Agencies: EPA Facilities

Owner
EPA
Suggested
http://BASE/id/epa-gov/facility/ID
Example
http://BASE/id/epa-gov/facility/110007995027
Include
link to state, link to facility detail report: http://iaspub.epa.gov/enviro/fii_query_detail.disp_program_facility?p_registry_id=110007995027
Notes
EPA facility IDs are used in EPA Facilities Registry System (FRS) datasets (e.g. Dataset 1008)
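
Since the facility URI and the detail report link are both derived from the same registry ID, the pair can be generated together. The sketch below reuses the report address shown in this section. The helper name facility_links and the dictionary keys are illustrative, and whether the report endpoint remains available is not something this sketch can guarantee.

```python
from urllib.parse import quote, urlencode

# Report address taken from the example in this section.
FRS_DETAIL_ENDPOINT = "http://iaspub.epa.gov/enviro/fii_query_detail.disp_program_facility"


def facility_links(base: str, registry_id: str) -> dict:
    """Return the facility URI and its detail report link for one registry ID."""
    if not registry_id:
        raise ValueError("registry_id must not be empty")
    query = urlencode({"p_registry_id": registry_id})
    return {
        "uri": f"http://{base}/id/epa-gov/facility/{quote(registry_id, safe='')}",
        "detail_report": f"{FRS_DETAIL_ENDPOINT}?{query}",
    }


links = facility_links("BASE", "110007995027")
print(links["uri"])            # prints http://BASE/id/epa-gov/facility/110007995027
print(links["detail_report"])
```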

4. References & Resources

[0] Uniform Resource Identifiers (URI): Generic Syntax
[1] legislation.gov.uk URIs
[2] Creating Linked Data - Part II: Defining URIs
[3] The Real Deal: data.gov.uk
[4] Treasury Board of Canada: Registry of Applied Titles
[5] See also: Phil Archer, Study on Persistent URIs, with identification of best practices and recommendations on the topic for the Member States and the European Commission (2012)
[6] See also: Hans Overbeek and Linda van den Brink, Towards a national URI-Strategy for Linked Data of the Dutch public sector (2013)