Version: 10 March 2014 (DRAFT)
Table of Contents
- Revision Notes
- Goals for Persistent Open Government Data URIs
- URI Design Overview
- Example Persistent URIs for Government Linked Data
- References & Resources
0. Revision Notes
This DRAFT version eliminates the /fed/agency notation from the Oct 2013 version.
1. Goals for Persistent Open Government Data URIs
As a growing number of governments and government agencies publish linked open government data, policies and best practices for Uniform Resource Identifier (URI) design [0] in open government data releases are emerging. RPI TWC is working extensively with key players in the United States and in international linked open government data initiatives to develop URI schemes that are useful today and in the future.
The URI scheme demonstrated by the TWC Instance Hub provides rich descriptive information about entities for humans examining open government data, while encouraging intuitive navigation and exploration of this data. The implementation of this scheme in the Instance Hub shows its utility for data publishing, and we hope to see government support for and adoption of this scheme in order to further test its viability.
The principles recommended in this document should produce URIs with the following characteristics:
- URIs that are easily re-hosted. A generated URI should be easy to transform from one BASE URI-space (e.g. logd) to another, allowing easier adoption by government agencies.
- For example: the pattern http://logd.tw.rpi.edu/id/epa-gov/XXXXXX can be transformed, based on syntax alone, to http://epa.gov/id/XXXXXX should the agency "EPA" adopt this scheme
- Concise URIs with as little "cruft" as possible
- URIs that span many domains including:
- National identifiers (e.g. governmental agencies, states, zip codes)
- State-level identifiers (e.g. counties, congressional districts)
- Agency-level identifiers (e.g. EPA facilities)
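The re-hosting goal above can be illustrated with a short Python sketch. It assumes, as in the EPA example, that the hub URI carries an owner token (here `epa-gov`) directly after `/id/` and that the agency drops that token when it hosts the identifier itself. The function name `rehost` and its parameters are hypothetical and not part of any Instance Hub software.

```python
from urllib.parse import urlsplit, urlunsplit


def rehost(uri: str, owner_token: str, new_base: str) -> str:
    """Move an identifier from a shared hub to the owner's own base.

    Assumption: the hub path looks like /id/<owner_token>/<local id>,
    and the re-hosted path is simply /id/<local id>.
    """
    parts = urlsplit(uri)
    prefix = f"/id/{owner_token}/"
    if not parts.path.startswith(prefix):
        raise ValueError(f"{uri!r} is not under {prefix!r}")
    local_id = parts.path[len(prefix):]
    # Only the host and the owner token change; the local id is untouched.
    return urlunsplit((parts.scheme, new_base, f"/id/{local_id}", "", ""))


print(rehost("http://logd.tw.rpi.edu/id/epa-gov/XXXXXX", "epa-gov", "epa.gov"))
# prints http://epa.gov/id/XXXXXX
```

Because the rewrite is purely syntactic, it needs no lookup table, which is the property the first goal asks for.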
2. URI Design Overview
URI Template: ‘http://’ [base] / ‘id’ / ([category]/[token]*)+ /[token]+
For the TWC RPI Instance Hub [V1, V2], the value of base will be: logd.tw.rpi.edu
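As a rough illustration of the template, the following Python sketch assembles a URI from a base and an ordered list of category and token segments. It is a simplification: it does not check which segments are categories and which are tokens, and the convention of replacing spaces with underscores is inferred from the examples in this document rather than stated as a rule. The name `build_uri` is hypothetical.

```python
from urllib.parse import quote


def build_uri(base: str, *segments: str) -> str:
    """Join base, the fixed 'id' element, and category/token segments."""
    if not segments:
        raise ValueError("at least one category or token is required")
    cleaned = []
    for segment in segments:
        text = segment.strip().replace(" ", "_")  # assumed naming convention
        if not text:
            raise ValueError("empty path segment")
        # Percent-encode anything that is not safe inside one path segment.
        cleaned.append(quote(text, safe=""))
    return "http://" + base + "/id/" + "/".join(cleaned)


print(build_uri("logd.tw.rpi.edu", "us", "state", "Vermont"))
# prints http://logd.tw.rpi.edu/id/us/state/Vermont
```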
Notes
- id
- The id path is recommended to avoid "polluting" the top namespace [base] with identifiers.
- id is preferred over e.g. "instance-hub" in order to have as short a token as possible. id doesn't add any semantics, but is a syntactic way to distinguish these URIs from others.
- This token is consistent with data.gov.uk URI design recommendations
- category
- This element indicates the type of an entity
- Categories allow us to describe an entity and also aid in navigation. If desired, in an implementation, category pages can serve to list all entities falling within their category
- Category elements should be short, but not at the expense of expressiveness
- URIs ending in a category should not be the URIs for the concept itself.
A tree-style representation of URI “categories”, each of which presents a listing of the entities that are logically categorized beneath it.
- token
- An element which, when qualified by categories and other elements, uniquely identifies an entity
- All entity URIs should end with a token
- A token can be followed by a category or another token, if they follow logically
- Tokens are unique when fully qualified; although a given token might not be unique within an instance hub, it must be unique within the category or token that precedes it.
- Consider the token “Office_of_Inspector_General”:
- /id/us/Department_of_Justice/Office_of_Inspector_General
- /id/us/General_Services_Administration/Office_of_Inspector_General
- /id/us/Department_of_Defense/Office_of_Inspector_General
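The uniqueness rule for tokens can be sketched as a small registry that rejects a token only when the same token already exists under the same parent path. This is an illustrative Python sketch; the class `TokenRegistry` is hypothetical and does not describe how the Instance Hub stores its identifiers.

```python
class TokenRegistry:
    """Track tokens per parent path so duplicates are caught locally."""

    def __init__(self) -> None:
        self._children: dict[str, set[str]] = {}  # parent path -> tokens

    def register(self, parent: str, token: str) -> str:
        siblings = self._children.setdefault(parent, set())
        if token in siblings:
            raise ValueError(f"{token!r} already exists under {parent!r}")
        siblings.add(token)
        return f"{parent}/{token}"


registry = TokenRegistry()
token = "Office_of_Inspector_General"
# The same token is accepted under three different parents.
for parent in (
    "/id/us/Department_of_Justice",
    "/id/us/General_Services_Administration",
    "/id/us/Department_of_Defense",
):
    print(registry.register(parent, token))
```

Registering the token a second time under the same parent raises `ValueError`, while the three fully qualified paths above remain distinct.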
3. Example Persistent URIs for Government Linked Data
Note: this section is still in review for consistency with the revised recommendations...
3.1. US Government Agencies
- Owner
- federal
- Suggested
- http://BASE/id/us/NAME/SUBNAME
- Example
- http://BASE/id/us/Commerce/National_Oceanic_and_Atmospheric_Administration
- Example
- http://BASE/id/us/Department_of_Health_and_Human_Services/Centers_for_Disease_Control
3.2. Canadian Federal Agencies [4]
- Owner
- federal
- Suggested
- http://BASE/id/ca/NAME/SUBNAME
- Example
- http://BASE/id/ca/Department_of_Agriculture_and_Agri-Food
- http://BASE/id/ca/Agriculture_and_Agri-Food_Canada
- Example
- http://BASE/id/ca/Department_of_Industry
- http://BASE/id/ca/Industry_Canada
- Notes:
- Both the "Legal Title" and "Applied Title" are shown
3.3. US States and Territories
- Owner
- federal
- Suggested
- http://BASE/id/us/state/NAME
- Example
- http://logd.tw.rpi.edu/id/us/state/Vermont
- Include
- FIPS code, two-letter code, name, dbpedia/geonames/govtrack sameAs
- Notes:
- States and territories may be identified by FIPS 5-2 codes, two-letter abbreviations, or full names in published datasets
- Not all states/territories have two-letter abbreviations
- FIPS 5-2 has been withdrawn as a FIPS standard (2008)
- The use of full names is probably the most stable approach
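Since published datasets may identify a state by FIPS code, two-letter abbreviation, or full name, a publisher needs some way to map all three onto the single name-based URI. The Python sketch below shows one possible approach with a two-row lookup table; the table layout and the function name `state_uri` are assumptions made for illustration, and a real table would cover every state and territory.

```python
# Hypothetical lookup table; only two rows are shown for brevity.
# FIPS codes are kept as strings so that leading zeros survive.
STATES = [
    {"name": "Vermont", "abbr": "VT", "fips": "50"},
    {"name": "Alaska", "abbr": "AK", "fips": "02"},
]


def _build_index(rows):
    """Map every known identifier (lower-cased) to the full state name."""
    index = {}
    for row in rows:
        for key in (row["name"], row["abbr"], row["fips"]):
            index[key.lower()] = row["name"]
    return index


_INDEX = _build_index(STATES)


def state_uri(base: str, identifier: str) -> str:
    """Resolve a FIPS code, abbreviation, or name to the name-based URI."""
    name = _INDEX.get(str(identifier).strip().lower())
    if name is None:
        raise KeyError(f"unknown state identifier: {identifier!r}")
    return f"http://{base}/id/us/state/{name.replace(' ', '_')}"


for ident in ("Vermont", "VT", "50"):
    print(state_uri("logd.tw.rpi.edu", ident))
# each line prints http://logd.tw.rpi.edu/id/us/state/Vermont
```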
3.4. Canadian Provinces and Territories
- Owner
- federal
- Suggested
- http://BASE/id/ca/province/NAME
- Example
- http://logd.tw.rpi.edu/id/ca/province/Alberta
- Include
- Two-letter province codes, name, dbpedia/geonames/govtrack sameAs
- Notes
- Canadian provinces and territories have not been implemented in RPI's Instance Hub (Sep 2013)
3.5. US Counties
- Owner
- federal
- Suggested
- http://BASE/id/us/state/STATE/COUNTY
- Example
- http://BASE/id/us/state/Alaska/Bethel_Census_Area
- Notes
- Just like US states, counties are identified by FIPS codes (FIPS 6-4), but these have been withdrawn (2008).
- US county names seem stable, although two US states do not refer to them as "counties": Alaska (borough) and Louisiana (parish).
- Hierarchy based on the state/territory URIs seems like the best design.
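One consequence of the hierarchical design is that a county URI can be derived from its state URI, and the state URI recovered from the county URI, by string operations alone. The Python sketch below illustrates this; the helper names `county_uri` and `parent_uri` are hypothetical, and the space-to-underscore convention is inferred from the example above.

```python
def county_uri(state_uri: str, county_name: str) -> str:
    """Append a county (or borough, parish, census area) to a state URI."""
    token = county_name.strip().replace(" ", "_")  # assumed convention
    if not token:
        raise ValueError("county name must not be empty")
    return state_uri.rstrip("/") + "/" + token


def parent_uri(uri: str) -> str:
    """Drop the last path element to get the enclosing URI."""
    return uri.rstrip("/").rsplit("/", 1)[0]


alaska = "http://BASE/id/us/state/Alaska"
bethel = county_uri(alaska, "Bethel Census Area")
print(bethel)              # prints http://BASE/id/us/state/Alaska/Bethel_Census_Area
print(parent_uri(bethel))  # prints http://BASE/id/us/state/Alaska
```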
3.6. US Zip codes
- Owner
- USPS
- Suggested
- http://BASE/id/usps-com/zip/CODE
- Example
- http://BASE/id/usps-com/zip/09510
- Include
- code, link to state
- Notes
- The Census Bureau uses ZIP Code Tabulation Areas (ZCTA) based on ZIP codes.
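ZIP codes such as 09510 begin with a zero, so a pipeline that treats them as integers will silently produce a different token. The Python sketch below keeps the code as a five-character string when building the URI. Padding short numeric input back to five digits is an assumption about how the leading zeros were lost, and the function name `zip_uri` is hypothetical.

```python
import re


def zip_uri(base: str, code) -> str:
    """Build a ZIP code URI, preserving leading zeros."""
    # Assumption: a shorter all-digit value lost its leading zeros upstream.
    text = str(code).strip().zfill(5)
    if not re.fullmatch(r"[0-9]{5}", text):
        raise ValueError(f"not a five-digit ZIP code: {code!r}")
    return f"http://{base}/id/usps-com/zip/{text}"


print(zip_uri("BASE", "09510"))  # prints http://BASE/id/usps-com/zip/09510
print(zip_uri("BASE", 9510))     # prints http://BASE/id/usps-com/zip/09510
```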
3.7. US Congressional districts
- Owner
- state
- Suggested
- http://BASE/id/us/STATE/congressional-district/NUMBER
- Example
- http://BASE/id/us/ma/congressional-district/4
- Include
- link to state, dbpedia sameAs
- Notes
- STATE here can be a two-letter code because (at least for present-day districts and non-voting delegations) we only have data for places with two-letter codes: the 50 states, DC, AS, GU, MP, PR, and VI.
3.8. US Agencies: EPA Facilities
- Owner
- EPA
- Suggested
- http://BASE/id/epa-gov/facility/ID
- Example
- http://BASE/id/epa-gov/facility/110007995027
- Include
- link to facility detail report (e.g. http://iaspub.epa.gov/enviro/fii_query_detail.disp_program_facility?p_registry_id=110007995027), link to state
- Notes
- EPA facility IDs are used in EPA Facilities Registry System (FRS) datasets (e.g. Dataset 1008)
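The pairing of a facility URI with its detail report link can be sketched in Python as follows. The report address is taken from the example in this section; treating the facility ID as an all-digit string, the dictionary keys, and the function name `facility_record` are assumptions made for illustration.

```python
import re
from urllib.parse import urlencode

# Report endpoint as it appears in the example in this section.
REPORT_ENDPOINT = (
    "http://iaspub.epa.gov/enviro/fii_query_detail.disp_program_facility"
)


def facility_record(base: str, registry_id: str) -> dict:
    """Return the facility URI together with its detail report link."""
    registry_id = registry_id.strip()
    if not re.fullmatch(r"[0-9]+", registry_id):  # assumed ID format
        raise ValueError(f"unexpected facility ID: {registry_id!r}")
    query = urlencode({"p_registry_id": registry_id})
    return {
        "uri": f"http://{base}/id/epa-gov/facility/{registry_id}",
        "detail_report": f"{REPORT_ENDPOINT}?{query}",
    }


record = facility_record("BASE", "110007995027")
print(record["uri"])
print(record["detail_report"])
```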
4. References & Resources
[0] Uniform Resource Identifiers (URI): Generic Syntax
[1] legislation.gov.uk URIs
[2] Creating Linked Data - Part II: Defining URIs
[3] The Real Deal: data.gov.uk
[4] Treasury Board of Canada: Registry of Applied Titles
[5] See also: Phil Archer, Study on Persistent URIs, with identification of best practices and recommendations on the topic for the Member States and the European Commission (2012). Also available as PDF
[6] See also: Hans Overbeek and Linda van den Brink, Towards a national URI-Strategy for Linked Data of the Dutch public sector (2013)