URI Design Principles: Creating Unique URIs for Government Linked Data

Previous title: "LOGD Instance Hub URI Design: Unique URIs for LOGD instances"

URI Design Goals

These principles should produce:
  • URIs that are easily re-hosted. This means that a generated URI should be easily transformed from one BASE URI space (e.g. logd) to another, allowing easier buy-in from government agencies.
    • For example, the pattern http://logd.tw.rpi.edu/id/epa-gov/XXXXXX is easily (syntactically) transformed to http://epa.gov/id/XXXXXX if and when the EPA buys in to this scheme.
  • Concise URIs with as little "cruft" as possible
  • URIs that span many domains including:
    • National identifiers (e.g. governmental agencies, states, zip codes)
    • State-level identifiers (e.g. counties, congressional districts)
    • Agency-level identifiers (e.g. EPA facilities)
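
The re-hosting goal above can be sketched in Python. This is a minimal illustration, not part of the Instance Hub itself: the function name rehost and its parameters are hypothetical, and it assumes that the ORG token is dropped once the owning organization hosts the identifiers under its own domain, as in the EPA example.

```python
from urllib.parse import urlsplit, urlunsplit


def rehost(uri: str, org: str, new_base: str) -> str:
    """Move an identifier from a shared hub to the owner's own host.

    Hypothetical helper: assumes the URI path starts with /id/ORG/ and
    that ORG becomes redundant once the owner's host appears in the URI.
    """
    parts = urlsplit(uri)
    prefix = f"/id/{org}/"
    if not parts.path.startswith(prefix):
        raise ValueError(f"{uri!r} is not under {prefix!r}")
    # Keep everything after the ORG segment and re-root it under /id/.
    new_path = "/id/" + parts.path[len(prefix):]
    return urlunsplit((parts.scheme, new_base, new_path, parts.query, parts.fragment))


print(rehost("http://logd.tw.rpi.edu/id/epa-gov/XXXXXX", "epa-gov", "epa.gov"))
```

Because the transformation is purely syntactic, the same rewrite could equally be expressed as an HTTP redirect rule on the original host.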

URI Design Overview

URI Template: 'http://' BASE '/' 'id' '/' ORG '/' CATEGORY ( '/' TOKEN )+
In the case of the TWC RPI Instance Hub, BASE will be: logd.tw.rpi.edu
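
As a rough sketch, the template can be turned into a small Python builder. The name build_uri and its argument order are assumptions made for illustration. The sketch also assumes that ORG may itself contain slashes (e.g. us/fed), while CATEGORY and each TOKEN are single path segments.

```python
from urllib.parse import quote


def build_uri(base: str, org: str, category: str, *tokens: str) -> str:
    """Assemble 'http://' BASE '/id/' ORG '/' CATEGORY ( '/' TOKEN )+ ."""
    if not tokens:
        # The template's "+" means one or more TOKENs.
        raise ValueError("at least one TOKEN is required")
    # Percent-encode each segment so a stray '/' or space cannot change
    # the path structure; ORG keeps its slashes (e.g. us/fed).
    segments = [quote(s, safe="") for s in (category, *tokens)]
    return f"http://{base}/id/{quote(org, safe='/')}/" + "/".join(segments)


print(build_uri("logd.tw.rpi.edu", "epa-gov", "facility", "110007995027"))
print(build_uri("logd.tw.rpi.edu", "us/fed", "agency", "Commerce",
                "National_Oceanic_and_Atmospheric_Administration"))
```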

Notes

  • id
    • This is required because we don't want to pollute the top namespace of BASE with identifiers.
    • Prefer id over instance-hub because we want as short a token as possible; the id token doesn't add any semantics, it's just a syntactic way of distinguishing these URIs from others.
    • Also, consistency with data.gov.uk URIs here is a good thing.
  • ORG
    • This is a short token representing the agency, government, or organization that controls the identifier space.
    • For US identifiers, this token will start with us/, and be followed by a designation of either federal or state-level (e.g. us/fed, us/ny, us/ca).
    • Identifiers relating to data.gov will all fall under the federal us/fed space.
    • For identifiers that aren't directly governmental, the ORG token should be suitably unique; for example, we use usps-com below for USPS controlled zip code URIs.
  • CATEGORY and TOKEN
    • These are ORG-specific values that identify the specific instance.
    • Use as many TOKENs as necessary to distinguish the instance.

Example Identifiers

US Government Agencies

Owner
federal
Suggested
http://BASE/id/us/fed/agency/NAME/SUBNAME
Example
http://BASE/id/us/fed/agency/Commerce/National_Oceanic_and_Atmospheric_Administration
Example
http://BASE/id/us/fed/agency/Department_of_Health_and_Human_Services/Centers_for_Disease_Control

States and Territories

Owner
federal
Suggested
http://BASE/id/us/state/NAME
Example
http://logd.tw.rpi.edu/id/us/state/Vermont
Include
FIPS code, two-letter code, name, dbpedia/geonames/govtrack sameAs
Notes
States and territories are identified by FIPS 5-2 codes, two-letter abbreviations, and names. Not all states/territories have two-letter abbreviations. FIPS 5-2 has been withdrawn as a FIPS standard (2008). Names are probably the most stable.

Counties

Owner
federal
Suggested
http://BASE/id/us/state/STATE/COUNTY
Example
http://BASE/id/us/state/Alaska/Bethel_Census_Area
Notes
Just like states, counties are identified by FIPS codes (FIPS 6-4), but these have been withdrawn (2008). Names of counties seem stable, though two states don't refer to them as "counties": Alaska (borough) and Louisiana (parish). Hierarchy built on the state/territory URIs seems like the best design.
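
The name-based hierarchy described above might be generated along the following lines. This is only a sketch: the helper names are hypothetical, and the rule of replacing whitespace with underscores is an assumption inferred from the example URIs, not a stated convention.

```python
from urllib.parse import quote


def name_to_token(name: str) -> str:
    """Turn a display name into a path token (assumed rule: spaces -> '_')."""
    return quote("_".join(name.split()), safe="")


def state_uri(base: str, state: str) -> str:
    return f"http://{base}/id/us/state/{name_to_token(state)}"


def county_uri(base: str, state: str, county: str) -> str:
    # The county URI extends the state URI, so the hierarchy is explicit
    # whether the unit is called a county, borough, or parish.
    return f"{state_uri(base, state)}/{name_to_token(county)}"


print(county_uri("logd.tw.rpi.edu", "Alaska", "Bethel Census Area"))
```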

Zip codes

Owner
USPS
Suggested
http://BASE/id/usps-com/zip/CODE
Example
http://BASE/id/usps-com/zip/09510
Include
code, link to state
Notes
The Census Bureau uses ZIP Code Tabulation Areas (ZCTA) based on ZIP codes.
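
One practical detail in minting these URIs is that ZIP codes can begin with a zero, as in the example above, so the code has to be treated as a five-character string rather than a number. A minimal sketch, assuming five-digit codes only and using the hypothetical helper name zip_uri:

```python
def zip_uri(base: str, code) -> str:
    """Build a ZIP code URI, restoring leading zeros lost by numeric storage."""
    text = str(code).strip().zfill(5)
    if len(text) != 5 or not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a five-digit ZIP code: {code!r}")
    return f"http://{base}/id/usps-com/zip/{text}"


print(zip_uri("BASE", 9510))
```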

Congressional districts

Owner
state
Suggested
http://BASE/id/us/STATE/congressional-district/NUMBER
Example
http://BASE/id/us/ma/congressional-district/4
Include
link to state, dbpedia sameAs
Notes
STATE here can be a two-letter code because (at least for present-day districts and non-voting delegations) we only have data for places with two-letter codes: the 50 states, DC, AS, GU, MP, PR, and VI.

EPA Facilities

Owner
EPA
Suggested
http://BASE/id/epa-gov/facility/ID
Example
http://BASE/id/epa-gov/facility/110007995027
Include
link to facility detail report (e.g. http://iaspub.epa.gov/enviro/fii_query_detail.disp_program_facility?p_registry_id=110007995027), link to state
Notes
EPA facility IDs are used in EPA Facilities Registry System (FRS) datasets (e.g. Dataset 1008).

References

[1] legislation.gov.uk URIs
[2] Creating Linked Data - Part II: Defining URIs
[3] The Real Deal: data.gov.uk
Attachment: w3c_gld_uri_construction_25jan12.pdf (114.5 KB)