Previous title: "LOGD Instance Hub URI Design: Unique URIs for LOGD instances"
URI Design Goals
These principles should produce:
- URIs that are easily re-hosted. A generated URI should be easily transformed from one BASE URI-space (e.g. logd) to another, allowing easier buy-in from government agencies.
- For example, the pattern http://logd.tw.rpi.edu/id/epa-gov/XXXXXX is easily (syntactically) transformed to http://epa.gov/id/XXXXXX when/if the EPA buys in to this scheme.
- Concise URIs with as little "cruft" as possible
- URIs that span many domains including:
- National identifiers (e.g. governmental agencies, states, zip codes)
- State-level identifiers (e.g. counties, congressional districts)
- Agency-level identifiers (e.g. EPA facilities)
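The re-hosting goal above amounts to a prefix swap. The following Python sketch illustrates it under stated assumptions: the function name rehost and its parameters are hypothetical, and it assumes the agency drops its ORG token when it takes over the identifiers, as in the EPA example.

```python
def rehost(uri: str, old_prefix: str, new_prefix: str) -> str:
    """Move a URI from one BASE URI-space to another by swapping its prefix."""
    if not uri.startswith(old_prefix):
        raise ValueError(f"{uri!r} does not start with {old_prefix!r}")
    # Everything after the prefix is kept unchanged.
    return new_prefix + uri[len(old_prefix):]


# Hypothetical prefixes taken from the EPA example in the text.
hub_prefix = "http://logd.tw.rpi.edu/id/epa-gov/"
epa_prefix = "http://epa.gov/id/"

print(rehost("http://logd.tw.rpi.edu/id/epa-gov/XXXXXX", hub_prefix, epa_prefix))
# -> http://epa.gov/id/XXXXXX
```

Because the transformation is purely syntactic, it needs no lookup table and can be applied to a whole dataset in one pass.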
URI Design Overview
URI Template:
'http://' BASE '/' 'id' '/' ORG '/' CATEGORY ( '/' TOKEN )+
For the TWC RPI Instance Hub, BASE will be:
logd.tw.rpi.edu
Notes
- id
- This is required because we don't want to pollute the top namespace of BASE with identifiers.
- Prefer id over instance-hub because we want as short a token as possible; the id token doesn't add any semantics, it's just a syntactic way of distinguishing these URIs from others.
- Also, consistency with data.gov.uk URIs here is a good thing.
- ORG
- This is a short token representing the agency, government, or organization that controls the identifier space.
- For US identifiers, this token will start with us/, followed by a designation of either federal or state level (e.g. us/fed, us/ny, us/ca).
- Identifiers relating to data.gov will all fall under the federal us/fed space.
- For identifiers that aren't directly governmental, the ORG token should be suitably unique; for example, we use usps-com below for USPS-controlled zip code URIs.
- CATEGORY and TOKEN
- These are ORG-specific values that identify the specific instance.
- Use as many TOKENs as necessary to distinguish the instance.
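The template and notes above can be sketched as a small builder. This is only an illustration: the function name build_uri is hypothetical, and percent-encoding the segments with urllib.parse.quote is an assumption, since the design does not say how unusual characters should be handled.

```python
from urllib.parse import quote


def build_uri(base: str, org: str, category: str, *tokens: str) -> str:
    """Assemble 'http://' BASE '/id/' ORG '/' CATEGORY ( '/' TOKEN )+."""
    if not tokens:
        raise ValueError("the template requires at least one TOKEN")
    # ORG may contain a slash (e.g. "us/fed"), so "/" is left unescaped there.
    parts = [quote(org, safe="/"), quote(category, safe="")]
    # Assumption: each TOKEN is a single path segment, so "/" is escaped.
    parts += [quote(token, safe="") for token in tokens]
    return f"http://{base}/id/" + "/".join(parts)


print(build_uri("logd.tw.rpi.edu", "epa-gov", "facility", "110007995027"))
# -> http://logd.tw.rpi.edu/id/epa-gov/facility/110007995027
```

Passing more TOKEN arguments extends the path, which matches the "use as many TOKENs as necessary" rule.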
Example Identifiers
US Government Agencies
- Owner
- federal
- Suggested
- http://BASE/id/us/fed/agency/NAME/SUBNAME
- Example
- http://BASE/id/us/fed/agency/Commerce/National_Oceanic_and_Atmospheric_Administration
- Example
- http://BASE/id/us/fed/agency/Department_of_Health_and_Human_Services/Centers_for_Disease_Control
States and Territories
- Owner
- federal
- Suggested
- http://BASE/id/us/state/NAME
- Example
- http://logd.tw.rpi.edu/id/us/state/Vermont
- Include
- FIPS code, two-letter code, name, dbpedia/geonames/govtrack sameAs
- Notes
- States and territories are identified by FIPS 5-2 codes, two-letter abbreviations, and names. Not all states/territories have two-letter abbreviations. FIPS 5-2 has been withdrawn as a FIPS standard (2008). Names are probably the most stable.
Counties
- Owner
- federal
- Suggested
- http://BASE/id/us/state/STATE/COUNTY
- Example
- http://BASE/id/us/state/Alaska/Bethel_Census_Area
- Notes
- Just like states, counties are identified by FIPS codes (FIPS 6-4), but these have been withdrawn (2008). Names of counties seem stable, though two states don't refer to them as "counties": Alaska (borough) and Louisiana (parish). Hierarchy built on the state/territory URIs seems like the best design.
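The name-based hierarchy can be sketched as follows. The helper names are hypothetical, and turning spaces into underscores is an assumption inferred from the examples (e.g. Bethel_Census_Area); the design does not state a normalization rule explicitly.

```python
def name_token(name: str) -> str:
    """Turn a display name into a URI token (assumed rule: whitespace -> '_')."""
    return "_".join(name.split())


def state_uri(base: str, state: str) -> str:
    return f"http://{base}/id/us/state/{name_token(state)}"


def county_uri(base: str, state: str, county: str) -> str:
    # The county URI extends the state URI, so the hierarchy is visible in the path.
    return f"{state_uri(base, state)}/{name_token(county)}"


print(county_uri("BASE", "Alaska", "Bethel Census Area"))
# -> http://BASE/id/us/state/Alaska/Bethel_Census_Area
```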
Zip codes
- Owner
- USPS
- Suggested
- http://BASE/id/usps-com/zip/CODE
- Example
- http://BASE/id/usps-com/zip/09510
- Include
- code, link to state
- Notes
- The Census Bureau uses ZIP Code Tabulation Areas (ZCTA) based on ZIP codes.
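ZIP codes such as 09510 lose their leading zero if they pass through a numeric type, so a generator probably wants to treat CODE as text. A minimal sketch, assuming five-digit codes only and a hypothetical helper name zip_uri:

```python
import re


def zip_uri(base: str, code) -> str:
    """Build a zip code URI, restoring leading zeros lost to numeric parsing."""
    text = str(code).strip().zfill(5)  # 9510 -> "09510"
    # Assumption: only plain five-digit ZIP codes are minted.
    if not re.fullmatch(r"[0-9]{5}", text):
        raise ValueError(f"not a five-digit ZIP code: {code!r}")
    return f"http://{base}/id/usps-com/zip/{text}"


print(zip_uri("BASE", 9510))
# -> http://BASE/id/usps-com/zip/09510
```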
Congressional districts
- Owner
- state
- Suggested
- http://BASE/id/us/STATE/congressional-district/NUMBER
- Example
- http://BASE/id/us/ma/congressional-district/4
- Include
- link to state, dbpedia sameAs
- Notes
- STATE here can be a two-letter code because (at least for present-day districts and non-voting delegations) we only have data for places with two-letter codes: the 50 states, DC, AS, GU, MP, PR, and VI.
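One way to enforce the two-letter restriction is to check STATE against the 56 codes listed in the note. The sketch below is illustrative: district_uri is a hypothetical name, lowercasing the code follows the us/ma example, and accepting any non-negative integer as NUMBER is an assumption.

```python
STATES = (
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY"
).split()
OTHER = "DC AS GU MP PR VI".split()
ALLOWED_CODES = frozenset(STATES + OTHER)


def district_uri(base: str, state: str, number: int) -> str:
    """Build a congressional district URI from a two-letter code and a number."""
    code = state.strip().upper()
    if code not in ALLOWED_CODES:
        raise ValueError(f"unknown two-letter code: {state!r}")
    if not isinstance(number, int) or number < 0:
        raise ValueError(f"district number must be a non-negative integer: {number!r}")
    return f"http://{base}/id/us/{code.lower()}/congressional-district/{number}"


print(district_uri("BASE", "MA", 4))
# -> http://BASE/id/us/ma/congressional-district/4
```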
EPA Facilities
- Owner
- EPA
- Suggested
- http://BASE/id/epa-gov/facility/ID
- Example
- http://BASE/id/epa-gov/facility/110007995027
- Include
- link to facility detail report (e.g. http://iaspub.epa.gov/enviro/fii_query_detail.disp_program_facility?p_registry_id=110007995027), link to state
- Notes
- EPA facility IDs are used in EPA Facilities Registry System (FRS) datasets (e.g. Dataset 1008)
References
[1] legislation.gov.uk URIs
[2] Creating Linked Data - Part II: Defining URIs
[3] The Real Deal: data.gov.uk