Summary
Given a catalog, metadata for each dataset is extracted and saved in a CSV file.
csv2rdf4lod automation is then used to convert the CSV file into RDF. For each catalog, four outputs are publicly accessible at
https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source/
Workflow for getting metadata
- Select a catalog
- List all datasets in the catalog and collect the metadata for each dataset into a single table; save the table in CSV format
- Some catalogs, such as data.gov.uk, provide a data dump; for these, use the provided dump files
- For catalogs that do not provide a data dump, write a custom program (e.g. in Java or Python) to extract the metadata
- Commit the CSV files and any corresponding source code to https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source/
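
The table-building step above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record fields (`id`, `title`, `url`, `license`) and the helper names are hypothetical, and the per-dataset records are assumed to have already been read from a dump file or extracted by a custom program. Because catalogs rarely supply the same fields for every dataset, the sketch takes the union of all field names as the CSV header and leaves missing values empty.

```python
import csv
import io


def collect_fieldnames(records):
    """Return the union of all keys, in order of first appearance."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    return fieldnames


def write_catalog_csv(records, out):
    """Write one CSV row per dataset; missing fields become empty cells."""
    writer = csv.DictWriter(out, fieldnames=collect_fieldnames(records), restval="")
    writer.writeheader()
    writer.writerows(records)


# Hypothetical metadata records for two datasets in one catalog.
sample_records = [
    {"id": "ds-1", "title": "Road traffic counts", "url": "http://example.org/ds-1"},
    {"id": "ds-2", "title": "School locations", "license": "CC-BY"},
]

buffer = io.StringIO()
write_catalog_csv(sample_records, buffer)
csv_text = buffer.getvalue()
```

To produce a file for committing, pass a file opened with `open(path, "w", newline="", encoding="utf-8")` instead of the in-memory buffer.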
Workflow for publishing a dataset catalog
- Follow the csv2rdf4lod conversion/enhancement process to convert the CSV files into RDF; for details, see https://github.com/timrdf/csv2rdf4lod-automation/wiki/Conversion-process-phases
- Run the converter to generate the default enhancement configuration
- Edit the enhancement configuration files to map the original metadata onto the metadata designed for LOGDC
- Re-run the converter to generate the RDF graph
- Commit the enhancement configuration files and conversion triggers to https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source/
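
To make the idea of mapping source metadata columns onto target properties concrete, here is a minimal Python sketch. It is not csv2rdf4lod and does not use its enhancement configuration format; it is a stand-in that shows only the underlying idea, turning each CSV row into N-Triples through a column-to-property table. The base URI, the column names, and the choice of Dublin Core properties are all assumptions made for illustration.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical mapping from source CSV columns to target property URIs.
COLUMN_TO_PROPERTY = {
    "title": "http://purl.org/dc/terms/title",
    "description": "http://purl.org/dc/terms/description",
}

# Hypothetical base URI for the dataset resources.
BASE_URI = "http://example.org/dataset/"


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text, id_column="id"):
    """Emit one triple per non-empty mapped cell of each CSV row."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<%s%s>" % (BASE_URI, quote(row[id_column], safe=""))
        for column, prop in COLUMN_TO_PROPERTY.items():
            value = row.get(column)
            if value:  # skip empty or missing cells
                triples.append(
                    '%s <%s> "%s" .' % (subject, prop, escape_literal(value))
                )
    return triples


example_csv = "id,title,description\nds-1,Road traffic counts,\n"
ntriples_lines = rows_to_ntriples(example_csv)
```

In the actual workflow this mapping is expressed in the enhancement configuration files and applied by the converter; the sketch only shows why editing the mapping and re-running the conversion changes the resulting graph.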