Invited Talk: Scaling up Semantics

Long title
Using SKOS and data shapes to integrate tabular research data from diverse sources
Starts at
Mon, Oct 21, 2024, 12:00 EDT
Finishes at
Mon, Oct 21, 2024, 13:00 EDT
Venue
Auditorium
Moderator
Alasdair MacDonald

To enable analyses that span multiple sources of tabular data, e.g., for research in agriculture, researchers can start by associating their variables with URIs for properties from established authorities such as Dublin Core or from the more domain-specific NALT Concept Space of the USDA National Agricultural Library ("reconciliation"). Datasets are then modeled as sets of interrelated "data shapes", each listing the properties used to describe a specific type of entity, such as "cotton fiber samples". Models are edited in a straightforward spreadsheet format, Dublin Core Tabular Application Profile (DCTAP). The DCTAP serves both as the source for automatically generating UML diagrams, HTML documentation pages, and machine-actionable validation schemas (ShEx), and as the target form for extracting data from tabular formats into RDF ("semanticization"). The resulting RDF knowledge graph is validated with ShEx, queried with SPARQL, and used for visualizations. The properties used in data shapes are added to SKOS collections ("property sets") in NALT. These growing property sets serve as starting points for creating new data shapes as future requirements arise.
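As a rough illustration of the "semanticization" step described above, the following Python sketch reads a small DCTAP-style table and uses it to turn rows of tabular data into RDF triples in N-Triples syntax. It is not the speaker's pipeline. The shape name `FiberSample`, the `example.org` URIs, the column names, and the choice to match `propertyLabel` against data column headers are all assumptions made for the example; only the DCTAP column names (`shapeID`, `propertyID`, `propertyLabel`, `mandatory`, `valueDataType`) and the Dublin Core `identifier` property come from the published vocabularies.

```python
import csv
import io

# Hypothetical DCTAP profile with one shape and two properties.
# shapeID is repeated on every row here purely to keep the parsing simple.
DCTAP_CSV = """shapeID,propertyID,propertyLabel,mandatory,valueDataType
FiberSample,http://purl.org/dc/terms/identifier,sample_id,true,http://www.w3.org/2001/XMLSchema#string
FiberSample,http://example.org/prop/fiberLength,fiber_length_mm,false,http://www.w3.org/2001/XMLSchema#decimal
"""

# Hypothetical source data; the second sample has no fiber length recorded.
DATA_CSV = """sample_id,fiber_length_mm
S001,28.4
S002,
"""

BASE = "http://example.org/sample/"  # assumed base URI for minting subjects


def load_shape(dctap_text, shape_id):
    """Return the DCTAP rows (property statements) belonging to one shape."""
    rows = csv.DictReader(io.StringIO(dctap_text))
    return [r for r in rows if r["shapeID"] == shape_id]


def row_to_ntriples(row, shape, base=BASE):
    """Map one data row to N-Triples lines, guided by the shape."""
    # Assumption: the 'sample_id' column identifies the entity being described.
    subject = f"<{base}{row['sample_id']}>"
    triples = []
    for prop in shape:
        value = (row.get(prop["propertyLabel"]) or "").strip()
        if not value:
            if prop["mandatory"].lower() == "true":
                raise ValueError(f"missing mandatory value: {prop['propertyLabel']}")
            continue  # optional property with no value: emit nothing
        # Escape backslashes and quotes for an N-Triples literal.
        literal = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        triples.append(
            f"{subject} <{prop['propertyID']}> {literal}^^<{prop['valueDataType']}> ."
        )
    return triples


def semanticize(data_text, shape):
    """Convert every row of a CSV text into N-Triples lines."""
    out = []
    for row in csv.DictReader(io.StringIO(data_text)):
        out.extend(row_to_ntriples(row, shape))
    return out


shape = load_shape(DCTAP_CSV, "FiberSample")
triples = semanticize(DATA_CSV, shape)
for line in triples:
    print(line)
```

In a fuller workflow, the same profile table would also be the input for generating a ShEx schema, so that the triples produced here could be validated against the shape they were derived from.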
  • Tom Baker

    Consultant

    Tom Baker has worked on Semantic Web standards since the 1990s, when he helped organize DCMI, for which he now serves as Technology Director and Usage Board co-chair. Tom co-chaired the W3C working group that published SKOS. After completing a PhD in anthropology at Stanford University in 1989, he worked as a research sociologist in Italy, then as a data researcher at GMD, Fraunhofer, and Göttingen State Library (Germany). He has taught at AIT (Bangkok) and Sungkyunkwan University (Seoul) and has worked on agricultural data projects with FAO, CABI, and (currently) the USDA National Agricultural Library.

Moderator

  • Alasdair MacDonald

    University of Edinburgh

    Alasdair MacDonald is the Metadata and University Collections Facility Manager at Edinburgh University Library, where he has worked since 2014. He manages the Metadata Team, which provides a centralised bibliographic cataloguing service to all Library sites, and also manages the Library's offsite collections store. Alasdair is the current Chair of the DCMI Governing Board and Vice Chair of the CILIP Metadata and Discovery Group Scotland Committee. He has previously held posts at the Bodleian Library and National Library of Scotland.