Papers 1: Metadata and Scholarly Communications

Starts at: Mon, Nov 6, 2023, 14:30 South Korea Time (06 Nov 23 05:30 UTC)
Finishes at: Mon, Nov 6, 2023, 16:00 South Korea Time (06 Nov 23 07:00 UTC)
Venue: Room 201
Moderator: Alasdair MacDonald

Moderator

Alasdair MacDonald

University of Edinburgh
Alasdair MacDonald is the Metadata and University Collections Facility Manager at Edinburgh University Library, where he has worked since 2014. The work of the Metadata Team is mainly focussed on the RDA and MARC21 standards, with recent initiatives including the production of metadata for the University's digitised thesis collections and addressing heritage collections backlogs. Alasdair's previous role was Head of Bibliographic Maintenance and Authority Control at the Bodleian Library, Oxford University. Alasdair is the Vice Chair of the CILIP Metadata and Discovery Group Scotland committee and the current Chair of the DCMI Governing Board.

Presentations

Research Data Description From Structures Within and Around Metadata

Authors: Julianna Pakstis, Christiana Dobrzynski, Perry Evans, Stephanie Huang, Ene Belleh, Allison Olsen, and Hannah Calkins.

The Arcus Archives at the Children’s Hospital of Philadelphia (CHOP) aims to collect and describe research data from across the Research Institute. To accomplish this, the Arcus Library Science team has established processes, structures, and descriptive metadata schemas informed by all phases of the research data preservation and reuse lifecycle at CHOP including archiving, cataloging, display, and usability. This paper introduces the Arcus Archives metadata schema and the key structures it relies upon.

Specifically, this paper explains how a small set of dataTypes subfields, within a hierarchical, archivally informed metadata structure, relies on a shared file directory organization to thoroughly and accurately process, catalog, and make discoverable the metadata about petabytes of archived pediatric research data across multiple, growing data modalities.

The metadata schema itself is flexible but consistent enough to apply to a myriad of data types produced during the conduct of pediatric research. Each metadata record reflects the unique archival arrangement done for each collection. It encompasses the framework of a shared recommended “project template” file directory structure which includes manifest and protocol files that allow for meaningful capture and organization of numerous complex data files and their relationships to one another.

Julianna Pakstis

Children's Hospital of Philadelphia
Julianna Pakstis works as part of the Arcus initiative to describe archived research data from across the Children's Hospital of Philadelphia (CHOP). She designs and produces metadata records and schemas for collections in the Arcus Archives. She oversees the development and upkeep of the team’s custom tool for interacting with those metadata files.
Julie is interested in emerging standards for research data and innovative techniques for efficient, accurate, and automated metadata application.

Sustainable Scholarly Publishing: Insights and Lessons from DCPapers

The Dublin Core Conferences, organised by the Dublin Core Metadata Initiative, have a history spanning two decades. These conferences, characterised by robust, thought-provoking discussion of metadata, have been integral in disseminating knowledge through their open-access publications. Their proceedings are compiled and published on an open-access publication platform known as DCPapers. With nearly 20 volumes spanning various themes and including presentations, papers, posters, project reports, and more, DCPapers has emerged as a comprehensive resource for researchers and practitioners alike. However, maintaining such an extensive resource over an extended period is arduous, and the fast-paced evolution of web technologies over the past two decades of open-access publication complicates it further. Despite these technological shifts, DCPapers has relied heavily on a single platform for its operations, and it has now reached a critical juncture that necessitates a comprehensive redesign from the ground up. This paper details the authors' experience of rebuilding and introducing the new DCPapers publishing platform. It offers insights and lessons from this process, setting the stage for further innovations in open-access publication.

Nishad Thalhath

University of Tsukuba
Nishad Thalhath is a doctoral candidate in Information Science and a member of the Metadata Laboratory at the School of Library, Media and Information Studies, University of Tsukuba, Japan. His research interests include metadata standards, knowledge graphs, and (meta)data interoperability. For around two decades, he has worked as a developer, engineer and consultant in various IT and ITES projects. He currently works as a part-time researcher in the Laboratory for Large-Scale Biomedical Data Technology, RIKEN Center for Integrative Medical Sciences, Japan, where he develops and manages the metadata and integration systems for omics data.

Research on the Method of Linking Scientific Data and Literature Data through Metadata Fusion and Ontology Construction: From the Perspective of Agricultural Science and Technology Management in China

Authors: Chai Miaoling

This study proposes a metadata-ontology fusion method from the perspective of agricultural science and technology management in China, aiming to provide a method and a case for cross-departmental and cross-domain sharing and fusion of scientific data in that field. First, the study presents the meaning of, and methods for, linking scientific data with science and technology literature (S&T literature), and analyzes unstructured data. Second, it explores the data characteristics of agricultural science and technology management. Third, it constructs an ontology of agricultural science and technology management to support the integration of multi-source heterogeneous data. The empirical part targets the data requirements of agricultural science and technology management in Sichuan, integrating the industrial chain and the data chain to propose 20 data requirements. Finally, the ontology is established and revised, and the linkage between scientific data and S&T literature is realized on a demonstration platform. The ontology of agricultural industry management is realized, and the linking and fusion of multi-language (Chinese/English) data and unstructured data are achieved. The study builds two demonstration platforms, integrates 24,200 data items, including 2,119 expert data items, and links 16 types of agricultural science data with 4 types of S&T literature, verifying the feasibility of the ontology.

Chai Miaoling

Chengdu Library and Information Center, Chinese Academy of Sciences; Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences
Chai Miaoling holds a Master's in Library Science and is an Associate Research Librarian at the Chengdu Library and Information Center, Chinese Academy of Sciences, and a Master's tutor in Library and Information Science at Sichuan University. From September 2015 to February 2016, she served as a consultant at the Food and Agriculture Organization (FAO) of the United Nations. She is devoted to free public access to scientific research results with the right of legal use. Her research interests are metadata, knowledge organization systems (KOS), and user research. She has applied for and led over 20 projects and published over 30 papers.

The state of OAI-PMH repositories in Canadian Universities

Authors: Frédéric Piedboeuf, Guillaume Le Berre, David Alfonso-Hermelo, Olivier Charbonneau, and Philippe Langlais

This article presents a study of the current state of University Institutional Repositories (UIRs) in Canada. UIRs are vital to sharing information and documents, mainly Electronic Theses and Dissertations (ETDs), and theoretically allow anyone, anywhere, to access the documents contained within the repository. Despite calls for consistent and shareable metadata in these repositories, our literature review shows inconsistencies in UIRs, including incorrect use of metadata fields and the omission of crucial information, which make the systematic analysis of UIRs complex. Nonetheless, we collected data from 57 Canadian UIRs with the aim of analyzing Canadian data and assessing the quality of the country's UIRs. This was surprisingly difficult due to the lack of information about the UIRs, and we attempt to ease future collection efforts by organizing vital information that is difficult to find, starting with the addresses of the UIRs. We furthermore present and analyze the main characteristics of the UIRs we managed to collect, using this dataset to create recommendations for future practitioners.
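
As a rough illustration of the kind of metadata check this abstract alludes to, the following Python sketch builds an OAI-PMH ListRecords request URL and reports which Dublin Core fields are absent from each harvested record. The verb, the oai_dc metadata prefix, and the XML namespaces come from the OAI-PMH 2.0 specification. The base URL, the sample response, the function names, and the choice of "required" fields are assumptions made for illustration and are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 specification and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Assumption: which fields count as "crucial" is our own choice for this sketch.
REQUIRED_FIELDS = ("title", "creator", "date", "identifier")


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def missing_fields(response_xml):
    """Map each record identifier to the required DC fields it lacks."""
    root = ET.fromstring(response_xml)
    report = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="(no identifier)", namespaces=NS
        )
        missing = []
        for field in REQUIRED_FIELDS:
            value = record.findtext(".//dc:" + field, default="", namespaces=NS)
            if not (value or "").strip():
                missing.append(field)
        report[identifier] = missing
    return report


# Hypothetical response fragment: one record that has no dc:date element.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A thesis</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:identifier>https://example.org/etd/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai"))
    print(missing_fields(SAMPLE))
```

The sketch parses a local sample rather than contacting a live repository, so it runs offline. A real harvest would also need to follow resumption tokens and handle error responses, which are omitted here.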

Frédéric Piedboeuf

University of Montreal
Frédéric Piedboeuf is a PhD candidate at the RALI lab at Université de Montréal, currently working on data mining and automatic keyphrase generation in an academic context. He has co-written a number of scientific papers including "Personality extraction through LinkedIn" and "Effective data augmentation for sequence classification using one VAE per class". His research interests also include climate modelling, generative models, and the use of small data for machine learning.