DCMI: Short Papers

A Study on Extending Metadata in the School Library DLS in South Korea

This study analyzes the structural limitations of DLS metadata used in South Korean school libraries and proposes an approach for its extension. The current DLS metadata structure, which primarily focuses on bibliographic and holding information, has limitations in supporting curriculum integration and data sharing with external resources. To address these issues, this study proposes curriculum-linked metadata elements, including school level, grade level, subject area, and subject topic, and reconstructs them using a BIBFRAME-based RDF triple structure. This approach extends DLS metadata into an entity-based semantic framework and enables linkages with external web resources. Furthermore, this study enhances the direct integration of school library metadata with the curriculum, improving the efficiency of resource discovery and use, and providing a foundation for the systematic utilization of school library collections.

Bolan Kim

Ph.D. Student, Chung-Ang University

Chung-Ang University
Bolan Kim is a Ph.D. student in the Department of Library and Information Science at Chung-Ang University in Seoul, South Korea, specializing in Knowledge Organization. She also serves as a school librarian. Her research interests focus on metadata re-architecture for school library collections and the organization of resource metadata to support teaching and learning activities. Currently, she is conducting research on metadata systems for digital school libraries to contribute to the advancement of library and information science.

Equitable Metadata for Diverse Voices: Sustainable Computational Poetry Analysis with HathiTrust Extracted Features

Authors: Kahyun Choi, You Peng, Gyuri Kang

The retirement of the HathiTrust Research Center (HTRC) infrastructure raises questions about continuing computational research on in-copyright collections in the HathiTrust Digital Library (HTDL). Since HTRC has actively supported inclusive research on underrepresented groups, the HTDL collections serve as a crucial test case for exploring post-HTRC workflows. To address this, we share an augmented dataset of American poetry by poets from historically underrepresented groups in the HTDL. Mapping this collection to HTRC Extracted Features (EF) v2.5 demonstrated that EF is highly reliable with high retrieval coverage, achieving a 100% match. Our computational linguistic analysis shows that EF effectively captures group-specific diversity, such as non-standard English, indigenous languages, and multilingual vocabularies. These findings indicate that adapting ML and NLP tools to properly handle such linguistic variation is essential to mitigate bias and marginalization. Although the lack of full-text limits structural analysis, EF remains a sustainable and highly useful resource for word-based research well beyond the HTRC's retirement.

Kahyun Choi

Assistant Professor

University of Illinois Urbana-Champaign
Kahyun Choi is an Assistant Professor in the School of Information Sciences at the University of Illinois Urbana-Champaign. She received her PhD from UIUC. Her research applies computational methods and machine learning to cultural data, focusing on computational analysis of poetry and music, human–AI co-creation of cultural metadata, and ethical AI for digital libraries. Her work spans Music Information Retrieval and Digital Humanities. She received the 2022 IMLS Early Career Grant, the 2021 IMLS National Leadership Grant, and the 2023 IU Trustees Teaching Award.

Extending KCR5 Relationships for Integration with BIBFRAME 3.0

Authors: Minjung Park

This study examines the limitations of relationship definitions in KCR5 and proposes an extension model based on BIBFRAME 3.0 to address these issues. Although KCR5 adopts the FRBR-based entity-relationship structure, its scope remains largely focused on manifestation-level description and is primarily limited to WEMI–Agent relationships. As a result, relationships involving contextual entities such as place, time, and subject are insufficiently represented. To overcome these limitations, this study utilizes BIBFRAME 3.0 classes and properties to propose an extended relationship framework centered on contextual entities. The proposed model enhances the representation of spatiotemporal and thematic relationships within bibliographic data, enabling richer semantic connections among resources. Furthermore, this study highlights the potential for improving interoperability in international data exchange and supports the development of a semantic, knowledge graph-based bibliographic environment grounded in BIBFRAME. The findings contribute to advancing the transition from record-based cataloging to entity-based, relationship-oriented bibliographic structures in the Korean cataloging context.

Minjung Park

Doctoral student

Chung-Ang University
Doctoral student, Department of Library and Information Science, Chung-Ang University, Seoul, South Korea. Park earned her B.S. and M.S. degrees in Library and Information Science from Chung-Ang University. Her research interests include bibliographic metadata, bibliographic framework, and library development policy.

FAIR Open Metadata: A Case Study of RePEc

Authors: Anna Oates Schlaack, Christian Zimmermann

This report describes a research project on the use of RePEc metadata and its conformance with the FAIR principles. The authors summarize successful aspects of a metadata schema that has underpinned the research landscape in economics for nearly thirty years. The report also describes the use cases of 100 studies that have leveraged RePEc metadata and outlines future research that will document the challenges of metadata citation and the opportunities to engender open metadata efforts.

Anna Oates Schlaack

Assistant Professor, Cataloging & Metadata Librarian

University of Illinois Urbana-Champaign
Anna Oates Schlaack (she/her) is an Assistant Professor and the Cataloging and Metadata Librarian at the University of Illinois Urbana-Champaign, where she leads special formats and English-language monographic cataloging. While using the Modernist Journals Project as a student in the humanities, she found her calling: curating and describing information so that it can be used in novel ways. Whether to support digital humanities research, personal genealogical research, or scientometrics, Schlaack is committed to stewarding open metadata and information resources for use across the globe.

From MARC to Linked Open Data: AI-Driven Entity Extraction from Hebrew Manuscript Metadata Using Distant Supervision

Authors: Alexander Goldberg, Gila Prebor, Avshalom Elmalech

Cultural heritage institutions preserve invaluable provenance information in Machine-Readable Cataloging (MARC) records, yet much of this knowledge remains trapped in unstructured note fields, inaccessible to computational analysis. Transforming these legacy catalogs into Linked Open Data (LOD) requires extracting structured person-role relationships—identifying authors, scribes, owners, and censors—from cataloger narratives. This study presents an AI-driven system that uses distant supervision from MARC metadata itself to automatically generate training data, eliminating the prohibitive cost of manual annotation for specialized cultural heritage domains. By exploiting the dual structure of catalog records, where structured fields provide authoritative labels and unstructured notes provide context, we achieve 85.70% F1 for person extraction and 100% role classification accuracy, outperforming general Hebrew NER models by +55.55% F1. Our approach demonstrates how existing metadata can be leveraged to train AI systems that align with the values of cultural heritage preservation: accuracy, provenance tracking, and semantic enrichment. The extracted entities populate ontology instances based on CIDOC-CRM and IFLA-LRM, enabling computational analysis of scribal networks and manuscript circulation at scale.

Avshalom Elmalech

Researcher

Bar-Ilan University
Avshalom Elmalech is a researcher at Bar-Ilan University with a PhD in Computer Science, working at the intersection of applied artificial intelligence and digital humanities. His research bridges information science and AI by examining how deep learning methods can be effectively applied to humanities data. He has contributed practical frameworks for guiding digital humanities scholars in choosing and adapting NLP and deep learning approaches under constraints such as limited training data and domain specificity.

Gila Prebor

Associate Professor, Department of Information Science

Bar-Ilan University
Gila Prebor is a researcher in the Department of Information Science at Bar-Ilan University, specializing in Hebrew manuscripts, paleography, and knowledge organization. Her research combines codicology and bibliography with Digital Humanities, focusing on AI, Handwritten Text Recognition (HTR), and Semantic Web technologies for cultural heritage data. She is co-editor of Alei Sefer and has received grants from the ISF, the EU, and Israel’s Ministry of Innovation, Science, and Technology. Her recent work applies Linked Data and AI to the study of Hebrew manuscripts.

From Notes to Knowledge Management: Representing KDC Classification Notes as Linked Data for Automated Classification

Authors: Haeryung Park, Seungmin Lee

This study reinterprets classification notes in the Korean Decimal Classification (KDC) as key semantic elements for automated classification and proposes a method for structuring relationships among classification entries. Existing approaches have relied on keyword-based analysis or simple mapping, limiting their ability to reflect the intellectual structure of classification systems. To address this limitation, this study analyzes the types and functions of KDC notes and identifies their roles in expressing semantic relationships, such as conceptual definition, hierarchical and associative links, subdivision rules, and exceptions. These relationships are then categorized into internal and external relations and formalized as properties within a linked data framework. This approach enables the transformation of unstructured notes into machine-processable structures and supports the development of semantically enriched, knowledge graph–based classification systems.

Haeryung Park

Chung-Ang University
Haeryung Park is a master's candidate in the Department of Library and Information Science at Chung-Ang University, Seoul, South Korea. Her research centers on knowledge organization, specifically focusing on classification systems and the Korean Decimal Classification (KDC). Her current work involves the structural modeling of KDC classification notes and representing them as Linked Data.

Metadata-Driven Semantic Interoperability: The HerStory-NeSyAI project for Trustworthy Neurosymbolic AI in Digital Humanities

Authors: Miquel Centelles Velilla, Matheus Jenevain, Elena Gómez, Núria Ferran-Ferrer

This short paper presents HerStory-NeSyAI as a work-in-progress neurosymbolic project in Digital Humanities. The project addresses fragmented historical datasets through metadata-driven semantic interoperability, combining a knowledge graph, ontology layer, and retrieval-augmented generation (RAG). It treats interoperability as a condition for epistemic justice and explores how graph-grounded metadata can improve transparency, traceability, and data integrity while mitigating bias, hallucinations, and poisoning risks. In operational terms, these goals are implemented through a prototype, which enables a three-path strategy for connecting heterogeneous datasets.

Miquel Centelles Velilla

Universitat de Barcelona
Professor of Information Science at the University of Barcelona, with prior experience at Universitat Pompeu Fabra’s library as coordinator and library assistant. Trained in Library & Information Science and Linguistics, with doctoral studies in cognitive science and language. Teaching and research focus on digital content and knowledge organization, including EPUB3 e-book metadata, RDF/linked data, taxonomies, and accessible multimodal learning resources, supported by publications, projects, and conference contributions. Currently working on neurosymbolic AI using knowledge graphs.

Motivations for Participating in Biomedical Ontology Communities within Human-AI Collaboration

Authors: Jiwoo Seo

This paper presents a literature-based synthesis of motivations for participating in biomedical ontology communities, viewed as metadata infrastructures. As generative AI transforms ontology curation into human-in-the-loop workflows, human engagement becomes essential for ensuring metadata quality. Using Self-Determination Theory and Activity Theory, it identifies four themes (intrinsic motivation, extrinsic motivation, community aspects, and human-AI collaboration) and analyzes their impact on autonomy, competence, and relatedness. Based on these themes, the study proposes practical implications for provenance-enhanced verification, quality-based incentives, and collaborative environments to sustain metadata quality and ongoing contributions.

Jiwoo Seo

Florida State University
Jiwoo Seo is a Ph.D. student and Research Assistant in Information at Florida State University, studying human-AI collaboration and ontologies. With an interdisciplinary background spanning library and information science and web science, her research examines how humans and AI systems collaborate. Previously, she worked as an NLP researcher in corporate AI labs and as an AI specialist librarian. At DCMI 2026, she presents a motivation-based conceptual framework applying Activity Theory and Self-Determination Theory to enhance user participation in metadata and biomedical ontology communities.

Multilingual Metadata: Aligning Digital Heritage Systems with Cultural Values

Authors: Robin Dresel, Pamela Low

Metadata systems traditionally prioritise technical efficiency over cultural authenticity. This paper examines how Singapore's National Library Board redesigned metadata practices for two major cultural heritage digital projects, the Encyclopedia of Singapore Tamils (EST) and Prominent Malays of Singapore (PMoS), to align digital systems with community values. Breaking from locally established conventions, we implemented largely monolingual metadata records, creating separate collection identifiers, language-specific navigation paths, and culturally aware controlled vocabularies. This approach required overcoming technical constraints in content management systems designed for English-dominant workflows. Our methodology involved close collaboration with over 500 community contributors. Key innovations include collection name separation, multilingual controlled vocabulary integration, and community-driven category translation that reflects cultural mental models rather than literal translations. Implementation demonstrates that separate monolingual metadata records can preserve cultural authenticity while maintaining system functionality. These approaches create valuable training data for AI systems working with multilingual cultural heritage, offering a replicable model for institutions seeking to align digital systems with diverse community needs rather than technical convenience.

Robin Dresel

Assistant Director / Senior Librarian

National Library Board Singapore
Robin Dresel is Assistant Director, Metadata Services at the National Library Board Singapore, managing teams responsible for digital resources and non-purchase collections such as legal deposit, rare items, and donations. Drawing on over 20 years in libraries, he works at the intersection of cataloguing operations and technology, with a growing interest in how AI and system design can better serve diverse communities and collections. Recent studies in Digital Humanities sparked his curiosity about how humans and systems interact, and what that means for metadata practice.

TEI Encoding as Infrastructure for Meaning-Driven AI in Portuguese Literature

Authors: Diego Emanuel Giménez Celano

This presentation explores how TEI encoding can serve as infrastructure for meaning-driven AI in Portuguese literature. Drawing on the encoding of Luís de Camões’ Os Lusíadas, it shows how metadata functions as epistemic governance, shaping what AI systems can recognize and reproduce as knowledge. Rather than treating AI as an interpretative agent, the talk frames it within a semantic architecture designed by human editors. By highlighting ethical implications and practical workflows, it argues that responsible AI participation in cultural knowledge depends on the design of metadata environments, not just algorithmic correction.

Diego Emanuel Giménez Celano

Assistant Professor

University of Macau
Diego Emanuel Giménez Celano is Professor of Portuguese Literature at the University of Macau. He holds a PhD in Literature and Thought from the University of Barcelona, with a dissertation on Fernando Pessoa’s The Book of Disquiet. He was a fellow of the Calouste Gulbenkian Foundation and a researcher on "No Problem Has a Solution: A Digital Archive of the Book of Disquiet" at the University of Coimbra. He was a postdoctoral researcher at the State University of Londrina, collaborates with Camões Lab, and is PI of "Portuguese Literary Studies: Texts, Readings, and Digital Approaches".
