Short Papers

The programme is still being finalized and is subject to ongoing updates as sessions are scheduled. Please check back regularly for the latest changes.

FAIR Open Metadata: A Case Study of RePEc

Authors: Anna Oates Schlaack, Christian Zimmermann

This report describes a research project on the use of RePEc metadata and its alignment with the FAIR principles. The authors summarize the successful aspects of a metadata schema that has underpinned the research landscape in economics for nearly thirty years. The report also describes how 100 studies have leveraged RePEc metadata and outlines future research that will document the challenges of metadata citation and the opportunities to foster open metadata efforts.
  • Anna Oates Schlaack

    University of Illinois Urbana-Champaign

    Anna Oates Schlaack (she/her) is an Assistant Professor and the Cataloging and Metadata Librarian at the University of Illinois Urbana-Champaign, where she leads special formats and English-language monographic cataloging. While using the Modernist Journals Project as a student in the humanities, she found her calling: curating and describing information so that it can be used in novel ways. Whether to support digital humanities research, personal genealogical research, or scientometrics, Schlaack is committed to stewarding open metadata and information resources for use across the globe.

From MARC to Linked Open Data: AI-Driven Entity Extraction from Hebrew Manuscript Metadata Using Distant Supervision

Authors: Alexander Goldberg, Gila Prebor, Avshalom Elmalech

Cultural heritage institutions preserve invaluable provenance information in Machine-Readable Cataloging (MARC) records, yet much of this knowledge remains trapped in unstructured note fields, inaccessible to computational analysis. Transforming these legacy catalogs into Linked Open Data (LOD) requires extracting structured person-role relationships—identifying authors, scribes, owners, and censors—from cataloger narratives. This study presents an AI-driven system that uses distant supervision from MARC metadata itself to automatically generate training data, eliminating the prohibitive cost of manual annotation for specialized cultural heritage domains. By exploiting the dual structure of catalog records, where structured fields provide authoritative labels and unstructured notes provide context, we achieve 85.70% F1 for person extraction and 100% role classification accuracy, outperforming general Hebrew NER models by +55.55% F1. Our approach demonstrates how existing metadata can be leveraged to train AI systems that align with the values of cultural heritage preservation: accuracy, provenance tracking, and semantic enrichment. The extracted entities populate ontology instances based on CIDOC-CRM and IFLA-LRM, enabling computational analysis of scribal networks and manuscript circulation at scale.
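To make the idea of distant supervision concrete, here is a minimal sketch of how structured headings could label unstructured note text. It is not the authors' system: the `Heading` type, the `distant_labels` function, the whitespace tokenizer, the role-typed BIO tag scheme (`B-SCRIBE`, `I-OWNER`, ...) and the sample record are all hypothetical. It assumes each structured access point (for example, a MARC 100/700 name with a relator term) has already been reduced to a name string plus a role, and it uses exact token matching only.

```python
from dataclasses import dataclass


@dataclass
class Heading:
    """Hypothetical structured access point: a name plus its relator role."""
    name: str  # e.g. the personal name from a MARC 100/700 field
    role: str  # e.g. a relator term such as "scribe" or "owner"


def tokenize(text: str) -> list[str]:
    # Plain whitespace tokenization; real Hebrew text would need handling
    # of attached prefixes and punctuation.
    return text.split()


def distant_labels(note: str, headings: list[Heading]) -> list[tuple[str, str]]:
    """Tag note tokens with role-typed BIO labels by matching structured headings.

    The structured fields act as (noisy) gold labels; the free-text note
    supplies the context a sequence tagger would later learn from.
    """
    tokens = tokenize(note)
    labels = ["O"] * len(tokens)
    for heading in headings:
        name_tokens = tokenize(heading.name)
        n = len(name_tokens)
        tag = heading.role.upper()
        for i in range(len(tokens) - n + 1):
            # Exact match that does not overlap an earlier labelled span.
            if tokens[i:i + n] == name_tokens and all(
                label == "O" for label in labels[i:i + n]
            ):
                labels[i] = f"B-{tag}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{tag}"
    return list(zip(tokens, labels))


if __name__ == "__main__":
    # Invented record for illustration only.
    note = "Copied by Moses ben Jacob in 1450 later owned by Isaac Abravanel"
    heads = [Heading("Moses ben Jacob", "scribe"), Heading("Isaac Abravanel", "owner")]
    for token, label in distant_labels(note, heads):
        print(token, label)
```

Exact matching keeps the sketch short but misses spelling variants and inflected forms, which is precisely where a tagger trained on these automatically generated examples is meant to generalize.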
  • Avshalom Elmalech

    Bar-Ilan University

    Avshalom Elmalech is a researcher at Bar-Ilan University with a PhD in Computer Science, working at the intersection of applied artificial intelligence and digital humanities. His research bridges information science and AI by examining how deep learning methods can be effectively applied to humanities data. He has contributed practical frameworks for guiding digital humanities scholars in choosing and adapting NLP and deep learning approaches under constraints such as limited training data and domain specificity.
  • Gila Prebor

    Bar-Ilan University

    Gila Prebor is a researcher in the Department of Information Science at Bar-Ilan University, specializing in Hebrew manuscripts, paleography, and knowledge organization. Her research combines codicology and bibliography with Digital Humanities, focusing on AI, Handwritten Text Recognition (HTR), and Semantic Web technologies for cultural heritage data. She is co-editor of Alei Sefer and has received grants from the ISF, the EU, and Israel’s Ministry of Innovation, Science, and Technology. Her recent work applies Linked Data and AI to the study of Hebrew manuscripts.

Motivations for Participating in Biomedical Ontology Communities within Human-AI Collaboration

Authors: Jiwoo Seo

This paper presents a literature-based synthesis of motivations for participating in biomedical ontology communities, viewed as metadata infrastructures. As generative AI transforms ontology curation into human-in-the-loop workflows, human engagement becomes essential for ensuring metadata quality. Drawing on Self-Determination Theory and Activity Theory, the paper identifies four themes (intrinsic motivation, extrinsic motivation, community aspects, and human-AI collaboration) and analyzes their impact on autonomy, competence, and relatedness. Based on these themes, the study proposes practical implications for provenance-enhanced verification, quality-based incentives, and collaborative environments that sustain metadata quality and ongoing contributions.
  • Jiwoo Seo

    Florida State University

    Jiwoo Seo is a Ph.D. student and Research Assistant in Information at Florida State University, studying human-AI collaboration and ontologies. With an interdisciplinary background spanning library and information science and web science, her research examines how humans and AI systems collaborate. Previously, she worked as an NLP researcher in corporate AI labs and as an AI specialist librarian. At DCMI 2026, she presents a motivation-based conceptual framework applying Activity Theory and Self-Determination Theory to enhance user participation in metadata and biomedical ontology communities.

Multilingual Metadata: Aligning Digital Heritage Systems with Cultural Values

Authors: Robin Dresel, Pamela Low

Metadata systems traditionally prioritise technical efficiency over cultural authenticity. This paper examines how Singapore's National Library Board redesigned metadata practices for two major cultural heritage digital projects, the Encyclopedia of Singapore Tamils (EST) and Prominent Malays of Singapore (PMoS), to align digital systems with community values. Breaking from locally established conventions, we implemented largely monolingual metadata records, creating separate collection identifiers, language-specific navigation paths, and culturally aware controlled vocabularies. This approach required overcoming technical constraints in content management systems designed for English-dominant workflows. Our methodology involved close collaboration with over 500 community contributors. Key innovations include collection name separation, multilingual controlled vocabulary integration, and community-driven category translation that reflects cultural mental models rather than literal translations. The implementation demonstrates that separate monolingual metadata records can preserve cultural authenticity while maintaining system functionality. These approaches create valuable training data for AI systems working with multilingual cultural heritage, offering a replicable model for institutions seeking to align digital systems with diverse community needs rather than technical convenience.
  • Robin Dresel

    National Library Board Singapore

    Robin Dresel is Assistant Director, Metadata Services at the National Library Board Singapore, managing teams responsible for digital resources and non-purchase collections such as legal deposit, rare items, and donations. Drawing on over 20 years in libraries, he works at the intersection of cataloguing operations and technology, with a growing interest in how AI and system design can better serve diverse communities and collections. Recent studies in Digital Humanities sparked his curiosity about how humans and systems interact, and what that means for metadata practice.

TEI Encoding as Infrastructure for Meaning-Driven AI in Portuguese Literature

Authors: Diego Emanuel Giménez Celano

This presentation explores how TEI encoding can serve as infrastructure for meaning‑driven AI in Portuguese literature. Drawing on the encoding of Luís de Camões’ Os Lusíadas, it shows how metadata functions as epistemic governance, shaping what AI systems can recognize and reproduce as knowledge. Rather than treating AI as an interpretative agent, the talk frames it within a semantic architecture designed by human editors. By highlighting ethical implications and practical workflows, it argues that responsible AI participation in cultural knowledge depends on the design of metadata environments, not just algorithmic correction.
  • Diego Emanuel Giménez Celano

    University of Macau

    Diego Emanuel Giménez Celano is Professor of Portuguese Literature at the University of Macau. He holds a PhD in Literature and Thought from the University of Barcelona, with a dissertation on Fernando Pessoa’s The Book of Disquiet. He was a fellow of the Calouste Gulbenkian Foundation and a researcher on "No Problem Has a Solution: A Digital Archive of the Book of Disquiet" at the University of Coimbra. He was a postdoctoral researcher at the State University of Londrina, collaborates with Camões Lab, and is PI of "Portuguese Literary Studies: Texts, Readings, and Digital Approaches".