DCMI: Best Practices

AI-Assisted Cataloging in Practice: A Human-in-the-Loop Approach to Scalable Metadata Creation

Academic libraries face growing resource volumes and a shortage of experienced catalogers. To address this, the National University of Singapore Libraries initiated the AI-Powered Cataloging (AICAT) Project, exploring how large language models (LLMs) can augment bibliographic metadata creation while maintaining human oversight. This case study shares a practical implementation that generates draft MARC21 records compliant with RDA, ISBD, LCSH, and LCC standards. Utilizing engineered prompts with strict operational constraints—such as hallucination prevention and metadata validation—the project evaluates multiple LLMs to assess accuracy and standards adherence. The three-pronged methodology encompasses prompt engineering, comparative model benchmarking, and workflow automation testing. Results show that AI significantly reduces repetitive cataloging effort and improves scalability, while catalogers retain final editorial authority. Ultimately, effective AI deployment depends on aligning machine outputs with human values through standards-based validation and expert oversight.

Mi-kyeong Kam

Principal Librarian

National University of Singapore
Kam Mi-Kyeong is a Principal Librarian in the Collections Management & Preservation Cluster at NUS Libraries. She leads the Resource Organization Team, overseeing metadata services, e-resource management, and resource organization workflows. Her work focuses on library systems and process improvement, leveraging information technologies to optimize resource management in support of research, teaching, and learning. Her professional interests include metadata innovation, workflow automation, and the strategic application of AI to enhance library services and operational efficiency.

Building JATS XML Scholarly Data at the National Library of Korea: Achievements and Plans for Opening AI Training Data

The National Library of Korea (NLK) has been implementing the OAK (Open Access Korea) project since 2014 to promote the spread of open access. Through this initiative, the NLK has distributed repositories to universities, research institutes, and public institutions to support the expansion of Green OA (self-archiving). In parallel, the Korean Journal Copyright Information (KJCI) system has been operated to provide information on copyright policies and Creative Commons (CC) licenses of academic societies. As part of these efforts, JATS XML full texts and metadata have been built for domestic open-access journals and shared with academic societies and the National Research Foundation of Korea (NRF). More recently, the NLK has been engaging directly with academic societies to obtain consent for the use of JATS XML scholarly data as AI training data. The consented data will be labeled as AI training data on the NLK's OAK National Repository, and will also be made publicly available through the NLK's Open Data Library. In the long term, the NLK seeks to broaden the reach of Korean research and enhance the societal impact of OA journals by offering a copyright-cleared, openly accessible Korean scholarly dataset.

Hyesun Han

Librarian

National Library of Korea
Hyesun Han is a librarian at the National Library of Korea. She has built broad expertise across core library operations, including subject and research information services, collection management, cataloging, foreign materials acquisition, and international publication exchange. Currently, as part of the Digital Initiatives Division, she is responsible for advancing the Open Access Korea (OAK) initiative to build and disseminate national knowledge information.

Designing a BIBFRAME Conversion Module Based on Experience in Operating Korean National Bibliography LOD: KORMARC Issue Analysis and a Pilot Application Case

Since 2013, the National Library of Korea (NLK) has been operating the Korean National Bibliography (KNB) LOD service, publishing bibliographic and authority data based on KORMARC and MODS as RDF. This presentation reports on the methodology and interim outcomes of an ongoing study on the design and empirical implementation of a BIBFRAME conversion module, building on this operational experience. In particular, regarding the BIBFRAME transition, this study compares the LC approach with the NLK’s own proposed approach to examine which is more suitable for the KNB environment. Furthermore, it investigates how the structural characteristics of KORMARC influence the distinction between ‘Work’ and ‘Instance’ in BIBFRAME during this process. Rather than presenting a finalized transition result, this presentation aims to share key considerations and practical directions in preparing for a BIBFRAME-based environment on the basis of experience gained from operating the KNB LOD.

Seungmun Ahn

Librarian

National Library of Korea
Seungmoon Ahn is a librarian at the National Library of Korea and has worked in the areas of digitization of library resources, information services, and the development and use of bibliographic data. Since 2024, he has been working in the Metadata and Sustainable Access Division of the NLK, where he is responsible for national bibliographic data standardization, Korean Library Information System Network (KOLIS-NET), and the publication of Linked Open Data (LOD) as part of national bibliography-based services.

Establishing and Disseminating the K-Museum Data Standard at the National Museum of Korea (NMK)

The National Museum of Korea has led the standardization, integration, and opening of museum collection data in Korea through the establishment and dissemination of the K-Museum data standard, the release of public data, and the operation of eMuseum, an integrated public access platform. This presentation introduces the development and major achievements of museum data standardization in Korea through the case of the National Museum of Korea's eMuseum, and further explores possible directions for the future application of global data standards. In particular, it examines how the practical experience of reorganizing collection information that had previously been managed differently across institutions into a more consistent structure and linking it to integrated search and public access services can serve as a foundation for establishing and disseminating the K-Museum data standard. eMuseum is a representative platform where the achievements of data openness are brought together, allowing users to explore and use collection information from most museums in Korea within a single digital environment. This presentation examines the main components and implementation of the K-Museum data standard, and discusses how standardized data has supported the operation and enhancement of eMuseum as well as wider public access to museum collection information. Building on this domestic standardization experience, it also considers the practical challenges and institutional implications that need to be addressed when applying global data standards to the Korean museum environment.

Youngmin Ko

Curator

National Museum of Korea (NMK)
Youngmin Ko is a curator at the National Museum of Korea. His work focuses on museum information management, digital curation, metadata, and the organization of cultural heritage information. He is interested in improving the interoperability, discoverability, and reuse of information across digital environments. His recent work includes projects related to metadata standards, data curation, and knowledge systems in cultural heritage and scholarly communication contexts.

From Cataloguing Rules to Community Best Practices: Governing Collaborative Entity-Based Knowledge Graphs

The transition from record-based cataloguing to entity-based knowledge graphs is reshaping not only bibliographic models and technologies, but also the social and organizational foundations of metadata creation. In shared cataloguing environments, entities are no longer owned and maintained by a single institution; instead, they become community-managed resources that evolve through the contributions of multiple stakeholders. In this context, the presentation shares the experience of the Share Family community in defining best practices for collaborative entity management within JCricket, an entity editor operating on a shared, multi-provenance knowledge graph. Drawing on several years of experimentation with linked data cataloguing, cooperative workflows, and cross-platform interoperability, the community has identified a set of governance challenges that cannot be addressed by traditional cataloguing rules alone. The contribution discusses how best practices are being developed around key areas such as entity creation, provenance management, selection of preferred labels, authority control, relationship management, entity merge and split operations, visibility policies, and integration with external authoritative sources. These practices are designed to ensure consistency, transparency, and trust in an environment where multiple institutions simultaneously enrich and curate the same network of entities. Rather than focusing on cataloguing standards themselves, the presentation argues that the next phase of metadata practice requires community-governed operational frameworks capable of supporting collaborative stewardship of shared knowledge graphs.
The Share Family experience offers a practical case study of how libraries can move beyond the traditional dichotomy of local versus copy cataloguing toward new models of cooperative and shared cataloguing, where governance, interoperability, and collective responsibility become central components of metadata quality. The presentation aims to contribute to the broader DCMI discussion on metadata best practices by exploring how communities can establish sustainable governance mechanisms for entity-based description in the linked data era.

Tiziana Possemato

Founding partner and Director of @Cult (Casalini Libri Group)

@Cult - Casalini Libri Group
Tiziana Possemato holds a degree in Philosophy from Sapienza University of Rome and diplomas in Archival and Library Science from the Vatican Schools. She earned a Master’s degree and a PhD in Library Science from the University of Florence. A metadata specialist, she has led national and international projects on library automation, data analysis, and information retrieval. Her work focuses on Linked Open Data and the Semantic Web. She is a member of the IFLA Bibliography Section and author of numerous publications.

From Records to Entities: Open Metadata Infrastructure for Consortial E-Book Resource Sharing

Metadata is increasingly recognized not merely as descriptive information, but as critical infrastructure enabling open knowledge ecosystems. As a foundation for interoperability, discovery, and reuse across institutional and national boundaries, open metadata supports more connected and efficient access to information across systems. Yet traditional record-based systems remain constrained by fragmentation, local variation, and limited interoperability, resulting in persistent data silos that hinder effective resource sharing. This presentation examines an ongoing, multi-phase pilot led by the Partnership for Academic Library Collaboration & Innovation (PALCI), a consortium of over 70 academic and research libraries across the Mid-Atlantic United States. The E-ILL Pilot is a proof-of-concept effort to understand how Linked Open Data (LOD) may support resource sharing within a consortial environment. This pilot, a collaboration with Share-VDE and ReShare, examines integrating LOD entities into the ReShare platform. It evaluates how entity models can improve discovery and secure inter-institutional e-book delivery. Moving beyond string-based bibliographic matching, which often fragments representations of the same work, the pilot employs entity-based clustering using the Share-VDE BIBFRAME ontology, a resource description model that reconciles IFLA LRM and RDA. This approach reconciles metadata variations and aggregates holdings at the work level, a capability especially important in a digital context where resources can have multiple formats and versions. To support emerging “e-first” interlibrary loan models, the project examines whether open, interoperable, entity-based metadata can identify related instances and improve matching across manifestations of the same work to support fulfillment. 
Framed within PALCI’s Digital Sharing Strategy, this work operationalizes metadata as infrastructure, emphasizing openness, reuse, and interoperability as essential to scalable, vendor-neutral systems. As an exploratory initiative, it advances understanding of how open metadata and LOD frameworks can inform interoperable, efficient, and scalable resource-sharing ecosystems across consortial and institutional boundaries.

Nina Servizzi

Associate Dean, Knowledge Access and Resource Management Services

New York University
Nina Servizzi is Associate Dean for Knowledge Access and Resource Management Services at New York University Libraries. Her work focuses on how libraries and cultural heritage institutions develop and sustain information infrastructures that support research and scholarship. She examines organizational adaptation, systems interoperability, and data governance as essential components of resilient information environments. She serves on the HathiTrust Program Steering Committee, contributes to IFLA, and is past Chair of the Share-VDE Advisory Council.

From Six Years to Six Hours: Prerequisites and Implications of AI-Assisted Development in a Library Context

This paper presents a case study of using AI-assisted development to redevelop iSearch, a legacy cataloging application used daily at the University of Toronto Libraries. Redevelopment had been deferred for six years due to competing institutional priorities. Emboldened by success using AI to develop Model Context Protocol (MCP) servers, the author produced a web-based prototype replacement in approximately six hours. This paper identifies the prerequisites that enabled rapid AI-assisted development and examines emerging implications for library practitioners considering similar uses of AI. The paper argues that while AI-assisted development lowers the barrier to writing code, it does not reduce the need for preparation; rather, it changes what skills and knowledge practitioners must bring to the work, and what institutions must invest in to support them.

May Chan

Head, Metadata Services

University of Toronto
May Chan is Head, Metadata Services at the University of Toronto Libraries, with 17 years of prior experience in public libraries at Vancouver and Burnaby, British Columbia. A Carpentries Instructor Trainer, she is committed to building computational and technical literacy among library practitioners, and has been active in cataloguing training and professional development in a variety of roles throughout her career. May currently serves as co-chair of the PCC Standing Committee on Training and the SCT Linked Data Training Task Group.

From the discovery service to the academic knowledge platform - the challenge at CiNii -

The requirements for academic discovery services (DSs) are changing rapidly in the era of Open Science and Artificial Intelligence. One emerging requirement is that a DS serve as a source of evidence for trustworthy research activities and for Artificial Intelligence inference. Another important requirement is that a DS become a member of the larger ecosystem of information processing by Artificial Intelligence. We are developing our service, CiNii, the discovery service for academic resources in Japan, in this direction. In this talk, we report on the current status of CiNii development, in particular the Knowledge Graph Service.

Hideaki Takeda

Professor/Director

National Institute of Informatics
Dr. Hideaki Takeda is Professor and Director of the Principles of Informatics Research Division at the National Institute of Informatics (NII), Japan. His research focuses on knowledge sharing systems, the Semantic Web, ontology and design theory. He is also Director of Research Center for Knowledge Media and Content Science (KMCS) at NII. As Director of KMCS, he leads the development of CiNii, the discovery service for academic resources in Japan. He is a board member of the International DOI Foundation and a board member of CLOCKSS.

GenAI-Assisted Deep Interpretation via Commentary Knowledge Graphs

Commentaries are texts produced through the interpretation and annotation of earlier classical works, representing one of the most important mechanisms of knowledge production and cultural transmission in traditional Chinese scholarship. However, the interpretive knowledge embedded in commentary documents is characterized by complex, multi-layered, and linear structures, making comprehensive understanding and cross-textual synthesis challenging for both scholars and general readers. This paper presents a framework for GenAI-assisted deep interpretation based on Commentary Knowledge Graphs (CKGs). The proposed CKG integrates interpretive relations, citations, provenance information, and entity alignments extracted from documents into a unified semantic representation, providing structured interpretive evidence for large language models (LLMs). Building upon this knowledge infrastructure, an LLM-based agent architecture supports deep interpretation through a series of coordinated tasks, including user intent recognition, key parameter extraction, resource retrieval and ranking, context-aware reasoning, and role-based multi-agent collaboration. By combining knowledge graph retrieval with generative AI, the framework enables multi-level interpretation, evidence-grounded reasoning, and the discovery of scholarly connections across texts. A prototype system was developed using Wenfu and its commentary documents as a case study. The system supports functionalities such as conversational interpretation and multi-agent scholarly debate, allowing users to explore classical texts through interactive and explainable AI-assisted workflows. The proposed framework illustrates how cultural heritage knowledge can be integrated with generative AI technologies to enhance its accessibility, interpretability, and reuse.
It offers an approach to the digital preservation, exploration, and revitalization of intellectual heritage, while opening new possibilities for explainable AI-assisted research and next-generation digital humanities applications.

Mengjuan Weng

Postdoctoral Researcher

School of Journalism and Communication, Wuhan University; Intelligent Computing Laboratory for Cultural Heritage, Wuhan University
Mengjuan Weng is a Postdoctoral Researcher at the School of Journalism and Communication, Wuhan University, and a researcher at the Intelligent Computing Laboratory of Cultural Heritage. Her research focuses on knowledge organization and digital humanities, particularly knowledge modeling in specialized textual genres. She has published over ten papers, contributed to major national research projects, published a book chapter, and contributed to a patent and technical standard. She also serves as a reviewer for the ASIST Annual Meeting and Information Processing & Management.

Modeling, Linking, and Augmenting Video Game Archive: Practices from the RCGS Collection

Digital game archives require more than descriptive cataloging; they call for a metadata ecosystem that supports both conceptual rigor and practical reusability, underpinning computational use. This paper reports on metadata practices developed at the Ritsumeikan Center for Game Studies (RCGS), organized into three interrelated layers: modeling, linking, and augmenting. These practices are implemented in the RCGS’s online catalog service, the RCGS Collection. At the modeling layer, we develop an extended model based on the IFLA Library Reference Model (LRM) to represent the complex relationships inherent in digital games, including works, variations, packages, and individual items. This approach enables consistent description across heterogeneous archival materials while maintaining compatibility with established bibliographic standards. At the same time, for entities that require subjective interpretation—such as works, genres, franchises, and characters—we design the model to incorporate external authority data. In addition, for the implementation of an online catalog, we employ a “dumb-down” strategy (flattening structured metadata) to improve accessibility for general users. At the linking layer, we implement a Linked Open Data (LOD) approach that connects local entities to external authority sources such as Wikidata and the Media Arts Database (MADB). This alignment supports the persistence of identifiers, enhances interoperability, and enables data enrichment through the incorporation of collectively curated external knowledge. At the augmenting layer, we develop a dataset construction pipeline based on text extraction and image embeddings, transforming digitized materials—including packages, manuals, and gameplay images—into research-ready corpora. Through multimodal feature extraction and automated text structuring, archival data can be repurposed as a resource for computational analysis and metadata generation. 
These practices are implemented through a SPARQL-enabled LOD infrastructure and a discovery interface based on Omeka S, achieving a balance between a robust identifier infrastructure, machine-readability, and user accessibility. Rather than proposing a prescriptive model, this paper reflects on design decisions and implementation strategies to share insights for building sustainable metadata ecosystems in game archives and related domains.

Kazufumi Fukuda

Associate Professor

College of Image Arts and Sciences, Ritsumeikan University
Kazufumi Fukuda, Ph.D., is an Associate Professor at Ritsumeikan University specializing in game studies, digital humanities, and knowledge graphs. His research explores the preservation and analysis of video game archives through metadata modeling, linked data, and data science. He is involved in developing game archive infrastructures and contributes to national projects on media arts databases, bridging academic research and practical applications.

Multi-Source Data Fusion-Driven AI Cataloging System: A Practical Case in Metadata Management

As generative AI rapidly penetrates the library sector, making AI reliably and efficiently applicable to professional metadata production has become a shared concern for the global library community. Building on the Fourth-generation Intelligent System for Acquisition, Classification, and Cataloging introduced in 2024, the Capital Library of China has proposed a "Multi-Source Data Integration" technical roadmap that repositions AI as an "intelligent hub" that coordinates, validates, and makes decisions based on data from multiple sources, rather than as a sole data producer. This study systematically integrates data from three distinct levels: (1) the Fundamental Data Layer, relying on CIP data from the multidimensional bibliographic database provided by China National Archives of Publications and Culture; (2) the Physical Data Layer, comprising physical metadata precisely collected by sensors in the intelligent processing system (e.g., millimeter-level dimensions, weight, ISBN images); and (3) the Intelligent Parsing Layer, extracting deep semantic information from book scans using multimodal large models, computer vision, and OCR technologies. Supported by four functional engines (multimodal recognition, semantic understanding, multi-source data integration, and automatic MARC field generation), the system automatically resolves data conflicts based on the rule of "priority to benchmarks, fact-checking, and intelligent supplementation," generating standard CNMARC records. From its trial operation in September 2025 to March 2026, the AI Cataloging System has stably processed over 220 books, with an average processing time of approximately 6 minutes per book and automation accuracy exceeding 98% for core descriptive fields.
This practice has reshaped the human-machine collaboration paradigm in cataloging: the AI engine undertakes rule-based and repetitive information extraction and integration, while catalogers transition into roles as system trainers, arbiters for complex cases, and auditors of final quality, realizing the complementary strengths of humans and AI. This study offers a reusable framework for global peers: by constructing a hybrid intelligent system centered on "multi-source data integration" and safeguarded by "human-machine collaboration," libraries can responsibly and efficiently move core operations from labor-intensive to knowledge-intensive development.

Juan Zhang

Deputy Director

Capital Library of China
Juan Zhang serves as Deputy Director of the Capital Library of China and holds the title of Research Librarian. She is a member of the Information Organization Subgroup under the Academic Research Committee of the Library Society of China and a member of the National Technical Committee for Standardization of Identification and Description. She has participated in drafting national standards such as Professional Specifications for Public Libraries and Information and Documentation—Resource Description.

Sustainable Linked Open Data Publishing through Static Site Generation: A Decade of Educational LOD Infrastructure

The long-term sustainability of Linked Open Data (LOD) services remains a significant challenge. Many projects rely on complex server-side infrastructures, SPARQL endpoints, or custom applications that become difficult to maintain once project funding or dedicated technical support ends. This presentation introduces a practical approach to sustainable LOD publishing based on ttl2html, a static site generator that transforms RDF/Turtle datasets into human-readable and machine-accessible web resources. Using this approach, we have developed and continuously maintained an educational LOD infrastructure in Japan since its public launch in 2017. The platform interlinks educational resources such as textbooks, curriculum guidelines, instructional units, and large-scale educational assessments, providing persistent identifiers and Linked Data access through a lightweight and maintainable architecture. We discuss the design principles behind this approach, including static content generation, reduced operational complexity, and compatibility with Linked Data best practices. We report on long-term maintenance experience and examine how minimizing technical dependencies has contributed to the sustainability of the infrastructure. We also demonstrate how the same framework has been reused in other contexts, including student-led initiatives and library-related LOD projects, illustrating its adaptability and low barrier to adoption. By sharing lessons learned from a decade of operation, this presentation highlights how reducing technical complexity through static site generation can improve the sustainability of LOD publishing. Our experience further demonstrates that lightweight publishing infrastructures can be readily reused across projects and organizations, providing a practical foundation for the long-term development and preservation of open knowledge infrastructures.

Masao Takaku

Associate Professor

Institute of Library, Information and Media Science, University of Tsukuba
Masao Takaku is an Associate Professor in the Institute of Library, Information and Media Science at the University of Tsukuba. His research focuses on digital libraries and information organization, with recent work on metadata and knowledge graph development for cultural heritage and educational use. He also teaches information organization and related subjects at both undergraduate and graduate levels and works on knowledge infrastructure design across diverse domains.

Weaving together the threads: Laying the foundations for a metadata strategy at the Bodleian Libraries, University of Oxford

Metadata for discovery, security and management of collections is a key priority at the University of Oxford, with our institutional strategy stating that to: ‘widen access to our outstanding cultural and scientific collections…. We will aim to have 100% of our collection recorded online… by 2030.’ Delivering this target requires creative approaches to funding, partnerships and technology. This presentation will explore a range of initiatives currently underway, including the automatic creation and enrichment of metadata within our institutional repository; the repurposing and delivery of metadata from legacy finding aids; and the development of pilot workflows for the curation, creation and automated generation of textual transcripts for digitised Special Collections materials. These projects feed into our work at a national level, where we are leading funded research to explore how metadata from different content types can be brought together through minimum viable metadata to enable researchers to engage with collections at scale. We are also one of five testing partners for the UK implementation of BlueCore, the LD4P/Mellon-funded platform for linked open bibliographic metadata. This collaboration represents an important first step in exploring the potential of BIBFRAME for large-scale implementation across the UK. We are also working in partnership with commercial suppliers and other higher education colleagues to test the use of large language models and model context protocols as a route to discovery of our collections. The metadata available to these tools and products is crucial and will be a key element in our evaluation of their effectiveness and in considering possible future developments. As a next step we plan to develop a metadata strategy, which will identify synergies and areas for collaboration and bring a user-focused approach, in order to connect our collections to each other as well as to other collections globally.

Amy Warner May

Associate Director, Scholarly Resources

Bodleian Libraries, University of Oxford
Amy Warner May is Associate Director, Scholarly Resources at the Bodleian Libraries, University of Oxford. In this role she is a member of the Bodleian’s Exec Team, responsible for Collections Management, Open Research and Digital, and chairs the Digital Committee for Oxford’s Gardens, Libraries and Museums. Before joining the Bodleian, Amy was Associate Director at Royal Holloway, University of London, and worked at the UK National Archives as Head of Systems Development and Search. She is a Board Member of the Digital Preservation Coalition and sits on the Science Museum Collections Committee.
