DCMI: NKOS Workshop

Beyond Keywords: Retrieving Blue-and-White Ceramics in Dutch Paintings with Knowledge-Augmented CLIP

Authors: Yang Zhao

Museum catalogue metadata describes a painting's title, artist, date, and medium, but rarely the specific objects depicted within it. This is a structural limitation of text-based knowledge organization systems applied to visual collections. We study this problem through blue-and-white ceramics in Dutch Golden Age paintings from the Rijksmuseum open collection, where both expert researchers and general visitors struggle to find relevant works through keyword search. We compare three retrieval approaches over a corpus of 150 paintings with 28 ground-truth positives. Metadata search with expert terms reaches perfect precision but only 10.7% recall. Zero-shot CLIP performs comparably to general metadata search. A knowledge-augmented approach, which expands the query with annotations retrieved from a small curated reference set of 20 annotated paintings, doubles both metrics to 60.0% precision and 42.9% recall. An ablation study shows that annotation type matters more than quantity: short scene-context descriptions outperform object-feature descriptions drawn from ceramic photographs. This confirms that knowledge organization principles still apply in embedding space. A small, carefully designed annotation layer paired with a general-purpose vision-language model can outperform the full metadata apparatus for retrieving specific depicted content.
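The reported percentages can be checked with simple arithmetic. The sketch below is illustrative only: the abstract states the 28 ground-truth positives, but the hit and retrieved counts (3 of 3, and 12 of 20) are assumptions back-calculated from the reported percentages, and `precision_recall` is a hypothetical helper, not the author's evaluation code.

```python
def precision_recall(true_positives: int, retrieved: int, relevant: int) -> tuple[float, float]:
    """Return (precision, recall) as fractions in [0, 1]."""
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


RELEVANT = 28  # ground-truth positives stated in the abstract

# Assumed counts, inferred from the reported percentages (not given in the abstract).
expert_metadata = precision_recall(true_positives=3, retrieved=3, relevant=RELEVANT)
knowledge_augmented = precision_recall(true_positives=12, retrieved=20, relevant=RELEVANT)

for name, (p, r) in [
    ("expert metadata", expert_metadata),
    ("knowledge-augmented", knowledge_augmented),
]:
    print(f"{name}: precision={p:.1%}, recall={r:.1%}")
```

Under these assumed counts, expert metadata search returns only relevant paintings but finds 3 of the 28, while the knowledge-augmented approach finds 12 of the 28 among 20 returned results.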

Yang Zhao

PhD student

Syracuse University
Yang (pronounced “yahng,” rhymes with “song”) is a first-year Ph.D. student in Information Science and Technology at Syracuse University. Her research interests include digital humanities and information accessibility, with a focus on making visual cultural materials more open and usable for diverse audiences, especially people without technical backgrounds and those with different access needs.

Capturing Semantic Gaps in MeSH through Human-AI Collaboration

Authors: Jian Qin, Bei Yu, Qiaoyi Liu

This study explores ways to keep KOS current with advances in scientific research and, as a proof of concept, tests the effectiveness of human-AI collaboration in detecting and identifying topic-specific semantic gaps in MeSH. Using a dataset on public health policy and COVID-19 vaccines, the experiment found that one-third of the papers have no author keywords and that, among the papers where author keywords are available, the overlap between MeSH headings and author keywords is low. Further analysis identified structural mismatches and conceptual gaps between MeSH and author keywords and examined the factors that contribute to these problems. The use of AI tools in this preliminary study provides useful insights for future larger-scale KOS evaluation.
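One simple way to quantify the overlap between author keywords and MeSH headings is an exact-match ratio per paper. The sketch below is a minimal illustration, not the authors' method: the `normalize` and `overlap_ratio` helpers are hypothetical, the three records are invented for demonstration, and the terms in them are not claimed to be actual MeSH headings.

```python
from typing import Optional


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(term.lower().split())


def overlap_ratio(author_keywords: list[str], mesh_headings: list[str]) -> Optional[float]:
    """Share of author keywords that exactly match a heading after normalization.

    Returns None when the paper has no author keywords.
    """
    keywords = {normalize(k) for k in author_keywords}
    if not keywords:
        return None
    headings = {normalize(h) for h in mesh_headings}
    return len(keywords & headings) / len(keywords)


# Invented example records (assumption: not drawn from the study's dataset).
papers = [
    {"keywords": ["COVID-19 vaccines", "vaccine mandate", "health policy"],
     "mesh": ["COVID-19 Vaccines", "Health Policy", "Humans"]},
    {"keywords": [], "mesh": ["Vaccination", "Public Health"]},
    {"keywords": ["vaccine hesitancy"], "mesh": ["Vaccination Refusal"]},
]

ratios = [overlap_ratio(p["keywords"], p["mesh"]) for p in papers]
missing_share = sum(r is None for r in ratios) / len(papers)

print("per-paper overlap:", ratios)
print("share of papers without author keywords:", missing_share)
```

Exact matching deliberately misses near-synonyms such as the third record's keyword and heading; pairs like that are the kind of candidate a human reviewer or an AI tool would then need to judge as a structural mismatch or a conceptual gap.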

Jian Qin

Professor

Syracuse University
Jian Qin is a Professor in the iSchool at Syracuse University and currently serves as the Director of the Dublin Core Academy. She conducts research in metadata, knowledge organization and representation, data and knowledge modeling, ontologies, research collaboration networks, research impact assessment, and data curation. Her research has received funding from the U.S. National Science Foundation, the U.S. National Institutes of Health, and the U.S. Institute of Museum and Library Services. She was the recipient of the 2020 Frederick G. Kilgour Award for Research in Library and Information Technology.

Establishing Ethical Guidelines for Use of AI Tools in KOS and Metadata Services: A Framework of Human-Supervised Automation Workflows

This panel brings together metadata researchers and practitioners from several countries to facilitate the development of ethical guidelines for the use of AI as a tool in KOS mapping, in both method and practice. Panelists have established research programs in knowledge organization, digital humanities, and information retrieval, with ongoing research projects on the use of AI tools for subject indexing in KOS and metadata services. A key research finding is that librarians feel confident in their professional identity but lack the practical training to perform AI-related operational tasks for technical services. A framework that prioritizes information professionals as the "human-in-the-loop" is intended to ensure that AI tools are applied with measurable benefit to users and without compromising professional standards.

Ying-Hsang Liu

Researcher

Professorship of Predictive Analytics, Chemnitz University of Technology
Ying-Hsang Liu is a researcher in Predictive Analytics at Chemnitz University of Technology (Germany). With a Ph.D. in Information Science (Rutgers University, USA), he has held academic positions across five countries. His research focuses on human-centered data science, information retrieval, and AI-based systems, supported by grants from the ARC, ARDC, and Airbus. Dr. Liu has authored 65 peer-reviewed publications and two books, serves on ASIS&T and ALISE committees, and was named a Distinguished Member of ASIS&T in 2022.

Shu-Jiun (Sophy) Chen

Associate Research Fellow

Institute of History and Philology, Academia Sinica
Shu-Jiun (Sophy) Chen is an Associate Research Fellow at the Institute of History and Philology and the Center for Digital Cultures, Academia Sinica, and an Adjunct Associate Professor in the Department of Library and Information Science, National Taiwan University (NTU). She holds an MA from the University of Sheffield and a PhD from NTU. Her work spans cultural heritage informatics, digital humanities, knowledge organization, metadata, linked data, and digital curation. She initiated the Chinese AAT-Taiwan project and established Academia Sinica’s Linked Open Data Lab.

Seungmin Lee

Professor

Chung-Ang University, Seoul, South Korea
Seungmin Lee is a professor in the Department of Library and Information Science at Chung-Ang University, South Korea. He has served as Chair of the Cataloging Committee, Chair of the Librarian Certification Committee, and Chair of the Planning and Policy Committee of the Korean Library Association (KLA). He is currently Vice President of the Korean Biblia Society for Library and Information Science and Editor-in-Chief of the Journal of the Korean Library and Information Science Society. His research interests include metadata, bibliographic ontology, and knowledge organization.

Charlene Chou

Head of Knowledge Access Department

New York University, Division of Libraries
Charlene Chou is the Head of the Knowledge Access Department at New York University Libraries, where she oversees cataloging and metadata services. She has contributed to national and international metadata standards through active service on various committees, including the PCC (Program for Cooperative Cataloging) Policy Committee and the Joint RDA Board and RSC Working Group on Artificial Intelligence. She is committed to leading pilot projects on emerging trends and technologies.

Junzhi Jia

Professor

Renmin University of China
Dr. Junzhi Jia is a professor at the School of Information Resource Management, Renmin University of China. Her research focuses on information organization, ontologies, and metadata and AI literacy. She is a member of the metadata and AI task group of the DCMI Education Committee. She has published over 170 academic papers and two monographs: Ontology Construction for Chinese FrameNet Ontology and Semantic Association and Aggregation of Chinese Name Authority Records.

From Prompt Logs to Traceable Governance: A Provenance Application Profile for GenAI-Assisted Multilingual Thesaurus Localization

GenAI-assisted multilingual thesaurus localization raises a governance problem: plausible labels and scope notes are not sufficient unless their generation conditions, source evidence, review, revision, and adoption can be traced. This ongoing study develops a provenance application profile for such workflows, using the Traditional Chinese localization of the Art & Architecture Thesaurus (AAT) as its empirical setting. The presentation focuses on two AAT-based diagnostic cases. One case concerns evidence-supported term alignment in the localization of an existing AAT concept. The other concerns local cultural vocabulary construction, where AI-assisted synonym discovery must be reviewed against historical relationships, expert judgment, and KOS equivalence boundaries. Together, the cases show that prompt logs alone cannot document accountable editorial work: a final preferred term, non-preferred term, rejected candidate, or scope note must be connected to the task context, evidence used, model and prompt conditions, review roles, and editorial decision rationale. The proposed Prov-AI application profile translates these governance needs into eighteen competency questions and a lightweight Core-L1 structure based on WorkItem, TaskRun, TurnRun, Artifact, Result, Agent, Association, and ContextPack. W3C PROV-O provides the provenance backbone, while SKOS supports KOS concepts and controlled values, and DCMI Metadata Terms support description, source, version, and rights information. Rather than replacing existing standards, the profile reuses and constrains them for GenAI-assisted multilingual KOS workflows. The conference version demonstrates how selected competency questions can be answered through queryable records and minimum Core-L1 checks. 
The study reframes GenAI-assisted thesaurus localization as a human-supervised, evidence-based governance process and provides a basis for reproducibility, accountability, and interoperable documentation of AI-assisted KOS maintenance.

Shu-Jiun (Sophy) Chen

Associate Research Fellow

Institute of History and Philology, Academia Sinica

Hermeneutic Ontology Engineering: LLM-Assisted Schema Induction for Oral History Knowledge Organisation

Authors: Jiajie Zhang, Andreas Vlachidis, Julianne Nyhan

This paper presents Hermeneutic Ontology Engineering, a methodology that uses Large Language Models not only to populate ontologies from unstructured, subjective corpora but to diagnose ontological inadequacy through structural analysis of extraction results. Developed within the Mixed-Methods Digital Oral History (MeDoraH) project, the approach separates oral testimony into a Claims Layer of reified assertions preserving epistemic stance and speaker attribution, and a Reference Layer of curated entity types, refined through three extraction passes (schema-free, schema-guided, and research-question-driven) whose divergences expose specific schema gaps. Applied to 19 oral history interviews documenting the formation of Digital Humanities (1,647 claims across 176 narrative units), graph-structural analysis revealed category conflations, cross-typing ambiguities, and "inferential hallucination", where LLMs convert hedged testimony into definitive relational statements. The resulting ontology (seven top-level classes, informally aligned with CIDOC-CRM, FOAF, and Dublin Core) and an open-source workbench demonstrate how LLM extraction can serve as an active instrument of schema evaluation rather than a downstream consumer of fixed schemas.

Jiajie Zhang

Research Fellow in Semantic Web Technologies and Information Extraction

University College London
Jiajie is a Research Associate in Semantic Web Technologies and Information Extraction at UCL, currently working on the MeDoraH Project to advance digital methodologies in oral history. His research focuses on knowledge graphs, NLP, large language models, and explainable AI for scalable, interdisciplinary information retrieval. He holds a PhD from Newcastle University, where he developed ontological frameworks for analysing research impact. Alongside his research, Jiajie teaches NLP at UCL and has a background in teaching big data analytics and software development.

Semantic Mapping of Archival Metadata Standards: Toward a Hybrid Knowledge Organization System for the National Library and Archives of Iran

Authors: Fatemeh Pazooki, Esmail Babaei Dehkordi

This study addresses the fragmented semantic landscape of archival metadata by mapping and aligning four key standards (Records in Contexts, Dublin Core, PREMIS, and Schema.org), treating each as a knowledge organization system whose entities, properties, and relationships can be systematically aligned. Using the National Library and Archives of Iran (NLAI) as a case study, the mixed-methods design combines archival process analysis, semantic analysis of each standard, a comparative matrix mapping archival processes to metadata elements, and ontology alignment supported by AI-assisted similarity matching and relationship inference. The outcome is a hybrid KOS model, expressed in ontology-based representations compatible with Linked Data and RDF, that integrates complementary strengths: RiC-CM's contextual and relational depth, Dublin Core's lightweight discovery semantics, PREMIS's preservation coverage, and Schema.org's web visibility. Evaluated through expert review and pilot application to an NLAI archival dataset, the model demonstrates that no single standard supports the full archival lifecycle alone, and offers a culturally grounded yet globally interoperable approach to semantic interoperability, contextual navigation, and semantic search for national archival institutions.

Esmail Babaei Dehkordi

Member of Planning Department, National Library and Archives of Iran.

PhD Candidate in Library and Information Science, Kharazmi University, Tehran, Iran
Esmail Babaei Dehkordi is a PhD candidate in Library and Information Science, blending academic rigor with practical expertise. He currently serves as the Statistician and Planning Specialist at the National Library and Archives of Iran, where he leverages data-driven insights to inform strategic decision-making and optimize library services.
