NKOS Workshop

The programme is still being finalized and is subject to ongoing updates as sessions are scheduled. Please check back regularly for the latest changes.

Beyond Keywords: Retrieving Blue-and-White Ceramics in Dutch Paintings with Knowledge-Augmented CLIP

Authors: Yang Zhao

Museum catalogue metadata describes a painting's title, artist, date, and medium, but rarely the specific objects depicted within it. This is a structural limitation of text-based knowledge organization systems applied to visual collections. We study this problem through blue-and-white ceramics in Dutch Golden Age paintings from the Rijksmuseum open collection, where both expert researchers and general visitors struggle to find relevant works through keyword search. We compare three retrieval approaches over a corpus of 150 paintings with 28 ground-truth positives. Metadata search with expert terms reaches perfect precision but only 10.7% recall. Zero-shot CLIP performs comparably to general metadata search. A knowledge-augmented approach, which expands the query with annotations retrieved from a small curated reference set of 20 annotated paintings, doubles both metrics to 60.0% precision and 42.9% recall. An ablation study shows that annotation type matters more than quantity: short scene-context descriptions outperform object-feature descriptions drawn from ceramic photographs. This confirms that knowledge organization principles still apply in embedding space. A small, carefully designed annotation layer paired with a general-purpose vision-language model can outperform the full metadata apparatus for retrieving specific depicted content.
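
The reported percentages can be checked with simple arithmetic. The sketch below is illustrative only: the abstract gives percentages and the 28 ground-truth positives but no raw counts, so the retrieved and correct counts used here (3 of 3 for expert metadata search, 12 of 20 for the knowledge-augmented approach) are assumptions chosen because they are consistent with the stated figures.

```python
def precision_recall(true_positives: int, retrieved: int, relevant: int) -> tuple:
    """Return (precision, recall) as fractions in [0, 1]."""
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


RELEVANT = 28  # ground-truth positives stated in the abstract

# Hypothetical counts, assumed only because they match the reported percentages.
expert = precision_recall(true_positives=3, retrieved=3, relevant=RELEVANT)
augmented = precision_recall(true_positives=12, retrieved=20, relevant=RELEVANT)

for name, (p, r) in {"expert metadata": expert, "knowledge-augmented": augmented}.items():
    print(f"{name}: precision={p:.1%}, recall={r:.1%}")
```

Under these assumed counts, expert-term search would return only three paintings, all of them correct, which is how perfect precision can coexist with very low recall.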
  • Yang Zhao

    Syracuse University

    Yang (pronounced “yahng,” rhymes with “song”) is a first-year Ph.D. student in Information Science and Technology at Syracuse University. Her research interests include digital humanities and information accessibility, with a focus on making visual cultural materials more open and usable for diverse audiences, especially people without technical backgrounds and those with different access needs.

Capturing Semantic Gaps in MeSH through Human-AI Collaboration

Authors: Jian Qin, Bei Yu, Qiaoyi Liu

This study explores ways of keeping KOS current with advances in scientific research and, as a proof of concept, tests the effectiveness of human-AI collaboration in detecting and identifying topic-specific semantic gaps in MeSH. Using a dataset on public health policy and COVID-19 vaccines, the experiment found that one-third of the papers have no author keywords and that, among the papers where author keywords are available, the overlap between MeSH headings and author keywords is low. Further analysis identified structural mismatches and conceptual gaps between MeSH and author keywords and discussed the factors that contributed to these problems. The use of AI tools in this preliminary study provides useful insights for future larger-scale KOS evaluation.
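
One simple way to quantify the overlap between author keywords and MeSH headings is sketched below. This is an assumption about how such a comparison could be set up, not the study's actual procedure: the records are invented for illustration, and exact matching after normalization is a deliberate simplification that ignores synonyms and entry terms.

```python
from typing import List, Optional


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(term.lower().split())


def keyword_overlap(author_keywords: List[str], mesh_headings: List[str]) -> Optional[float]:
    """Share of author keywords that exactly match a MeSH heading.

    Returns None when the paper has no author keywords, so that such
    papers can be counted separately rather than scored as zero.
    """
    keywords = {normalize(k) for k in author_keywords}
    if not keywords:
        return None
    headings = {normalize(h) for h in mesh_headings}
    return len(keywords & headings) / len(keywords)


# Hypothetical records for illustration only.
papers = [
    {"keywords": ["COVID-19 vaccines", "vaccine hesitancy", "health policy"],
     "mesh": ["COVID-19 Vaccines", "Health Policy", "Humans"]},
    {"keywords": [], "mesh": ["Public Health", "Vaccination"]},
]

scores = [keyword_overlap(p["keywords"], p["mesh"]) for p in papers]
missing = sum(score is None for score in scores)
print(scores, missing)
```

Keywords left unmatched by a check like this are only candidates for a semantic gap; deciding whether they reflect a structural mismatch or a missing concept still requires human review.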
  • Jian Qin

    Syracuse University

    Jian Qin is a Professor in the iSchool at Syracuse University and currently serves as the Director for Dublin Core Academy. She conducts research in metadata, knowledge organization and representation, data and knowledge modeling, ontologies, research collaboration networks, research impact assessment, and data curation. Her research has received funding from the U.S. National Science Foundation, the U.S. National Institutes of Health, and the U.S. Institute of Museum and Library Services. She was the recipient of the 2020 Frederick G. Kilgour Award for Research in Library and Information Technology.

Establishing Ethical Guidelines for Use of AI Tools in KOS and Metadata Services: A Framework of Human-Supervised Automation Workflows

This panel brings together metadata researchers and practitioners internationally to facilitate the development of ethical guidelines for the use of AI as a tool for KOS mapping methods and practical experience. Panelists have established their research programs in knowledge organization, digital humanities and information retrieval, with ongoing research projects on the use of AI tools for subject indexing in KOS and metadata services. A key research finding is that librarians feel confident in their professional identity but lack the practical training to perform AI-related operational tasks for technical services. A framework that prioritizes information professionals as the "human-in-the-loop" is intended to ensure that AI tools are applied with measurable benefit to users and without compromising professional standards.
  • Ying-Hsang Liu

    Professorship of Predictive Analytics, Chemnitz University of Technology

    Ying-Hsang Liu is a researcher at Chemnitz University of Technology (Germany) in Predictive Analytics. With a Ph.D. in Information Science (Rutgers University, USA), he has held academic positions across five countries. His research focuses on human-centered data science, information retrieval, and AI-based systems, supported by grants from the ARC, ARDC, and Airbus. Dr. Liu has authored 65 peer-reviewed publications and two books, serves on ASIS&T and ALISE committees, and is a Distinguished Member of ASIS&T 2022.
  • Shu-Jiun (Sophy) Chen

    Institute of History and Philology, Academia Sinica

    Shu-Jiun (Sophy) Chen is an Associate Research Fellow at the Institute of History and Philology and the Center for Digital Cultures, Academia Sinica, and an Adjunct Associate Professor in the Department of Library and Information Science, National Taiwan University (NTU). She holds an MA from the University of Sheffield and a PhD from NTU. Her work spans cultural heritage informatics, digital humanities, knowledge organization, metadata, linked data, and digital curation. She initiated the Chinese AAT-Taiwan project and established Academia Sinica’s Linked Open Data Lab.
  • Seungmin Lee

    Chung-Ang University, Seoul, South Korea

    Seungmin Lee is a professor in the Department of Library and Information Science at Chung-Ang University, South Korea. He has served as Chair of the Cataloging Committee, Chair of the Librarian Certification Committee, and Chair of the Planning and Policy Committee of the Korean Library Association (KLA). He is currently Vice President of the Korean Biblia Society for Library and Information Science and Editor-in-Chief of the Journal of the Korean Library and Information Science Society. His research interests include metadata, bibliographic ontology, and knowledge organization.
  • Charlene Chou

    New York University, Division of Libraries

    Charlene Chou is the Head of the Knowledge Access Department at New York University Libraries, where she oversees cataloging and metadata services. She has contributed to national and international metadata standards through active service on various committees, including the PCC (Program for Cooperative Cataloging) Policy Committee and the Joint RDA Board and RSC Working Group on Artificial Intelligence. She is committed to leading pilot projects on emerging trends and technologies.

Hermeneutic Ontology Engineering: LLM-Assisted Schema Induction for Oral History Knowledge Organisation

Authors: Jiajie Zhang, Andreas Vlachidis, Julianne Nyhan

1. Aims

This paper presents Hermeneutic Ontology Engineering, a methodology that uses Large Language Models not only to populate ontologies from unstructured, subjective corpora but to diagnose ontological inadequacy through structural analysis of extraction results. Developed within the Mixed-Methods Digital Oral History (MeDoraH) project, we applied Hermeneutic Ontology Engineering to oral history interviews documenting the formation of Digital Humanities. We report concrete cases in which LLM-driven extraction, followed by graph-structural analysis, exposed category conflations that would have been invisible through top-down design alone, and we present the resulting ontology as an empirically grounded product of this iterative process.

2. Background

Established knowledge organisation systems for cultural heritage, notably CIDOC-CRM (Doerr, 2003) with its argumentation extension CRMinf (Doerr et al., 2023), and foundational ontologies such as DOLCE (Gangemi et al., 2002), model events, provenance, and epistemic uncertainty. However, these frameworks were designed for curatorial and archival contexts where epistemic qualification applies to metadata assertions about documented events or artefacts. Oral history presents a different analytical situation: the testimony itself is the primary object of study, and it is situated, intersubjective, and performative, a record of how individuals construct meaning from the past, not a report of what occurred (Portelli, 1991). Speakers contradict themselves, hedge, and embed factual claims within narrative stances. The challenge is not that existing KOS lack uncertainty mechanisms, but that their event-centric core does not foreground the epistemic and narrative dimensions that make oral testimony analytically distinctive.
Recent work has begun exploring LLMs for oral history: thematic classification (Cherukuri et al., 2025), automatic subject indexing against controlled vocabularies (Widegren, 2025), and ontological modelling of oral history metadata interoperability (Vrachliotou and Papatheodorou, 2024). These approaches consume existing schemas rather than questioning them. Our work inverts this relationship: we use LLM-driven extraction to iteratively stress-test the schema itself, treating ontological categories as interpretive hypotheses subject to empirical revision, an approach grounded in the hermeneutic principle that understanding proceeds through iterative encounters between pre-understanding (the schema) and the text, each encounter revising both.

3. Methodology

Hermeneutic Ontology Engineering operates through three components: a representational architecture that separates analytical concerns, an extraction workflow that generates diagnostic evidence, and a workbench that supports human evaluation of schema adequacy.

The representational architecture addresses a core problem: oral history claims must be queryable as structured data without collapsing the epistemic context that gives them meaning. We separate this into a Claims Layer, where discourse is atomised into individual propositions modelled as reified assertions (via RDF-star) carrying epistemic stance, certainty markers, and speaker attribution; and a Reference Layer, which provides the entity-centric schema: curated, typed identifiers for actors, organisations, technologies, events, and concepts serving as stable anchors for cross-corpus alignment. Two supporting layers handle discourse structure and narrative segmentation, maintaining provenance chains from every assertion back to a specific passage, narrator, and interview context.
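
To make the idea of a reified assertion concrete, the following minimal sketch models one claim with its stance, certainty, and provenance, and renders it in the `<< s p o >>` quoted-triple notation associated with RDF-star. It is not the MeDoraH data model: the class and field names, the `ex:` prefix, the property names, and the sample claim are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Where a claim comes from: interview, narrator, and source passage."""
    interview_id: str
    narrator: str
    passage: str


@dataclass(frozen=True)
class Claim:
    """One atomised proposition plus the epistemic context it was uttered in."""
    subject: str
    predicate: str
    obj: str
    stance: str      # hypothetical values, e.g. "asserted" or "hedged"
    certainty: str   # hypothetical values, e.g. "low" or "high"
    provenance: Provenance

    def to_turtle_star(self) -> str:
        """Render the claim as annotations on a quoted triple (ex: is a placeholder prefix)."""
        quoted = f"<< {self.subject} {self.predicate} {self.obj} >>"
        return (
            f'{quoted} ex:stance "{self.stance}" ;\n'
            f'    ex:certainty "{self.certainty}" ;\n'
            f'    ex:attributedTo "{self.provenance.narrator}" ;\n'
            f'    ex:fromInterview "{self.provenance.interview_id}" .'
        )


# Invented example: a hedged statement about influence, kept as hedged.
claim = Claim(
    subject="ex:ColleagueB",
    predicate="ex:influences",
    obj="ex:NarratorA",
    stance="hedged",
    certainty="low",
    provenance=Provenance("interview-01", "Narrator A", "I think she probably shaped how I worked."),
)
print(claim.to_turtle_star())
```

Because the stance and certainty are attached to the assertion rather than to the entities, the underlying triple can be queried as structured data while the hedge stays recoverable.
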
The hermeneutic refinement workflow processes bounded narrative units through three structurally distinct passes: (1) schema-free open extraction that discovers entity types and relations without predetermined categories; (2) schema-guided extraction, where the current ontology is injected into the prompt to test whether existing categories adequately capture the discourse; and (3) research-question-driven extraction targeting domain-specific analytical concerns (e.g. knowledge mobility, institutional formation). Crucially, these passes are not redundant: they are designed to diverge when the schema is inadequate. Convergence across passes increases confidence in a mapping; divergence provides specific, locatable evidence of schema gaps.

The interactive workbench (a web-based application) supports human-in-the-loop evaluation: provenance-linked claim inspection, structural pattern analytics, and structure-aware clustering that groups semantically similar predicates into candidate relation templates for researcher evaluation.

4. Findings

We applied Hermeneutic Ontology Engineering to 19 English-language interviews from the Hidden Histories corpus, extracting 176 narrative units containing 1,647 reified claims. Schema-free extraction produced 160 raw entity types, a 7:1 ratio against the final schema's 22 classes, indicating substantial over-specification driven by surface lexical variation. Graph-structural analysis of extraction results functioned as the primary diagnostic instrument, exposing three categories of schema inadequacy:

Category conflation. The most instructive case involved the raw type "Concept," which appeared 951 times across bottom-up extractions but occupied three systematically different structural positions in the preliminary knowledge graph: (1) as claim-internal stance markers, appearing only in predicate or qualifier positions; (2) as relational objects in knowledge-creation triples (e.g. theories, paradigms, methodologies); and (3) as disciplinary labels classifying actors and organisations. These positional signatures, invisible in frequency counts alone, provided empirical grounds for decomposing "Concept" into three distinct ontology classes: epistemic stance (modelled in the Claims Layer rather than the Reference Layer), Conceptual Framework, Methodology and Discipline.

Cross-typing ambiguity. Entities such as TEI appeared simultaneously as technology artefacts (standards, software) and as conceptual frameworks (methodological paradigms) in different claims by different speakers. Rather than forcing a single classification, the schema separates Artefact from ConceptualItem and connects them via bridging relations (implementsConcept, about), allowing the same referent to be typed differently in different assertional contexts.

Inferential hallucination. Experiments reveal that some LLMs systematically convert hedged or implicit testimony into definitive relational statements, a phenomenon we term 'Inferential Hallucination'. This confirmed the necessity of the Claims Layer's reified assertion model, where epistemic stance is preserved as a first-class property rather than collapsed during extraction.

Schema-guided extraction, operating with the ontology's typed relations, consistently missed relations that schema-free extraction captured when the discourse used vocabulary outside the schema's semantic range, particularly around informal mentorship, funding politics, and disciplinary boundary-crossing. These divergences directly informed the addition of relation sub-properties (e.g. influences as distinct from mentorsOrSupervises).

The resulting ontology defines seven top-level classes (Actor, Event, Artefact, Conceptual Item, Spatial Entity, Temporal Entity, Property) with 15 subclasses, connected by 11 top-level relation patterns with controlled sub-properties.
Each class documents informal alignment correspondences with CIDOC-CRM, FOAF, and Dublin Core. Formal mappings (rdfs:subClassOf, owl:equivalentClass) are deliberately deferred: premature formalisation would constrain the iterative refinement that is the methodology's core contribution. We intend to stabilise formal alignments after the schema has been tested against the full multilingual corpus.

5. Significance

This work contributes to NKOS discussions on two fronts. Methodologically, it demonstrates that the graph-structural position of extracted entities (specifically, whether a type appears as subject, object, qualifier, or bridging node, and how its connectivity profile differs across relational contexts) provides an empirical diagnostic for ontological adequacy that complements traditional competency-question evaluation. When a category label maps to structurally divergent graph positions, that divergence constitutes evidence for ontological refinement. This reframes LLM extraction from a downstream consumer of fixed schemas into an active instrument of schema evaluation. Practically, the open-source workbench enables domain specialists to participate in schema construction by inspecting how extraction results populate or fail to populate the ontological categories, lowering the barrier between KOS designers and knowledge communities. The ontology and workbench source code are available at https://github.com/articoder/medorah_ontology and https://github.com/articoder/medorah_nlp.
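
The positional diagnostic described in this abstract can be illustrated with a small tally of where each raw type occurs across extracted claims. The sketch below is a crude heuristic under stated assumptions, not the authors' graph-structural analysis: the slot names, the toy claims, and the `min_slots` threshold are all hypothetical.

```python
from collections import Counter, defaultdict

# Toy claims: the raw type found in each structural slot (None when the slot is empty).
claims = [
    {"subject": "Actor", "object": "Concept", "qualifier": None},
    {"subject": "Actor", "object": "Organisation", "qualifier": "Concept"},
    {"subject": "Concept", "object": "Actor", "qualifier": None},
    {"subject": "Actor", "object": "Concept", "qualifier": None},
]


def positional_signatures(claims):
    """Count, for every raw type, how often it fills each structural slot."""
    signatures = defaultdict(Counter)
    for claim in claims:
        for slot, raw_type in claim.items():
            if raw_type is not None:
                signatures[raw_type][slot] += 1
    return signatures


def candidates_for_review(signatures, min_slots=3):
    """Flag types spread over at least `min_slots` distinct slots (assumed threshold)."""
    return sorted(t for t, slots in signatures.items() if len(slots) >= min_slots)


signatures = positional_signatures(claims)
for raw_type, slots in sorted(signatures.items()):
    print(raw_type, dict(slots))
print("review:", candidates_for_review(signatures))
```

A type flagged this way is only a prompt for human inspection; whether it should be split into several classes remains an interpretive decision for the researcher.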
  • Jiajie Zhang

    University College London

    Jiajie is a Research Associate in Semantic Web Technologies and Information Extraction at UCL, currently working on the MeDoraH Project to advance digital methodologies in oral history. His research focuses on knowledge graphs, NLP, large language models, and explainable AI for scalable, interdisciplinary information retrieval. He holds a PhD from Newcastle University, where he developed ontological frameworks for analysing research impact. Alongside his research, Jiajie teaches NLP at UCL and has a background in teaching big data analytics and software development.