NKOS Workshop

This session includes a 60-minute lunch break from 13:00 to 14:00.
Starts at
Sun, Oct 20, 2024, 10:00 EDT
( 20 Oct 24 14:00 UTC )
Finishes at
Sun, Oct 20, 2024, 17:00 EDT
( 20 Oct 24 21:00 UTC )
Venue
BL 205
Moderator
Joseph Busch

Artificial intelligence (AI) is broadly defined as the use of automation to solve problems by reasoning autonomously. Today, the most popular AI method is the large language model (LLM). But there are many other automation methods, such as rules-based systems, machine learning, vectors, n-grams, clustering, filtering, natural language processing (NLP), and natural language generation (NLG), that can make automation intelligent. While there is a tendency to focus on one primary method, most AI applications use several methods.

The NKOS Workshop is particularly interested in how knowledge organization systems (KOS) are being used or can be used to make automation intelligent. For example, one problem with LLMs is “hallucinations,” where the application generates a response to a prompt that is “correct” but not true. How can KOS be integrated with LLMs to guide their responses so that they do not produce “hallucinations”?

Time Content
10:00 AM – 10:15 AM NKOS 2024 Workshop - Introduction (Joseph Busch)
10:15 AM – 11:45 AM Session 1: LLM Demonstrations
Leveraging Generative AI for Multilingual Thesaurus Development: Insights from the Confucius Ceremony Cultural Vocabulary (Sophy Shu-Jiun Chen)
The Ontology Enhanced Multimodal Large Language Models for Knowledge Organization and Representation of Multimodal Cultural Memory Resources (Cuijuan Xia)
Using LLMs for Enriching Metadata with Links to KOS and Knowledge Graphs: Case Finnish Named Entity Linking (Rafael Leal, Annastiina Ahola, & Eero Hyvönen)
11:45 AM – 12:00 PM Break
12:00 PM – 1:00 PM Session 2: Knowledge Graph Demonstrations
The African Knowledge Hub: Role of the SDG Taxonomy (KOS) in Harnessing, Exploring, and Navigating Knowledge of the United Nations System - Africa (Irene Onyancha, Ahmed Al-Awah, Fraser Gordon, & Rofaida Elzubair)
Patent Citation Link Prediction Based on Graph Neural Network (Wei Hu, Shuying Li, & Ning Yang)
1:00 PM – 2:00 PM Lunch
2:00 PM – 3:30 PM Session 3: ML Demonstrations
Exploring Patient Perspectives on Anticipating and Mitigating Potential Harms of LLMs in Depression Self-Management (Dong Whi Yoo & Koustuv Saha)
Testing Use of an LLM Algorithm on Gene Ontology (GO) Dataset (Qiaoyi (Joy) Liu & Jian Qin)
Using KOS AI to Expedite the Conference Experience (Margie Hlava)
3:30 PM – 3:45 PM Break
3:45 PM – 4:45 PM Session 4: AI and KOS Demonstrations
Evaluating AI Assignment of Library of Congress Subject Headings (LCSH) (Brian Dobreski & Chris Hastings)
Mapping the Global AI Landscape Using AI and KOS (Trevor Watkins)
4:45 PM – 5:00 PM Announcements & Closing (Joseph Busch)

Moderator

  • Joseph Busch

    Taxonomy Strategies

    Mr. Busch is an authority in the field of information science, with an emphasis on helping organizations develop metadata frameworks and taxonomy strategies to ensure that content realizes its highest value through re-use and re-purposing. He has extensive knowledge and experience developing content architectures consisting of metadata frameworks, taxonomies, and other information management methods to implement effective applications. He is currently on a full-time assignment as the senior business classification analyst for the African Development Bank, which is based in Abidjan, Côte d’Ivoire.

Presentations

Exploring Patient Perspectives on Anticipating and Mitigating Potential Harms of LLMs in Depression Self-Management

Large Language Models (LLMs), such as ChatGPT, are increasingly being integrated into healthcare, and their application in mental health, particularly in managing depression, presents both potential benefits and challenges. This study investigates how LLM-based chatbots can empower users by providing instant, personalized support while addressing the need for robust safety mechanisms in sensitive mental health contexts. Participants will engage in one-hour remote interviews, interacting with a ChatGPT API-powered chatbot focused on the self-management of depression. Reflexive thematic analysis will be used to identify themes related to user perceptions and potential harms. Anticipated outcomes include insights into the effectiveness of chatbots in managing depression, potential harms, and design implications for safe and effective LLMs for depression self-management. The findings aim to enhance knowledge organization systems within LLMs by improving the structuring and access of mental health information. Preliminary results will be presented at the workshop, showcasing the data collection webpage developed using the ChatGPT API.
  • Dong Whi Yoo

    Kent State University

    Dong Whi Yoo is a researcher and media artist who studies the interaction between society and emerging technologies. As a human-computer interaction (HCI) researcher, he explores design implications for emerging technologies, particularly for marginalized and underrepresented groups such as people with mental health disorders. Over the past few years, he has worked with individuals with schizophrenia to understand their underrepresentation in AI development and to design predictive algorithms that support their work practices. He investigates how people with psychotic disorders make sense of their symptoms and build their identities. His studies have been published in leading HCI and digital mental health venues, including CHI, CSCW, PervasiveHealth, JMIR and Internet Interventions.

Using Gene Ontology and ML Algorithms for Dataset Design and Creation for ML/AI Modeling

Authors: Qiaoyi Liu, Jian Qin

This demo proposal presents a case study that uses Gene Ontology (GO) and ML/AI algorithms to design and create KO-derived datasets for ML/AI applications. We discuss the characteristics and requirements of KO practices and products in implementing ML algorithms. The focus of this demo is on how knowledge organization systems can be used to derive datasets that deliver the quality and trustworthiness needed to achieve precision, semantic similarity computation, and interoperability in these algorithms.
  • Qiaoyi Liu

    Syracuse University

    I’m a PhD student studying Information Science and Technology at Syracuse University School of Information Studies. I'm also a member of the Metadata Lab. I have an MS degree in Library and Information Science (SU, G’23) and a BS degree in Biological Sciences (CNU, G’19). My research interests are Science of Science (SoS) and Knowledge Organization Systems (KOS).
  • Jian Qin

    Syracuse University

    Jian Qin is a Professor in the iSchool at Syracuse University. She conducts research in metadata, knowledge modeling and representation, ontologies, research collaboration networks, research impact assessment, and data curation. Jian Qin directs the Metadata Lab, a research group focusing on big metadata analytics and knowledge modeling. Her research has received funding from US NSF, NIH, and IMLS, among others. She has published widely, with more than 100 journal and conference papers in the fields of information science, scientometrics, knowledge organization, and metadata, and has been invited to give keynotes, lectures, and presentations at conferences and institutions inside and outside of the U.S. She is the co-author of the book Metadata and co-editor of several special journal issues on knowledge discovery in databases and knowledge representation. Jian Qin has served as the DCMI conference program chair and track chair and as a member or chair of numerous other conference program committees, including ASIST, iConference, and JCDL, among others. She received the 2020 Frederick G. Kilgour Award for Research in Library and Information Technology. Jian Qin holds a Ph.D. from University of Illinois at Urbana-Champaign. Further information can be found at https://ischool.syr.edu/jian-qin/. A complete CV can be found at https://jianqin.metadataetc.org/wp-content/uploads/2023/08/Qin_CV.pdf.

Evaluating AI Assignment of Library of Congress Subject Headings (LCSH)

Authors: Brian Dobreski, Christopher Hastings

As with many areas of research and practice, the cultural heritage domain has shown increasing interest in the use of AI in recent years, with cultural heritage institutions such as libraries, archives, and museums actively exploring the use of AI tools in their workflows. Large language model (LLM)-based text applications including ChatGPT have been touted as holding great promise for cultural heritage work. One of the most challenging parts of producing library metadata specifically may be subject cataloging: the assignment of subject headings and classification numbers. This task requires cataloger fluency in the formal and often complex knowledge organization systems (KOS) used to represent aboutness and genre in bibliographic records. The work presented here is part of a larger, ongoing research project assessing the effectiveness of AI tools for performing subject analysis and representation tasks for cultural heritage data. In this presentation, researchers offer the results of a structured test of freely available AI tools to assign headings from Library of Congress Subject Headings (LCSH) to library materials. The findings add further empirical evidence into current discussions concerning the quality and reliability of AI-performed metadata work, and, more broadly, contribute to the growing discourse around the use of AI in applying KOS.
  • Brian Dobreski

    University of Tennessee, Knoxville

    Brian Dobreski is an Assistant Professor in the School of Information Sciences at University of Tennessee-Knoxville. His research focuses on the practices and implications of knowledge and information organization, as well as the concepts of personhood and personal identity in information. Brian received his Ph.D. in information science from Syracuse University. He has authored works in publications including Journal of Documentation, Knowledge Organization, Cataloging & Classification Quarterly, Social Media + Society, Journal of Information Ethics, and Journal of Education for Library and Information Science.
  • Christopher Hastings

    University of Tennessee, Knoxville

    Christopher Hastings holds a B.A. in History from the University of California, San Diego. During and after his undergraduate studies he worked as a manuscript processor in the UCSD Special Collections and Archives. Currently, he is attending the University of Tennessee, Knoxville, pursuing an M.S. in Information Science. During his MSIS studies he assisted Dr. Brian Dobreski with research on the use of Artificial Intelligence for library cataloging. Hastings is involved with the Polar Libraries Colloquy and presented his own research on the mammoth ivory trade in Siberia at the 29th colloquy in Tromsø, Norway in June 2024.

Patent citation link prediction based on graph neural network

Authors: Wei Hu, Shuying Li, Ning Yang

Patent citation relationships constitute a citation network, and the predictability of edges in a network is a frontier research issue in complex networks. This article explores a prediction model for patent citation relationships. By integrating patent technical text content and classification code features, a graph neural network is trained for patent citation link prediction. The aim is to provide methodological support for technology knowledge diffusion and patent data management. This study collects patent data in the field of quantum sensing, constructs a network based on patent citation relationships, and extracts text features such as technical problems, solutions, functions, and effects. This article proposes a new link prediction model framework based on graph neural networks, taking into account the characteristics of natural language in patent documents.

In terms of model framework, we initially employ the GraphSAGE model on the training citation network to obtain the embedding vectors of patent nodes. Then, the semantic vectors of patent technical text are derived by pre-trained models such as PatentBERT. These two sets of vectors are then integrated and fed into a Random Forest model. Ultimately, we derive the predicted probability values for patent citation link prediction. Furthermore, in terms of interpretability, this study constructs a decision tree model based on the integrated results of the two sets of vectors. This model effectively measures the impact of multidimensional technical text content, local network structure, individual heterogeneity, and other factors on network edge formation.
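
The integration step described above (structural embeddings plus text vectors, combined per candidate citation pair) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the toy citation graph, the two-dimensional vectors, a single untrained mean-aggregation step standing in for GraphSAGE, concatenation as the way the two sets of vectors are integrated, and an elementwise product as the pair feature. The abstract does not specify these details. A real system would use a trained GraphSAGE model and PatentBERT vectors, and would fit a random forest on the resulting pair features.

```python
from typing import Dict, List

Vector = List[float]


def mean_aggregate(node: str, neighbors: Dict[str, List[str]],
                   features: Dict[str, Vector]) -> Vector:
    """One untrained GraphSAGE-style step: a node's own features
    concatenated with the mean of its neighbors' features."""
    own = features[node]
    nbrs = neighbors.get(node, [])
    if nbrs:
        mean = [sum(features[n][i] for n in nbrs) / len(nbrs)
                for i in range(len(own))]
    else:
        mean = [0.0] * len(own)
    return own + mean


def pair_features(a: str, b: str, structure: Dict[str, Vector],
                  text: Dict[str, Vector]) -> Vector:
    """Concatenate structural and text vectors per patent (an assumed
    integration choice), then combine the two patents elementwise."""
    va = structure[a] + text[a]
    vb = structure[b] + text[b]
    return [x * y for x, y in zip(va, vb)]


# Hypothetical toy data: three patents and their citation neighbors.
citations = {"P1": ["P2", "P3"], "P2": ["P1"], "P3": ["P1"]}
node_features = {"P1": [1.0, 0.0], "P2": [0.0, 1.0], "P3": [1.0, 1.0]}
text_vectors = {"P1": [0.5, 0.5], "P2": [0.5, 0.0], "P3": [0.0, 1.0]}

structure_vectors = {p: mean_aggregate(p, citations, node_features)
                     for p in citations}

# Feature vector for the candidate link P2 -> P3; a classifier would
# turn vectors like this into a predicted citation probability.
candidate = pair_features("P2", "P3", structure_vectors, text_vectors)
```

Each candidate pair becomes one fixed-length feature vector, so any tabular classifier can consume it, and tree-based models can report which dimensions (structural or textual) drive the prediction.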

  • Wei Hu

    National Science Library (Chengdu), Chinese Academy of Sciences

    Dr. Wei Hu is an Assistant Research Fellow at the National Science Library (Chengdu) within the Chinese Academy of Sciences. He obtained his Ph.D. in Statistics from the School of Statistics at Renmin University of China. Dr. Hu's research interests encompass complex network modeling, link prediction, text mining, and knowledge organization. His work has been featured in esteemed journals such as Computational Statistics & Data Analysis, Electronic Journal of Statistics, and Data Analysis and Knowledge Discovery.
  • Ning Yang

    National Science Library (Chengdu), Chinese Academy of Sciences

    Yang Ning is a Senior Engineer at the National Science Library (Chengdu), Chinese Academy of Sciences. He has been selected as a Distinguished Research Fellow at CAS. He currently serves as the Deputy Director of the Knowledge Systems Department, as well as the Deputy Director of the Sichuan Province Engineering Research Center for Intelligent Mining and Application of Scientific and Technological Information. He obtained his PhD degree in Management at the University of the Chinese Academy of Sciences and was a Visiting Scholar at the School of Information at Kent State University in the United States. He has long been engaged in research in the fields of information organization and utilization, knowledge mining and services, and scientific data management and application. He has led one project funded by the National Social Science Fund, published over 20 papers in core journals and academic conferences such as Scientometrics and Library and Information Service, co-authored two books, holds two authorized invention patents, and has three software copyrights. He also serves as a peer reviewer for multiple journals and conferences.

Leveraging Generative AI for Multilingual Thesaurus Development: Insights from the Confucius Ceremony Cultural Vocabulary

Generative artificial intelligence (GAI), particularly systems based on large language models (LLMs), has become an increasingly important tool in digital humanities. It enhances research efficiency in tasks such as content analysis, keyword extraction, automated metadata creation, and data management, uncovering previously difficult-to-observe phenomena and tackling challenging issues. Beyond data generation, GAI’s rapid content analysis and knowledge structure design capabilities offer new exploratory directions for constructing and designing thesauri based on Knowledge Organization Systems (KOS). Using the multilingual "Art & Architecture Thesaurus" (AAT) developed by the Getty Research Institute (GRI) as an example, the Academia Sinica Center for Digital Cultures (ASCDC) has collaborated with GRI for over a decade to address the inadequacies of localized cultural vocabulary. The Chinese language and concepts of material culture are converted into English and integrated into the AAT through translation and mapping. During this process, the conceptual structures of controlled vocabularies in Chinese and English terms present multiple alignment patterns, and a systematic methodology has been developed to support editorial work. This study aims to explore how GAI can assist in constructing a structured thesaurus based on the cultural conceptual vocabulary related to the Confucius Ceremony, with the goal of contributing this localized vocabulary to AAT.
  • Sophy Shu-Jiun Chen

    Academia Sinica

    Sophy Shu-Jiun Chen, Associate Research Fellow at Academia Sinica’s Institute of History and Philology, also serves as Executive Secretary of the Academia Sinica Center for Digital Cultures. She holds an M.A. in Information Studies from the University of Sheffield, UK, and a Ph.D. in Library and Information Science from National Taiwan University. Her research spans cultural heritage informatics, digital libraries, digital humanities, knowledge organization, and linked data. She initiated the Chinese AAT Taiwan project and established the Linked Open Data Lab at Academia Sinica.

The Ontology enhanced multimodal large language models for the Knowledge Organization and Representation of multi-modal cultural memory resources

The development of multimodal large language models (MLLMs) provides new solutions for the knowledge organization and representation of multimodal cultural memory resources. However, for the knowledge organization and representation of some special cultural memory resources, such as the text, image, audio, and video resources related to Guqin Subtractive Character Notation (see Fig. 1 and Fig. 2), existing MLLMs need further optimization to achieve the expected results. Guqin Subtractive Character Notation is a distinct notation system rich in Chinese cultural significance, differing from both simplified notation and traditional staff notation. It shows the fingering techniques for playing the Guqin. Its symbols are not Chinese characters and can be recognized only by a very small number of professionals who have undergone long-term training. It cannot be recognized with existing OCR technologies.
This study will use multimodal Guqin Subtractive Character Notation resources as training data (see Tab. 1), combined with a Guqin ontology application profile and RDF data as prompt-tuning data, to explore a vertical application path for MLLMs in the field of cultural heritage and to develop a prototype system that displays the research results. The screen recording presents preliminary research results. By using the multimodal resources and the Guqin ontology with RDF data as instruction fine-tuning data to fine-tune the multimodal large language model, cross-modal retrieval with images and audio as input queries can be achieved. The ultimate goal of this study is to use the optimized MLLMs to help more people understand Guqin Subtractive Character Notation, especially the notation found in the large collections of ancient books held by libraries.
  • Cuijuan XIA

    Shanghai Library

    Cuijuan (Jada) Xia is a Researcher at Shanghai Library, team leader of Shanghai Library's Digital Humanities (DH) projects, senior DH platform architect, and knowledge organization system (KOS) designer. She has played a major part in designing and developing the DH projects of Shanghai Library. She has collaborated with digital humanities researchers across different fields of the humanities and has participated in many research projects at different digital humanities research institutions. She hosts and participates in many national research projects. Her research focuses on Metadata, Ontology, Knowledge Organization, Linked Data, Digital Humanities, and Digital Memory. She has published 3 books and more than 90 papers in many academic journals. She is currently focusing on knowledge representation research of multimodal cultural memory resources for GenAI.

Using LLMs for Enriching Metadata with Links to KOS and Knowledge Graphs: Case Finnish Named Entity Linking

Authors: Rafael Leal, Annastiina Ahola, and Eero Hyvönen

This paper presents work on using Large Language Models (LLMs) to disambiguate Named Entity Linking candidates, with the goal of enriching the metadata of textual documents by linking them to Knowledge Organization Systems, a.k.a. domain ontologies, and Knowledge Graphs. We propose a zero-shot classification method that has similarities with Retrieval-Augmented Generation (RAG), and discuss an under-development prototype tool that allows for human intervention when making final disambiguation decisions, especially when this cannot be reliably carried out automatically. The focus of this work is on Finnish texts, so our methods must take into account the particularities of this language and the resources available for processing it.
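
The general shape of zero-shot candidate disambiguation with a human fallback can be sketched as follows. This is not the authors' tool: the prompt wording, the function names, the example.org identifiers, and the stub that stands in for a language model are all hypothetical. The sketch only shows retrieved KOS candidates being presented as a multiple-choice question, with any unusable answer handed to a human reviewer.

```python
from typing import Callable, List, Optional, Tuple

Candidate = Tuple[str, str]  # (identifier, short description)


def build_prompt(mention: str, context: str,
                 candidates: List[Candidate]) -> str:
    """Format the mention, its context, and the retrieved candidates
    as a numbered multiple-choice question."""
    lines = [
        f'Text: "{context}"',
        f'Which entity does "{mention}" refer to? '
        "Answer with one number, or 0 if unsure.",
    ]
    for i, (_, description) in enumerate(candidates, start=1):
        lines.append(f"{i}. {description}")
    return "\n".join(lines)


def disambiguate(mention: str, context: str, candidates: List[Candidate],
                 ask_llm: Callable[[str], str]) -> Optional[str]:
    """Return the chosen identifier, or None to defer to a human."""
    answer = ask_llm(build_prompt(mention, context, candidates)).strip()
    if answer.isdigit() and 1 <= int(answer) <= len(candidates):
        return candidates[int(answer) - 1][0]
    return None  # unsure or malformed answer: needs human review


# Hypothetical candidates, as if retrieved from a KOS lookup.
options = [
    ("http://example.org/place/turku", "Turku, a city in Finland"),
    ("http://example.org/ship/turku", "Turku, a ship"),
]
sentence = "The exhibition opened in Turku in 1929."

# Stub callables standing in for a real language model.
confident = disambiguate("Turku", sentence, options, lambda prompt: "1")
unsure = disambiguate("Turku", sentence, options, lambda prompt: "0")
```

Passing the model in as a callable keeps the sketch independent of any particular LLM API and makes the human-review path easy to exercise.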
  • Rafael Leal

    Aalto University, Department of Computer Science, Finland

    Rafael Leal's research interests lie in developing and using natural language processing technologies, such as large language models, for digital humanities research and applications.

Mapping the Global AI Landscape using AI and KOS

Artificial Intelligence (AI) is ubiquitous, shaping various facets of our global society. Understanding its impact and application across different domains is essential for members of the general public and community stakeholders in government, education, and industry. This demonstration will showcase a dashboard that leverages AI technologies and Knowledge Organization Systems (KOS) to track and analyze AI’s presence and impact worldwide. In this demo, I discuss how AI is applied in government, education, and industry, using the AI dashboard as a guide. I discuss which KOS we’ve used to create and power the dashboard, which is a tool to track AI’s ubiquity. Although the dashboard is still a work in progress, I will discuss how it uses web scraping tools to collect real-time data from sources such as news websites, government reports, academic publications, and industry reports; how APIs are used to gather structured data; and how machine learning is used to normalize data from these sources, ensuring consistency and accuracy through deduplication, standardization, and validation of the data collected. Additionally, I will discuss some of the issues we've faced with copyright, restrictions, and limitations of API usage, and how we are dealing with bias mitigation. I will also talk about how ontologies define the relationships between AI and its subdomains, the technologies it powers, applications, and sectors. These ontologies provide a structured framework for organizing the data collected. The hierarchical classification within the ontologies makes it easier for users to navigate and understand the data, offering a clear view of the AI landscape. Controlled vocabularies ensure consistency in the terms used to describe AI technologies and applications across different data sources. This standardization enhances data integration and retrieval. A thesaurus captures synonyms and related terms to improve search capabilities within the dashboard. Taxonomies categorize AI and its impact, which allows users to filter and explore information based on specific criteria such as sector, region, and technology type.
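
The roles of a thesaurus and controlled vocabulary in standardization and deduplication can be illustrated with a small sketch. The synonym table, record fields, and sample records below are hypothetical and are not taken from the dashboard. The sketch uses a plain lookup table, whereas the abstract describes machine learning for normalization.

```python
from typing import Dict, List

# Hypothetical thesaurus entries: variant term -> preferred label.
SYNONYMS: Dict[str, str] = {
    "ml": "machine learning",
    "machine-learning": "machine learning",
    "llm": "large language model",
    "large language models": "large language model",
}


def normalize_term(raw: str) -> str:
    """Standardize case and whitespace, then map a variant to its
    preferred label when the thesaurus knows it."""
    key = " ".join(raw.lower().split())
    return SYNONYMS.get(key, key)


def deduplicate(records: List[dict]) -> List[dict]:
    """Keep the first record for each (title, technology) pair,
    compared after standardization."""
    seen = set()
    unique = []
    for record in records:
        technology = normalize_term(record["technology"])
        key = (record["title"].strip().lower(), technology)
        if key not in seen:
            seen.add(key)
            unique.append({**record, "technology": technology})
    return unique


# Hypothetical scraped records; the first two describe the same item.
scraped = [
    {"title": "City adopts ML for traffic", "technology": "ML",
     "sector": "government"},
    {"title": "city adopts ml for traffic ", "technology": "Machine-Learning",
     "sector": "government"},
    {"title": "LLM tutors in schools", "technology": "LLM",
     "sector": "education"},
]

cleaned = deduplicate(scraped)

# Taxonomy-style filtering on one facet (sector).
education_items = [r for r in cleaned if r["sector"] == "education"]
```

Once every record carries a preferred label, filtering by sector, region, or technology type reduces to simple equality checks on standardized fields.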
  • Trevor Watkins

    George Mason University

    Trevor Watkins is the Teaching and Outreach Librarian at George Mason University. He leads a mini-team of two staff members on the Teaching and Learning Team, which engages in teaching, special projects, outreach, and library programming for George Mason University Libraries. His research interests include Artificial Intelligence (AI), AI literacy, Augmented Reality (AR), digital sustainability, and human-AI interaction. He is a professional member of IASSIST, IEEE, and ACM (SIGAI, SIGCSE). His projects include the Black Squirrel GNU/Linux operating system, Cosmology of Artificial Intelligence, Mason's 3D AR/VR Tour, and MOCA (Mason-Libraries Orientation Conversational Agent).

Using KOS AI to expedite the conference experience

Authors: Marjorie M.K. Hlava

Conferences provide a way to learn about the most recent developments in a wide range of subject areas. In particularly fast-moving areas like cancer research or imaging and photonics, tens of thousands of papers are submitted for consideration each year. The number of attendees can reach tens of thousands as well. This happens every six to twelve months, so there is a strong need for automation. The process involves peer review, categorizing papers into conference tracks, and then matching attendees to the sessions of most interest to them. Using a custom topical taxonomy, we demonstrate how the papers are channeled to appropriate reviewers, then assigned to conference tracks based on top terms, and then suggested to attendees based on their individual semantic profiles. The two case studies show the process as well as the resulting mobile apps for use in the conference experience.
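
Routing by taxonomy terms can be sketched in a few lines. This is an illustration under stated assumptions, not the system described above: the track names, session identifiers, term sets, and the use of Jaccard overlap as the matching score are all hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of terms two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_track(paper_terms: Set[str], tracks: Dict[str, Set[str]]) -> str:
    """Assign a paper to the track whose terms overlap most with its own."""
    return max(tracks, key=lambda name: jaccard(paper_terms, tracks[name]))


def recommend(profile: Set[str], sessions: Dict[str, Set[str]],
              top_n: int = 2) -> List[str]:
    """Rank sessions by overlap with an attendee's term profile."""
    ranked = sorted(sessions, key=lambda s: jaccard(profile, sessions[s]),
                    reverse=True)
    return ranked[:top_n]


# Hypothetical taxonomy terms for tracks, one paper, and sessions.
tracks = {
    "Oncology imaging": {"tumor", "mri", "imaging"},
    "Photonics": {"laser", "waveguide", "optics"},
}
paper = {"mri", "tumor", "segmentation"}
sessions = {
    "S1": {"laser", "optics"},
    "S2": {"mri", "imaging"},
    "S3": {"tumor", "therapy"},
}
attendee_profile = {"mri", "tumor", "imaging"}

assigned = best_track(paper, tracks)
suggested = recommend(attendee_profile, sessions)
```

Because papers, tracks, sessions, and attendee profiles are all described with terms from the same taxonomy, one overlap measure can serve every matching step.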
  • Marjorie M. K. Hlava

    Access Innovations, Inc.

    Marjorie M.K. Hlava is Chief Science Officer, Founder, and Chairman of Access Innovations, Inc. She founded the company in 1978. The company provides information management services such as metatagging, thesaurus and taxonomy creation, and workflow consulting: in short, all services needed to create and maintain a digital information collection. The company owns the Data Harmony software for content creation, taxonomy management, metadata and entity extraction, automatic summarization, and automatic indexing for portals and data collections. By creating well-formed data, the company can significantly enhance search results.