Papers: AI Part 2

Starts at
Thu, Oct 23, 2025, 14:30 GMT+2
Finishes at
Thu, Oct 23, 2025, 16:30 GMT+2
Venue
Aula Rubió (210)
Moderator
Gema Bueno de la Fuente

Moderator

  • Gema Bueno de la Fuente

    University of Zaragoza

    Gema Bueno de la Fuente holds a PhD in Information Science (2010), a bachelor’s degree in Information Science from the Carlos III University of Madrid (2003, with National Award), and a bachelor’s degree in Library and Information Science from the University of Zaragoza (2001). She is currently a Hired Lecturer in the Dept. of Documentation Sciences and History of Science at the University of Zaragoza, and has been teaching and doing research since 2005. Her interests include Open Science, Digital Libraries, Metadata, Knowledge Organization, and Linked Open Data.

Presentations

Assessing the Effectiveness of LLMs (Large Language Models) for Extracting Topics and Themes in Survey Responses

Authors: Ying-Hsang Liu, Xin Yang, Junzhi Jia

Artificial intelligence (AI) has the potential to automate metadata tasks, such as identifying key topics and analyzing recurring themes in text. Topic extraction focuses on recognizing dominant subjects, whereas theme extraction examines patterns of meaning within the text. This study evaluated DeepSeek R1 (8b), DeepSeek R1 (14b), and Gemma3 (12b) on extracting topics and themes from 50 qualitative survey comments. Using standard information retrieval (IR) methods and metrics, we found that Gemma3 (12b) consistently outperformed the DeepSeek models. Topic detection was handled with reasonable effectiveness (DeepSeek R1 (8b) and Gemma3 (12b) both reached a global F1 of 0.31). However, theme detection was significantly more challenging, particularly for the DeepSeek models (global F1s of 0.02 and 0.08), while Gemma3 (12b) achieved an F1 of 0.26. Significant document-level variability was also observed. Standard IR metrics can be applied to assess AI performance in metadata tasks, but achieving accuracy comparable to human experts in abstract thematic analysis remains a significant challenge. Developing AI systems that can better capture the subtleties of abstract meaning still requires human oversight, since these capabilities are critical for supporting complex analytical tasks.
  • Ying-Hsang Liu

    Chemnitz University of Technology

    Ying-Hsang Liu is a researcher in Predictive Analytics at Chemnitz University of Technology (Germany). With a Ph.D. in Information Science (Rutgers University, USA), he has held academic positions across five countries. His research focuses on human-centered data science, information retrieval, and AI-based systems, supported by grants from the ARC, ARDC, and Airbus. Dr. Liu has authored 65 peer-reviewed publications and two books, serves on ASIS&T and ALISE committees, and is a Distinguished Member of ASIS&T (2022).
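
The abstract above reports global and document-level F1 scores for topic and theme extraction. As a rough illustration of how such scores can be computed, here is a minimal Python sketch. It assumes that each comment has a set of human-assigned labels and a set of model-assigned labels, and that "global" means pooling the counts across all comments (micro-averaging). The abstract does not state the authors' exact procedure, and the function names and sample labels below are hypothetical.

```python
from typing import Dict, Set, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts, using 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def evaluate(
    gold: Dict[str, Set[str]], predicted: Dict[str, Set[str]]
) -> Tuple[float, Dict[str, float]]:
    """Compute a pooled (micro-averaged) F1 and one F1 per document.

    Assumption: labels match only when the strings are identical.
    """
    per_doc_f1: Dict[str, float] = {}
    total_tp = total_fp = total_fn = 0
    for doc_id, gold_labels in gold.items():
        pred_labels = predicted.get(doc_id, set())
        tp = len(gold_labels & pred_labels)   # labels the model got right
        fp = len(pred_labels - gold_labels)   # labels the model invented
        fn = len(gold_labels - pred_labels)   # labels the model missed
        per_doc_f1[doc_id] = prf(tp, fp, fn)[2]
        total_tp += tp
        total_fp += fp
        total_fn += fn
    global_f1 = prf(total_tp, total_fp, total_fn)[2]
    return global_f1, per_doc_f1


# Hypothetical data: two survey comments with human and model labels.
gold = {"c1": {"pricing", "support"}, "c2": {"usability"}}
predicted = {"c1": {"pricing"}, "c2": {"usability", "speed"}}
global_f1, per_doc_f1 = evaluate(gold, predicted)
print(round(global_f1, 2), per_doc_f1)
```

Keeping the per-document scores next to the pooled score makes document-level variability visible, which a single global number would hide.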

HerStory-NeSyAI: Designing inclusive metadata architectures with hybrid AI for epistemic justice in silenced narratives

Authors: Núria Ferran-Ferrer, Miquel Centelles

A presentation of the HerStory-NeSyAI project, which designs inclusive metadata architectures using hybrid AI to address epistemic justice in silenced narratives. It combines interdisciplinary approaches from Library and Information Science, Digital Humanities, and Feminist Theory. The project aims to bridge historical and technological silences by focusing on gender-sensitive representation and ethical AI development. A hybrid neuro-symbolic AI architecture is developed to mitigate bias and enhance transparency in knowledge infrastructures. The ultimate goal is to transform AI into a vehicle for historical and social accountability.
  • Núria Ferran-Ferrer

    Faculty of Information and Audiovisual Media, Universitat de Barcelona (UB)

    Núria Ferran-Ferrer, Associate Professor at FIMA (UB), serves as the Rector's Delegate for Equality and Director of the PhD Program in Information and Communication. She leads research projects such as HerStory and Women and Wikipedia, funded by Spain’s Ministry of Science and the Wikimedia Foundation. Her research primarily explores the integration of a gender perspective in the field of LIS, addressing areas such as knowledge organization systems (KOS), innovative teaching methodologies, and broader research within the discipline. She was a visiting scholar at Sheffield University (2009) and Tallinn University (2015).
  • Miquel Centelles

    Universitat de Barcelona, Faculty of Information and Audiovisual Media

    Miquel Centelles is a professor at the University of Barcelona, specializing in Knowledge Organization, Metadata, and Semantic Web Technologies. He holds degrees in Philology and in Library and Information Science, and a PhD on legal thesauri. He coordinates the Master's in Digital Humanities and does research on information retrieval, digital preservation, and accessibility. His current work explores knowledge graphs to enhance generative AI in the project “HerStory: Connecting women’s history to Neuro-Symbolic AI”, funded by the Spanish Ministry of Science.

Assessing Large Language Models: Architectural Archive Metadata and Transcription

Authors: Hannah Moutran, Devon Murphy, Karina Sanchez, Willem Borkgren, Katie Pierce Meyer and Josh Conrad

Our research explores whether Large Language Models (LLMs) can offer a solution for improving the efficiency of developing detailed, rich metadata for large digitized collections. We tested the ability of seven widely available LLMs to complete four metadata generation tasks for a selection of pages from the Southern Architect and Building News (1882-1932): assigning subject headings; creating short content summaries; extracting named entities; and writing transcriptions. Our cross-departmental team evaluated the quality of the outputs, the cost, and the time efficiency of using LLMs for metadata workflows. To do so, we developed a metadata quality rubric and scoring schematic to ground our results. Analysis suggests that models can perform interpretive metadata tasks well, but lack the accuracy needed for assigning terms from controlled vocabularies. With careful implementation, thorough testing, and creative design of workflows, these models can be applied with precision to significantly enhance metadata for digitized collections.
  • Devon Murphy

    University of Texas at Austin Libraries

    Devon Murphy (they/them) currently works as the Metadata Analyst at the University of Texas at Austin. In this role, Murphy oversees standards for the libraries’ archival and bibliographic materials. They received dual master’s degrees in Art History and Information Science from the University of North Carolina at Chapel Hill (2019). Their previous work includes the Best Practices for Queer Metadata and the Metadata Best Practices for Trans and Gender Diverse Resources. Murphy also serves as a member of the Visual Resources Association’s (VRA) Equitable Action Committee.

Creativity and Authorship in the Age of Artificial Intelligence: A Metadata Perspective

Authors: Lala Hajibayova

The DCMI conceptualization of 'creator' faces new complexities in the context of AI-generated content, as AI systems challenge traditional notions of authorship, responsibility, and intellectual agency. Large language models, in particular, can generate content autonomously, yet they lack the agency, intent, and legal or moral responsibility that are traditionally associated with a creator or author. Consequently, assigning a 'creator' under this definition complicates questions of accountability, ownership, and provenance within metadata standards. This paper calls for a critical reexamination of how the concept of 'creator' is defined and applied in metadata schemas in the age of generative AI.
  • Lala Hajibayova

    Kent State University

    Lala Hajibayova is an associate professor in the Kent State University School of Information. She received her Ph.D. in Information Science from Indiana University Bloomington. Hajibayova’s research examines the interplay between individuals’ contextualized experiences, their patterns and behaviors of engaging with systems, and the potential of individuals’ collective actions to enrich systems of representation, organization, and discovery.

Enhancing Discovery with AI: Volume Extraction and Summary Statements for Holdings Metadata

Authors: Myung-Ja K. Han, Owen Monroe

Serials volume information is essential for helping users and collection managers understand what volumes are available and for informing future collection strategies. However, because historical practices of binding and recording summary statements vary by institution, inconsistent holdings metadata poses significant challenges in aggregated discovery environments. This research explores the use of Large Language Models (LLMs) to enhance holdings metadata through two approaches. The first approach employs a Python script that prompts Gemini AI to extract volume (year) information from title pages in digitized serial PDF files submitted by various institutions. The extracted data is used to generate accurate coverage ranges and identify missing volumes for entire digitized serial contents. The second approach trains a BERT model using labeled data from text files to detect title pages of annual reports and identify publication years present in or missing from the digitized serial contents. Both approaches, using Gemini and BERT, have shown measurable success in extracting publication date information and generating summary notes that enhance holdings metadata, supporting improved resource navigation and informing strategic collection decisions for digitized serials.
  • Myung-Ja (MJ) K. Han

    University of Illinois Urbana-Champaign

    Myung-Ja (MJ) K. Han is a Professor and Metadata Librarian at the University of Illinois Urbana-Champaign. Her research interests include metadata interoperability, information management, and the application of information technologies in libraries. She has served as Co-PI on research projects exploring the benefits for users of linked open data for digitized special collections and Emblematica Online. She is also the co-author of two textbooks on XML. MJ currently serves as Chair of the Program for Cooperative Cataloging (PCC).
  • Owen Monroe

    University of Illinois Urbana-Champaign

    Owen Monroe is a doctoral student at the University of Illinois Urbana-Champaign School of Information Sciences. He studies digital humanities and is interested in using digital methods to study historical literary and information systems. He is also interested in how digital libraries, coding techniques like text mining, and Large Language Models can facilitate research and present data-rich archival materials. He aims to research how digital humanities ideas and skills can be taught and how digital projects can be sustainable and accessible.
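
The abstract for this presentation describes turning extracted volume years into coverage ranges and a list of missing volumes. The following Python sketch illustrates only that post-processing step. It does not call Gemini or BERT. The function names, sample years, and summary format are hypothetical illustrations, not the authors' script or a holdings-statement standard.

```python
from typing import Iterable, List, Tuple


def coverage_ranges(years: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse extracted years into inclusive (start, end) runs of consecutive years."""
    ranges: List[Tuple[int, int]] = []
    for year in sorted(set(years)):  # de-duplicate and order the extracted years
        if ranges and year == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], year)  # extend the current run
        else:
            ranges.append((year, year))  # start a new run
    return ranges


def missing_years(years: Iterable[int]) -> List[int]:
    """List the years between the earliest and latest extracted year that are absent."""
    present = set(years)
    if not present:
        return []
    return [y for y in range(min(present), max(present) + 1) if y not in present]


def summary_statement(years: Iterable[int]) -> str:
    """Render the runs as a short human-readable coverage note (format is an assumption)."""
    parts = [str(a) if a == b else f"{a}-{b}" for a, b in coverage_ranges(years)]
    return ", ".join(parts)


# Hypothetical years extracted from the title pages of one digitized serial.
extracted = [1901, 1902, 1903, 1905, 1907, 1908, 1902]
print(summary_statement(extracted))  # prints: 1901-1903, 1905, 1907-1908
print(missing_years(extracted))      # prints: [1904, 1906]
```

De-duplicating before building the runs matters because the same year may be extracted from more than one page or file.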