Panel 3: The Use of Metadata by Artificial Intelligence

Starts at: Tue, Nov 7, 2023, 14:30 South Korea Time; ( 07 Nov 23 05:30 UTC )
Finishes at: Tue, Nov 7, 2023, 16:00 South Korea Time; ( 07 Nov 23 07:00 UTC )
Venue: Room 201
Moderator: Marie-Claude Côté

The recent rise of ChatGPT and the like has renewed public interest in artificial intelligence (AI) applications and machine learning (ML) models. The use of these tools to automate metadata generation is now business as usual for many metadata practitioners and researchers. However, do such tools make use of metadata to be trained and to learn continuously? Do the humans behind them use metadata at all to develop them?

When asked if it uses metadata, ChatGPT recognizes that “the training process for models like [it] often involves the use of metadata to help curate and organize the training data” but “while metadata can be helpful in shaping the training process, it does not directly affect the content of [its] responses during interactions”.

Is ChatGPT right? Do you wish to know more about the matter? Come join our AI and ML experts! They will introduce you to:

AI, ML, and the different usages of metadata;
Metadata stores;
The role of metadata in training ML models for fairness; and
The training of an existing tool.

Moderator

Marie-Claude Côté

Library and Archives Canada
Marie-Claude Côté is the senior project manager on a multi-million, multi-year project aimed at transforming archival information processes and systems at Library and Archives Canada. Her previous functions in the Government of Canada include leading metadata standards and recordkeeping policy development, implementation, and assessment, as well as contributing to the implementation of department- and government-wide electronic documents and records management systems. She previously held management and analyst positions at the Treasury Board of Canada Secretariat, the Department of Canadian Heritage, the Canadian International Development Agency, and Industry Canada. After obtaining her Master’s Degree in Library and Information Science (MLIS), she worked in public- and private-sector libraries before joining the federal public service. Marie-Claude's impact on the advancement of the information management (IM) domain was recognized by the Ted-Ferrier Award. Marie-Claude also teaches the IM Curriculum at the Canada School of Public Service. She is a certified project management professional (PMP).

Presentations

How does metadata contribute to the reinforcement of human stereotypes in AI systems?

"Bias" is a technical concept in machine learning, referring to situations where the training data used is not representative of the real world, leading to systematically skewed patterns or models. In contrast, "fairness" is a social concept that holds significant implications for users. According to Hannes Hapke et al., fairness is defined as the ability to identify when certain groups of people experience problems or differences compared to others.
To illustrate this, consider a scenario where fairness becomes an issue in predicting credit extension for loans. If an AI model aims to determine who should be granted credit, fairness demands that those who don't pay back loans should have a different experience, i.e., their credit should not be extended. However, a problem arises if the AI model incorrectly denies loans only to individuals of a certain race.
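One way to make the loan scenario concrete is to compare, for each group, how often applicants who would have repaid are nevertheless denied. The following is a minimal sketch under stated assumptions, not the presenter's method: the function name `false_denial_rates`, the group labels "A" and "B", and the toy records are all hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def false_denial_rates(records):
    """Per group, the share of applicants who would repay but were denied.

    `records` is an iterable of (group, would_repay, approved) tuples.
    This record layout is an assumption made for illustration.
    """
    repaid = defaultdict(int)          # applicants who would repay, per group
    denied_repaid = defaultdict(int)   # of those, how many were denied
    for group, would_repay, approved in records:
        if would_repay:
            repaid[group] += 1
            if not approved:
                denied_repaid[group] += 1
    return {g: denied_repaid[g] / repaid[g] for g in repaid}


# Hypothetical toy data: (group, would_repay, approved)
records = [
    ("A", True, True), ("A", True, True), ("A", True, False), ("A", False, False),
    ("B", True, False), ("B", True, False), ("B", True, True), ("B", False, False),
]

rates = false_denial_rates(records)
# Difference between the worst-treated and best-treated group
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

Denying credit to applicants who would not repay is the intended behaviour in both groups. A large gap between groups in the rate of wrongly denied applicants is the kind of difference that the fairness definition above asks practitioners to identify.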
The digitization of cultural heritage (CH) objects initially began for preservation purposes but has since enabled the application of AI technology to extract knowledge, enhancing user experiences and becoming a valuable resource for GLAM (Galleries, Libraries, Archives, and Museums) institutions. When digitized data is generated for AI use, it needs to be annotated meaningfully and relevantly for future ML tasks. Iconclass serves as a great example of such annotations. These annotations form the foundation on which AI models are built, so any ideas and concepts embedded in these structures carry over into the final AI model, which can thereby reflect the biases originally present.
The presentation will shed light on how metadata can unintentionally contain prejudices against the LGBT community, perpetuate gender inequality, or reinforce colonial stereotypes. We will demonstrate how these biases can permeate AI systems and underscore the importance of fairness in AI in general and for GLAM in particular.

Artem Reshetnikov

Barcelona Supercomputing Center
Artem is an accomplished deep learning researcher at the Barcelona Supercomputing Center. With extensive experience in Computer Vision and Natural Language Processing, he skillfully applies these areas of expertise to his work. Throughout his life, Artem has nurtured a profound curiosity for history and art, and he has even completed various online courses in these subjects. For quite some time, he pondered how to unite his two primary passions: machine learning and art. Eventually, he found the perfect solution through his current project, Saint George on a Bike. This innovative endeavor aims to enrich the metadata of paintings using Deep Learning and NLP approaches, effectively bridging the gap between his interests.

Artem's academic journey culminated in a Master's Degree in Engineering from the Autonomous University of Barcelona in 2019. Prior to this, he contributed to several commercial projects that focused on Data Analysis, Computer Vision, and Anomaly Detection in the marketing and retail sectors, notably at companies such as Indra and Tecnocom in Spain. These projects applied deep learning to tasks such as traffic counting through Computer Vision, analyzing time series data to detect anomalies in client behavior, and planning marketing efforts based on the resulting insights.

Harnessing Metadata for AI Systems

Metadata is a foundational element shaping the trajectory of AI systems. This presentation focuses on the pivotal role of metadata in the training phase of AI, explaining why metadata is essential for optimal model development and performance. It also addresses how metadata influences data preparation and processing to aid in the development of accurate, robust, and transparent AI systems.

Craig Eby

Cogniva Information Solutions
My background is in Cognitive Science with a key interest in Cognitive Architectures. My current focus is on the use of AI to enable the governance of information within large organizations with a specific interest in how to combine business context with the use of information and data. In this pursuit, I helped build the company Cogniva Information Solutions to provide both AI products and consulting services to help automate Information Governance policies and practices.

Introduction to Artificial Intelligence and Machine Learning

This presentation will provide an overview of Artificial Intelligence and Machine Learning, as applied to problems related to metadata- and information-centric systems. This will span fundamental philosophical underpinnings and the most popular practical machine learning algorithms. Modern neural approaches to natural language processing will be emphasized, given the recent popularity of such techniques. Finally, the presentation will conclude with critical reflection on the state of the art, given concerns about algorithmic bias and the domain-specific suitability and applicability of language models.

Hussein Suleman

University of Cape Town
Hussein Suleman is Professor and Head of the Department of Computer Science at the University of Cape Town. His main research interests are in digital libraries, ICT4D, African language Information Retrieval, cultural heritage preservation, Internet technology and educational technology. He has in the past worked extensively on architecture, scalability and interoperability issues related to digital library systems. He has worked closely with international and national partnerships for metadata archiving, including: the Open Archives Initiative; Networked Digital Library of Theses and Dissertations; and the NRF-CHELSA South African National ETD Project. His recent research has a growing emphasis on the relationship between low-resource environments and digital library architectures. This has evolved into a focus on societal development and its alignment with digital libraries and information retrieval. He is currently collaborating with various colleagues in digital humanities groups to develop a proof-of-concept and experimental low-resource software toolkit for digital repositories; this reconceptualisation of the architecture of digital repositories will arguably lower the bar for adoption and reduce the risk of data loss for archivists in low-resource environments.