Panel: Why AI ≠ Automated Indexing: What Is and Is Not Possible
Hans Brandhorst (Leiden, 14-07-1956) is an independent art historian and editor of the Iconclass system and Arkyves. Together with Etienne Posthumus, he has created the online Iconclass browser and the Arkyves website. He has published on illuminated manuscripts, emblems and devices, iconography and classification, and digital humanities. He was trained in art history at Leiden University and has been using Iconclass as an iconographer since the 1980s. He was part of the team at Utrecht University that created the computer version of the system in the 1990s, and he has been acting as editor of the online Iconclass system since 2000. His primary research focus is the simple question “What am I looking at?” in an iconographical sense. His theoretical work deals with how humanities scholars, in particular iconographers, can collaborate and enrich each other's research results rather than repeat and duplicate efforts. He believes that to accomplish this, the use of a shared vocabulary for the description of the content of cultural artefacts (Iconclass) is an important condition. Besides the editorship of Iconclass and Arkyves, Hans Brandhorst is also involved in the digitization of Kirschbaum’s Lexikon der Christlichen Ikonographie for Brill Publishers, and he is on the Editorial Advisory Board of the journal Visual Resources. Recently he founded, together with André van de Waal, the “Henri van de Waal Foundation”, dedicated to iconographic research with the help of modern technology such as Artificial Intelligence and Machine Learning.
Barcelona Supercomputing Center
Dr. Joaquim Moré López is a senior researcher and expert in Computational Linguistics. He has a Ph.D. in Knowledge and Information Society from the Open University of Catalonia. His major areas of expertise are Machine Translation, Information Extraction, Text Mining, Natural Language Processing, Knowledge Engineering and Opinion Mining. He works at the BSC for the Oficina Técnica de Gestión of the Plan Nacional de Impulso a las Tecnologías del Lenguaje, sponsored by the Spanish Ministerio de Asuntos Económicos y Transformación Digital, using HPC to exploit the possibilities of Natural Language Processing for public and private institutions. He provides solutions to issues related to natural language processing in the Saint George on a Bike project.
Margie worked for five years at NASA, logging up to 20 hours per week as an online searcher, using systems and giving feedback. She rose to the position of Information Director for the DOE National Energy Information Center and its affiliate NEICA before taking her team private as Access Innovations. Margie developed the Data Harmony software suite to increase search accuracy and consistency while streamlining the clerical aspects of editorial and indexing tasks. Her most recent innovation is applying those systems to medical records for medical claims compliance in a new application called Access Integrity. Margie served for seven years on the NISO board, chaired the SLA Standards committee for nine years, and chaired the NFAIS Standards committee from 2001 to 2016. She was instrumental in the NISO standards for thesauri and controlled vocabularies (Z39.19), Dublin Core (Z39.85), and DOI (Z39.84), and contributed to the metadata work that formed the basis of CrossRef, the CRediT taxonomy for author contributions, and others. She was NFAIS president and has served on that board twice, was president of the American Society for Information Science and Technology (ASIS&T), president of Documentation Abstracts, president of ASIDIC, and Treasurer of IIA at the time of its merger with SPA to become SIIA. She also gives back to her local community, serving on the boards of the New Mexico Information Commons, the Hubbell House Alliance, New Mexico Data Stream, and the Hubbell Society Museum and Library. Margie’s work has been acknowledged through numerous awards, including ASIS&T’s Watson Davis award, the SLA John Cotton Dana and SLA President’s Award, recognition as an SLA Fellow, and as an Albuquerque Business First Woman of Influence for Technology. In February 2014 she received the Miles Conrad lectureship for NFAIS. In November 2014, she received the ASIS&T Award of Merit. She was elected to the Hubbell Hall of Fame in June 2019.
She is the author of multiple books and more than 200 articles, including The Taxobook, a three-volume collection on the history and implementation of taxonomies. She holds two U.S. patents encompassing 21 patent claims.
Barcelona Supercomputing Center
Australian Research Data Commons (ARDC)
Dr. Mingfang Wu is a senior research data specialist at the Australian Research Data Commons (ARDC). She has conducted research in the areas of interactive information retrieval, search log analysis, interfaces supporting exploratory search, and enterprise search. Her recent research focuses on data discovery paradigms as part of the Research Data Alliance (RDA) initiative and on improving the data discovery service of the Australian national research data catalogue, as well as data management topics such as data provenance, data versioning, and data quality.
National Library of Finland
Osma Suominen is an information systems specialist at the National Library of Finland. He is currently working on automated subject indexing, in particular the Annif tool and the Finto AI service, as well as the publishing of bibliographic data as Linked Data. He is also one of the creators of the Finto.fi thesaurus and ontology service and leads development of the Skosmos vocabulary browser used in Finto. Osma Suominen earned his doctoral degree at Aalto University while doing research on semantic portals and the quality of controlled vocabularies within the FinnONTO series of projects.
Automated indexing is only as good as the training set or rules available for the domain. It’s important to learn what type of content a pre-trained algorithm has been trained on. Consider what type of content is readily available to train an algorithm: what’s popular and what’s available. Scholarly and historical content is not available in consumable formats at the large volume that machine learning requires. There are exceptions, such as science and medicine, where large, well-documented collections are available. This panel will discuss the current state of automated categorization, covering domains including research data, art history, and scientific publishing. The goal is to provide practical advice on how to take meaningful steps towards building the infrastructure needed for sustainable automated indexing.