Metadata is an abstraction, a language with a grammar and vocabularies that necessarily emerges in many varied forms across natural languages, cultures, and intellectual domains. This address will recapitulate some of the metaphors that emerged in the community to bridge these abstractions to the problems of information management in the digital world.
Weibel will also explore some of the social engineering challenges of how a growing global community self-organized and, in the current vernacular, “crowd sourced” what grew into a global standards activity, a research community, and many spin-off activities that underlie much of the organization of digital information on the Internet.
There’s a pretty good chance he’ll tell some stories along the way.
In a massively distributed environment like the Internet, service providers play a critical role in making information findable. While data providers publish excellent information, hubs collect their metadata and give it worldwide visibility. However, the metadata being produced and exposed is not always uniformly rich and needs to be optimized. In the case of scientific literature in food and agriculture, certain singularities make this even more complex. On the one hand, grey literature is critical, and journal articles are not the only scholarly communication channel that counts. On the other, while in other sciences English is the pivotal language, the diversity of languages used in food and agriculture makes it necessary to consider multilingualism and semantic strategies as ways to increase the accessibility of scientific literature. Service providers have taken different approaches to address these issues, expanding the coverage of document types and adopting semantic technologies as a key instrument for enriching metadata. This panel session aims to discuss the challenges that service providers face in aggregating content from data providers in the food and agricultural sciences. The five panelists will share their experiences from different perspectives:
In the first part of this presentation, Osma Suominen will introduce the general idea of automated subject indexing using a controlled vocabulary such as a thesaurus or a classification system; and the open source automated subject indexing tool Annif, which integrates several different machine learning algorithms for text classification. By combining multiple approaches, Annif can be adapted to different settings. The tool can be used with any vocabulary; and, with suitable training data, documents in many different languages may be analysed. Annif is both a command line tool and a microservice-style API service which can be integrated with other systems. We will demonstrate how to use Annif to train a model using metadata from an existing bibliographic database and how it can then provide subject suggestions for new, unseen documents.
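As a complement to the description above, the sketch below shows how a client might query an Annif microservice and filter its subject suggestions. The endpoint path follows Annif's documented REST API (`POST /v1/projects/<project_id>/suggest`, returning JSON results with `uri`, `label`, and `score` fields); the base URL, project id, and all example URIs and scores are illustrative assumptions, not part of this abstract.

```python
import json
import urllib.parse
import urllib.request

# Assumed local Annif deployment; adjust for your own installation.
ANNIF_URL = "http://localhost:5000/v1"

def suggest(project_id: str, text: str, limit: int = 10):
    """POST a document's text to Annif's suggest endpoint and return
    the parsed list of suggestions (shape per Annif's REST API docs)."""
    data = urllib.parse.urlencode({"text": text, "limit": limit}).encode()
    req = urllib.request.Request(
        f"{ANNIF_URL}/projects/{project_id}/suggest", data=data)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]

def filter_suggestions(results, threshold=0.2):
    """Keep suggestions at or above a score threshold, best first."""
    kept = [r for r in results if r["score"] >= threshold]
    return sorted(kept, key=lambda r: r["score"], reverse=True)

# Illustrative response shape only; real scores come from a trained model.
sample = [
    {"uri": "http://www.yso.fi/onto/yso/p4354", "label": "libraries", "score": 0.61},
    {"uri": "http://www.yso.fi/onto/yso/p2346", "label": "metadata", "score": 0.18},
]
print(filter_suggestions(sample))
```

A post-processing step like `filter_suggestions` is typical when integrating Annif with a cataloguing workflow, since raw suggestions below some confidence level are usually discarded or routed to a human indexer.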
In the second part of the presentation, Koraljka Golub will discuss the topic of evaluating automated subject indexing systems. There are many challenges in evaluation, for example the lack of gold standards to compare against, the inherently subjective nature of subject indexing, relatively low inter-indexer consistency in typical settings, and the dominance of out-of-context, laboratory-like evaluation approaches.
In the third part of the presentation, Annemieke Romein and Sara Veldhoen will present a case study of how they have applied Annif in a Digital Humanities research project to categorize early modern legislative texts using a hierarchical subject vocabulary and a pre-trained set.
For practitioners who would like to learn how to use Annif on their own, there is also a follow-up hands-on tutorial, consisting of short prerecorded video presentations, written instructions, and practical exercises that introduce various aspects of Annif and its use.
The study is based on the results of the “Wooden Slips Character Dictionary” (簡牘字典系統 / WCD), launched by the Academia Sinica Center for Digital Cultures (ASCDC) as an online system demonstrating the integrative application of different ontologies and vocabularies to linked data for DH research. To this end, the study has developed an integrative Chinese Wooden Slips Ontology. The main purpose of the ontological design is to support DH scholarship on ancient Chinese characters and their interpretation, and to serve as a basic data model for structuring an online retrieval system of Chinese characters across different institutes. The integrative Chinese Wooden Slips Ontology is based on the CIDOC-CRM model and contains four field-specific data models that enhance the detailed and accurate description of single wooden slips and of each written character. The CRM-based data model is extended to enrich the detailed data on each written Chinese character, including temporal information about the production of the work and annotations for a whole wooden slip or a single character. As a result, CRM classes are extended as nodes linking the different parts of this integrative ontology. Since the ancient Chinese characters are written on fragile materials and easily become damaged or unrecognizable over time, their interpretation has to rely on the retrieval of both images and metadata through semantic methods such as IIIF and Linked Data. Reading, recognizing, and comparing the writing manner of the same or similar written characters is one of the important methods for interpreting characters accurately, and IIIF-based retrieval systems can help scholars conduct such research in a visually comfortable way.
While interpreting the precise meaning of a written character within a whole text, obtaining information about the composition or annotation of an ancient Chinese glyph depends on a LOD-based retrieval approach. ASCDC’s “Chinese characters and character realization ontology” and “Web Annotation on Cultural heritage ontology” may offer a new approach to analyzing this ancient Chinese cultural heritage via semantic methods. To extend and enhance the preliminary research results, images of single characters in the WCD system are made interoperable and retrievable in the union catalog of the “Multi-database Search System for Historical Chinese Characters” via the IIIF API, established in cooperation with other international research communities, including the Nara National Research Institute for Cultural Properties, the Historiographical Institute of the University of Tokyo, the National Institute of Japanese Language, the National Institute for Japanese Language and Linguistics, and the Institute for Research in Humanities at Kyoto University in Japan. The same Chinese characters from the datasets of different institutes can be displayed in this collective interface, which supports the study of ancient Chinese characters. Links:
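The IIIF-based retrieval described above rests on the IIIF Image API's fixed URL pattern, `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, which lets a client request just the rectangle of a slip image containing a single character. The sketch below builds such a URL; the server base URL and the slip identifier are hypothetical examples, not WCD's actual endpoints, and `size=max` follows Image API 3.0 (2.x uses `full`).

```python
def iiif_image_url(base, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}"""
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

def character_region(x, y, w, h):
    """Region parameter ("x,y,w,h" in pixels) cropping one character
    out of a full wooden-slip image."""
    return f"{x},{y},{w},{h}"

# Hypothetical server and identifier, for illustration only.
url = iiif_image_url("https://iiif.example.org/iiif", "slip-0042",
                     region=character_region(120, 560, 80, 80))
print(url)
```

Because every IIIF-compliant image server understands this same URL grammar, the union catalog can fetch comparable character crops from each partner institute's server without institute-specific code.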
He focuses on the research and practice of digital processing of agricultural information resources; multi-source heterogeneous big data fusion; data opening and sharing; and thesauri, ontologies, authority files, linked data, and knowledge graphs. He has led or participated in the National Key Technology Support Program project “Construction and Demonstration Application of a Knowledge Organization System for Foreign-Language Science and Technology Literature Information”, the “Agricultural Scientific Data Sharing Center” project of the Ministry of Science and Technology, the National Natural Science Foundation of China project “The Construction and Translation Research of Agricultural Ontology”, the Chinese Academy of Engineering Knowledge Center construction project, an EU Seventh Framework Programme project, and an FAO international cooperation project, among others. He has won 4 awards for scientific and technological achievements, obtained more than 10 computer software copyright registrations, published more than 50 papers, and authored 4 books.
railML works on the principle of referencing existing usable standards instead of developing every aspect from scratch, and it can therefore be seen as an application of Dublin Core. Since Dublin Core has its background in library science, this makes railML a great example of collaborative open-source work and cooperation across sectors that might otherwise not have much in common.