innovation in metadata design, implementation & best practices

Keynotes, Papers, Presentations, Sessions, Posters, Workshops, Tutorials and Hands-on sessions

Keynotes
Leveraging Standards to Turn Data to Capabilities in Agriculture [Keynote]
Medha Devare

CGIAR is a global research partnership of 15 Centers primarily located in developing countries, working in the agricultural research for development sector. Research at these Centers is focused on poverty reduction, enhancing food and nutrition security, and improving natural resource management to address key development challenges. It is conducted in close collaboration with local partner entities, including national and regional research institutes, civil society organizations, academia, development organizations, and the private sector. Thus, the CGIAR system is charged with tackling challenges at a variety of scales from the local to the global; however, research outputs are often not easily discoverable, and research data often resides on individual laptops, neither well annotated nor stored so as to be accessible and usable by the wider scientific community.

Innovating in this space and enhancing research impact increasingly depends upon enabling the discovery of, unrestricted access to, and effective reuse of the publications and data generated as primary research outputs by Center scientists. Accelerating innovation and impact to effectively address global agricultural challenges also requires that data be easily aggregated and integrated, which in turn necessitates interoperability. In this context, “open” is inadequate, and the concept of FAIR (Findable, Accessible, Interoperable, Reusable) has proven more useful. CGIAR Centers have made strong progress implementing publication and data repositories that meet minimum interoperability standards; however, work is still needed to enable consistent and seamless information discovery, integration, and interoperability across outputs. For datasets, this generally means annotation using standards such as controlled vocabularies and ontologies.

The Centers are therefore working to create an enabling environment to enhance access to research outputs, propelled by funder requirements and a system-wide Open Access and Data Management Policy implemented in 2013 (CGIAR, 2013). Guidance and the impetus for operationalization are being provided via the CGIAR Big Data Platform for Agriculture and its Global Agricultural Research Data Innovation and Acceleration Network (GARDIAN). GARDIAN is intended to provide seamless, semantically-linked access to CGIAR publications and data, to demonstrate the full value of CGIAR research, enable new analyses and discovery, and enhance impact.

There are several areas in which standards and harmonized approaches are being leveraged to achieve FAIRness at CGIAR, some of which are outlined below:

Data sourcing and handling. Research at CGIAR Centers focuses on different commodities, agro-ecologies, disciplinary domains, geographies and scales, resulting in varied data streams—some born digital, often characterized by large size and speed of generation, and frequent updates. Data ranges from agronomic trial data collected by field technicians in a variety of ways and formats, through input and output market information and socioeconomic data on technology adoption and enabling drivers, to weather data, high-throughput sequencing and phenotypic information, and satellite images. These datasets cannot all be treated in the same manner; their curation and quality control needs differ significantly, for instance—necessitating somewhat customized approaches depending on the data type. Yet, to address key challenges, they must be discoverable, downloadable, reusable, and able to be aggregated where relevant. As a first step towards these goals, Centers have agreed on and mapped repository schemas to a common Dublin Core based set of required metadata elements (the CG Core Metadata Schema v.1.0).
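
The schema mapping described above can be sketched as a simple crosswalk. The local field names and the choice of Dublin Core terms below are illustrative assumptions, not the actual CG Core element set:

```python
# Sketch of a repository-to-common-schema crosswalk; field names on both
# sides are invented examples, not the real CG Core Metadata Schema v.1.0.
CROSSWALK = {
    "title": "dcterms:title",
    "author": "dcterms:creator",
    "date_issued": "dcterms:issued",
    "keywords": "dcterms:subject",
}

def to_cg_core(record: dict) -> dict:
    """Map a local repository record onto Dublin Core based elements,
    dropping fields with no agreed mapping."""
    return {CROSSWALK[k]: v for k, v in record.items() if k in CROSSWALK}

record = {"title": "Maize trial 2017", "author": "Devare, M.", "reel": "x"}
print(to_cg_core(record))
# {'dcterms:title': 'Maize trial 2017', 'dcterms:creator': 'Devare, M.'}
```

Unmapped local fields (here, "reel") are simply omitted, which is one reason a shared set of *required* elements matters for cross-Center harvesting.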

Enhancing interoperability. Interoperability is critical to providing meaning and context to CGIAR’s varied data streams and enabling integration between linked content types (e.g., related data and publications) and across related data types (e.g. an agronomic data set and related socioeconomic data). CGIAR’s approach to interoperability and data harmonization focuses on the use of standard vocabularies (AGROVOC/GACS), and strong reliance on ontologies developed across CGIAR (efforts such as the Crop Ontology, the Agronomy Ontology – AgrO, the in-development socioeconomic ontology – SociO), and other entities (ENVO, UO, PO etc.)

Discovery framework. Recognizing the need to democratize agricultural research information and make it accessible to partners – particularly those in developing countries – CGIAR’s aspirations focus on enabling data discovery, integration, and analysis via an online, semantically-enabled infrastructure. This tool, built under the auspices of the Big Data platform, harvests metadata from CGIAR Center repositories, and includes the ability to relatively seamlessly leverage it with existing and new analytical and mapping tools. While there is no blueprint for building such an ecosystem in the agriculture domain, there are successful models to learn and draw from. Of particular interest are the functionalities demonstrated by the biomedical community via the National Center for Biotechnology Information (NCBI) suite of databases and tools, with attendant innovations for translational medicine and human health. CGIAR efforts to enable functionalities similar to NCBI’s are underpinned by strong and enduring stakeholder engagement and capacity building.

Harmonizing data privacy and security approaches as appropriate. Concern regarding data privacy and security is becoming increasingly significant with recent breaches of individual privacy and the GDPR. Any CGIAR repositories and harvesters of data need to provide assurance of data anonymity with respect to personally identifiable information, yet this presents a conundrum when spatial information is so integral to the ability to provide locally actionable options to farming communities. Related to these issues is the concern around ethics, particularly with respect to surveys. The Big Data Platform is therefore focusing on facilitating the creation of and continued support for Institutional Review Boards (IRBs) or their equivalent at Centers, including via guidelines on ethical data collection and handling. Lastly, whether agricultural data is closed or open, it needs to be securely held in the face of such threats as hacking and unanticipated loss.

It is important to recognize that without incentives and a culture that encourages and rewards best practices in managing research outputs, technical attempts to promote the use of standards and enable FAIR resources will meet with limited success, at best. Among the factors influencing these goals are clarity on incentives (e.g., from funding-agency incentives to data contributors understanding the benefits of sharing data) and easy processes, workflows, and tools to make data FAIR, with continued support for stakeholders. Researchers also need to be held accountable for making their outputs FAIR (e.g., through contractual obligations, annual performance evaluation and recognition, funder policies, etc.). Only through a multi-faceted approach that recognizes and addresses systemic and individual constraints in both the cultural and technical domains will CGIAR succeed in leveraging its research outputs to fuel innovation and impact, and transform agricultural research for development.

Open Science in a Connected Society [Keynote]
Natalia Manola

Open science comes on the heels of the fourth paradigm of science, which is based on data-intensive scientific discovery, and represents a new paradigm shift, affecting the entire research lifecycle and all aspects of science execution, collaboration, communication, innovation. From supporting and using (big) data infrastructures for data archiving and analysis, to continuously sharing with peers all types of research results at any stage of the research endeavor and to communicating them to the broad public or commercial audiences, openness moves science away from being a concern exclusively of researchers and research performing organisations and brings it to center stage of our connected society, requiring the engagement of a much wider range of stakeholders: digital and research infrastructures, policy decision makers, funders, industry, and the public itself.

Although the new paradigm of science is shifting towards openness, participation, transparency, and social impact, it is still unclear how to measure and assess these qualities. This presentation focuses on the way the scientific endeavor is assessed and how one may shape science policies to address societal challenges, as science is becoming an integral part of the wider socio-economic environment. It discusses how one may measure the impact science has on innovation, the economy, and society in general, and how the need for such measurement influences the collection, stewardship, preservation, access, and analysis of digital assets. It argues that an open transfer of both codified and tacit knowledge lies at the core of impact creation and calls for a consistently holistic, systematic approach to research. In particular, it includes codified knowledge in the form of traditional publications and datasets, but also formal intellectual property (patents, copyright, etc.) and ‘soft’ intellectual property (e.g., open software, databases or research methodologies), as well as tacit knowledge in the form of skills, expertise, techniques, and complex cumulative knowledge, conceptual models, and terminology.

Putting the spotlight on (open) data collection and analysis, this presentation further illustrates a use case based on the collaboration between OpenAIRE (www.openaire.eu) and the Data4Impact project (www.data4impact.eu) on the use of an open scholarly communication graph, combined with text mining, topic modeling, machine learning, and citation based approaches to trace and classify the societal impact of research funded by the European Commission.

A Web-Centric Pipeline for Archiving Scholarly Artifacts [Keynote]
Martin Klein / Herbert Van de Sompel

Herbert Van de Sompel, announced as the keynote speaker for Wednesday, had a health issue late in August and will not be able to travel to TPDL. His keynote concerns work done by his team, and Martin Klein, working with Herbert, will be delivering the talk.

Scholars are increasingly using a wide variety of online portals to conduct aspects of their research and to convey research results. These portals exist outside of the established scholarly publishing system and can be dedicated to scholarly use, such as myexperiment.org, or general purpose, such as GitHub and SlideShare. The combination of productivity features and global exposure offered by these portals is attractive to researchers and they happily deposit scholarly artifacts there. Most often, institutions are not even aware of the existence of these artifacts created by their researchers. More importantly, no infrastructure exists to systematically and comprehensively archive them, and the platforms that host them rarely provide archival guarantees; many times quite the opposite.

Initiatives such as LOCKSS and Portico offer approaches to automatically archive the output of the established scholarly publishing system. Platforms like Figshare and Zenodo allow scholars to upload scholarly artifacts created elsewhere. They are appealing from an open science perspective and researchers like the citable DOIs that are provided for contributions. But these platforms don’t offer a comprehensive archive for scholarly artifacts since not all scholars use them, and the ones that do are selective regarding their contributions.

The Scholarly Orphans project, funded by the Andrew W. Mellon Foundation, explores how these scholarly artifacts could automatically be archived. Because of the scale of the problem – the number of platforms and artifacts involved – the project starts from a web-centric resource capture paradigm inspired by current web archiving practice. Because the artifacts are often created by researchers affiliated with an institution, the project focuses on tools for institutions to discover, capture, and archive these artifacts. The Scholarly Orphans team has started devising a prototype of an automatic pipeline that covers all three functions. Trackers monitor the APIs of productivity portals for new contributions by an institution’s researchers. The Memento Tracer framework generates web captures of these contributions. Its novel capturing approach allows generating high-quality captures at scale. The captures are subsequently submitted to a – potentially cross-institutional – web archive that leverages IPFS technology and supports the Memento “Time Travel for the Web” protocol. All components communicate using Linked Data Notifications carrying ActivityStreams2 payloads.
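
As an illustration of that messaging layer, the sketch below builds a minimal ActivityStreams2 payload of the kind a Linked Data Notification might carry between pipeline components. The URLs and the choice of the Announce activity type are invented for the example, not taken from the project:

```python
import json

def make_as2_notification(actor: str, object_url: str, target_inbox: str) -> dict:
    # Minimal ActivityStreams2 'Announce' activity; the field choices are a
    # plausible sketch, not the Scholarly Orphans project's actual messages.
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Announce",
        "actor": actor,
        "object": object_url,
        "target": target_inbox,
    }

# Hypothetical URLs: a tracker announces a captured artifact to an archive inbox.
note = make_as2_notification(
    "https://institution.example/tracker",
    "https://portal.example/user/artifact/1",
    "https://archive.example/inbox",
)
body = json.dumps(note)
# An LDN receiver would accept this via HTTP POST to its inbox URL with
# Content-Type: application/ld+json.
print(body)
```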

Without adequate infrastructure, scholarly artifacts will vanish from the web in much the same way regular web resources do. The Scholarly Orphans project team hopes that its work will help raise awareness regarding the problem and contribute to finding a sustainable and scalable solution for systematically archiving web-based scholarly artifacts. This talk will be the first public communication about the team’s experimental pipeline for archiving scholarly artifacts.

Session: RDF
Linking knowledge organization systems via Wikidata [Presentation]
Joachim Neubert
Wikidata is a large collaboratively curated knowledge base, which connects all of the roughly 300 Wikipedia projects in different languages and provides common data for them. Its items also link to more than 1500 different sources of authority information. Wikidata can therefore serve as a linking hub for the authorities and knowledge organization systems represented by these “external identifiers”. In the past, this approach has been applied successfully to rather straightforward cases such as personal name authorities. Knowledge organization systems with more abstract concepts are more challenging due to, e.g., partial overlaps in meaning and different granularities of concepts.
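
As a sketch of the linking-hub idea, the query built below asks Wikidata (public endpoint: https://query.wikidata.org/sparql) for items carrying two external identifiers at once, effectively translating between authority files. P227 (GND ID) and P214 (VIAF ID) are real Wikidata properties; the helper function itself is illustrative:

```python
def linking_hub_query(from_prop: str, to_prop: str, limit: int = 10) -> str:
    # Build a SPARQL query translating identifiers of one knowledge
    # organization system into another, mediated by Wikidata items.
    return f"""
SELECT ?item ?fromId ?toId WHERE {{
  ?item wdt:{from_prop} ?fromId ;
        wdt:{to_prop} ?toId .
}} LIMIT {limit}
""".strip()

# GND ID (P227) -> VIAF ID (P214), with the Wikidata item as the hub.
query = linking_hub_query("P227", "P214")
print(query)
```

For concepts rather than persons, the mapping is rarely this clean; the abstract's point is precisely that such one-to-one identifier joins break down when meanings only partially overlap.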
An Approach to Enabling RDF Data in Querying to Invoke REST API for Complex Calculating [Paper]
Xianming Zhang
RDF is weak at calculation, especially complex calculation. SPARQL Inferencing Notation (SPIN) offers the capability of returning a value by executing an external JavaScript file, which can perform some complex calculations, but this remains far from sufficient for many practical needs. This paper investigates SPIN's capability of executing JavaScript, namely the SPINx framework, and presents a method of equipping RDF data with a new capability: invoking a REST API, so that a user can, while querying, obtain a value returned by a REST API that performs the complex calculation; the value is then semantically annotated for further use. Calculation of the lift coefficient of an airfoil is taken as a use case, in which, with a given angle of attack as input, the desired value is obtained by invoking a particular REST API while querying the RDF data. This use case shows that having RDF data invoke REST APIs for complex calculation is feasible and valuable in both practice and the Semantic Web.
Experiments in Operationalizing Metadata Quality Interfaces: A Case Study at the University of North Texas Libraries [Paper]
Mark Edward Phillips, Hannah Tarver
This case study presents work underway at the University of North Texas (UNT) Libraries to design and implement interfaces and tools for analyzing metadata quality in their local metadata editing environment. It discusses the rationale for including these kinds of tools in locally developed systems and presents several interfaces currently being used at UNT to improve the quality of metadata managed within the Digital Collections.
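
One building block of such quality interfaces is a completeness measure over required fields. The sketch below is a generic illustration with invented field names, not UNT's actual implementation:

```python
def completeness(records, required_fields):
    # Fraction of records in which every required field is non-empty.
    if not records:
        return 0.0
    filled = sum(
        all(r.get(f) not in (None, "", []) for f in required_fields)
        for r in records
    )
    return filled / len(records)

records = [
    {"title": "Campus photograph", "date": "1950", "subject": "buildings"},
    {"title": "Letter", "date": ""},  # empty date, missing subject
]
print(completeness(records, ["title", "date", "subject"]))  # 0.5
```

A dashboard built on measures like this can rank collections by score and point editors directly at the records dragging a collection down.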
Session: Multilingual
A study of multilingual semantic data integration [Presentation]
Douglas Tudhope, Ceri Binding
The availability of the various forms of open data today offers great opportunity for meta level research that draws on combinations of data previously considered only in isolation. There are also great challenges to be overcome; datasets may have different schemas, may employ different terminology or languages, data may only be represented by textual reports. Metadata and vocabularies of different kinds have the potential to help address many of these issues. Previous work explored semantic integration of English language archaeological datasets and reports (Binding et al., 2015; Tudhope et al., 2011). This presentation reflects on initial experience from a semantic integration exercise involving archaeological datasets and reports in different languages. Different forms of Knowledge Organization Systems (KOS) were key to the exercise. The Getty Art and Architecture Thesaurus (AAT) was used as the underlying value vocabulary and the CIDOC CRM ontology as the metadata element set (Isaac et al. 2011) for the semantic integration. Linked data expressions of the vocabularies formed part of an integration dataset (RDF) extracted from the source data, together with subject metadata automatically generated from the reports via Natural Language Processing (NLP) techniques. The data was selected following a broad theme of wooden material, objects and samples dated via dendrochronological analysis. The investigation was conducted as an advanced data integration case study for the ARIADNE FP7 archaeological infrastructure project (ARIADNE 2017), with the datasets and reports provided by Dutch, English and Swedish ARIADNE project partners.
Designing a Multilingual Knowledge Graph as a Service for Cultural Heritage – Some Challenges and Solutions [Paper]
Valentine Charles, Hugo Manguinhas, Antoine Isaac, Nuno Freire and Sergiu Gordea
Europeana gives access to data from Galleries, Libraries, Archives & Museums across Europe. Semantic and multilingual diversity as well as the variable quality of our metadata makes it difficult to create a digital library offering end-user services such as multilingual search. To palliate this, we build an “Entity Collection”, a knowledge graph that holds data about entities (places, people, concepts and organizations), bringing context to the cultural heritage objects. The diversity and heterogeneity of our metadata has encouraged us to re-use and combine third-party data instead of relying only on those contributed by our own providers. This, however, raises a number of design issues. This paper lists the most important of these and describes our choices for tackling them using Linked Data and Semantic Web approaches.
Session: Metadata
National Diet Library Data for Open Knowledge and Community Empowerment [Presentation]
Saho Yasumatsu, Tomoko Okuda
The National Diet Library (NDL) has been promoting use of the data it creates and provides on the Internet since it established its "Policy of providing databases created by the National Diet Library." The NDL provides bulk download of open datasets and takes part in public events related to open data and civic technology, which has increased the visibility of NDL data in communities throughout Japan. The NDL also organizes ideathons and hackathons to promote its data and services. These outreach activities have resulted in any number of interesting and potentially useful initiatives. This presentation will demonstrate the NDL's efforts and achievements in promoting the use of its data, while showcasing some of the best civic-driven applications and visualizations of library data.
Metadata as Content: Navigating the Intersection of Repositories, Documentation, and Legacy Futures [Presentation]
Erik Radio
Documentary Relations of the Southwest (DRSW) is a dataset of bibliographic metadata derived from over 1500 reels of microfilmed documents that trace the history of the southwest from the 16th century until Mexico's independence in 1821. Originally made available to scholars through a now-defunct proprietary repository, DRSW is currently completing a migration from a home-grown solution to Blacklight as a sustainable option. While migrating content is a familiar scenario, this migration highlights key challenges in navigating the intersection of legacy design and possible futures for metadata curation and repository selection. This presentation deals with challenges revolving around three paradigms: metadata as content, system documentation generation, and metadata futures for indexing and integration.
Wikidata & Scholia for scholarly profiles: the IU Lilly Family School of Philanthropy pilot project [Presentation]
Mairelys Lemus-Rojas, Jere Odell
During recent years, cultural heritage institutions have become increasingly interested in participating in open knowledge projects. The most commonly known of these projects is Wikipedia, the online encyclopedia. Libraries and archives, in particular, are also showing an interest in contributing their data to Wikidata, the newest project of the Wikimedia Foundation. Wikidata, a sister project to Wikipedia, is a free knowledge base where structured, linked data is stored. It aims to be the data hub for all Wikimedia projects. The Wiki community has developed numerous tools and web-based applications to facilitate the contribution of content to Wikidata and to display the data in more meaningful ways. One such web-based application is Scholia, which was created to provide users with complete scholarly profiles by making live queries to Wikidata and displaying the information in an appealing and effective manner. Scholia provides a comprehensive sketch of the author’s scholarship. This presentation will demonstrate our efforts to contribute data to Wikidata related to our faculty members and will provide a demo of Scholia’s functionalities. At IUPUI (Indiana University-Purdue University Indianapolis) University Library, we conducted a pilot project where we selected 18 faculty members from the IU Lilly Family School of Philanthropy to be included in Wikidata. The School of Philanthropy, located on the IUPUI campus, is the leading school in the subject in the United States. The scholarship produced by its faculty is known to be widely used. We wanted to provide a presence in Wikidata not just for the faculty, but also for their publications and co-authors. For the creation of Wikidata items, we used a combination of semi-automated and manual processes. Once the items were created in Wikidata, we used Scholia to generate the scholarly profiles. Academic libraries have the capacity to create and curate data about scholars affiliated with their institutions.
We expect that the data set we built in Wikidata will help our institution better understand and describe the value of this school to global research on philanthropic giving and nonprofit management. Our pilot project is just a first step toward more efficient and systematic library-based contributions to Wikidata.
Session: Categorisation
Why Build Custom Categorizers Using Boolean Queries Instead of Machine Learning? Robert Wood Johnson Foundation Case Study [Presentation]
Joseph Busch, Vivian Bliss
This presentation will cover a case study for using Boolean queries to scope custom categories, provide a Boolean query syntax primer, and then present a step-by-step process for building a Boolean query categorizer. The Robert Wood Johnson Foundation (RWJF) is the largest philanthropy dedicated solely to health in the United States. Taxonomy Strategies has been working with RWJF to develop an enterprise metadata framework and taxonomy to support needs across areas including program management, research and evaluation, communications, finance, etc. We have also been working with RWJF on methods to apply automation to support taxonomy development and implementation within their various information management applications. Machine learning has become a popular and hyped method promoted by large information management application vendors including Microsoft, IBM, Salesforce and others. The benefit is that you don’t need to do any preparation, content just gets processed. The problem is that machine learning is opaque: the categories are generic, may be irrelevant, can be biased, and are difficult to change or tune. Pre-defined categories (e.g., a controlled vocabulary or taxonomy) plus Boolean queries to scope the context for categories are much more transparent. The benefit is relevant categories. The problem is that pre-defined categories require work to set up, and specialized skills. But how hard is it to do this?
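
A minimal version of that step-by-step process can be sketched as follows. The rule shape (all/any/none term lists) and the category names are invented simplifications, not the presentation's actual syntax or RWJF's taxonomy:

```python
import re

def matches(rule: dict, text: str) -> bool:
    # Evaluate a simple Boolean rule against a document: every "all" term
    # must appear, at least one "any" term (if given), and no "none" term.
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return (
        all(t in tokens for t in rule.get("all", []))
        and (not rule.get("any") or any(t in tokens for t in rule["any"]))
        and not any(t in tokens for t in rule.get("none", []))
    )

def categorize(rules: dict, text: str) -> list:
    # Assign every category whose rule matches the document.
    return [cat for cat, rule in rules.items() if matches(rule, text)]

rules = {
    "Health Coverage": {"all": ["insurance"], "any": ["medicaid", "medicare"]},
    "Childhood Obesity": {"all": ["children", "obesity"], "none": ["adults"]},
}
doc = "Expanding Medicaid insurance coverage for low-income families."
print(categorize(rules, doc))  # ['Health Coverage']
```

Unlike an opaque model, a rule that misfires can be read, explained, and edited directly, which is the transparency argument the presentation makes.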
Categorization Ethics: Questions about Lying, Moral Truth, Privacy and Big Data [Presentation]
Joseph Busch
Categorization is a common human behavior and it has many social implications. While categorization helps us make sense of the world around us, it also affects how we perceive the world, what we like and dislike, who we feel comfortable with and who we fear. Categorization is affected by our family, culture and education. We can, however, take responsibility for our own perceptions; misperceptions can be pointed out and sometimes changed. But what about categorization imposed from outside that affects us? Should that be allowed? How is that determined? How can it be changed? These are difficult issues. For information aggregators and information analyzers, the guidelines for appropriate behavior are not always clear, nor is the responsibility for outcomes resulting from errors, bias and worse. When errors and bias are commonly held, this can be reflected in the information ecology. The tipping point need not be a majority, truth or based on ethics. It’s easy enough to identify cases of mis-categorization, but when do you do something about it? What can you do about it?
Session: Validation
Metadata quality: Generating SHACL rules from UML class diagram [Presentation]
Emidio Stani
Metadata plays a fundamental role beyond classified data, as data needs to be transformed, integrated, and transmitted. Like data, metadata needs to be harvested, standardized and validated. Metadata management processes require resources. The challenge for organizations is to make these processes more efficient, while maintaining and even increasing confidence in their data. While RDF harvesting has already become an important step implemented at large scale (e.g., the European Data Portal), there is now a need to introduce an RDF validation mechanism. However, such a mechanism will depend upon the definition of RDF standards. When a standard is set, a validation service is necessary to determine whether metadata complies, much as the HTML validation service does for HTML.
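
The UML-to-SHACL generation in the title can be sketched as a small transform. The dictionary standing in for a UML class (name, attributes with type and multiplicity) and the ex: namespace are assumptions for the example, not the presentation's actual tooling:

```python
def uml_to_shacl(cls: dict) -> str:
    # Emit a SHACL NodeShape in Turtle from a minimal UML-class description:
    # one sh:property constraint per attribute, carrying its datatype and
    # multiplicity as sh:datatype / sh:minCount / sh:maxCount.
    lines = [
        "@prefix sh: <http://www.w3.org/ns/shacl#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "@prefix ex: <http://example.org/> .",
        "",
        f"ex:{cls['name']}Shape a sh:NodeShape ;",
        f"    sh:targetClass ex:{cls['name']} ;",
    ]
    for attr in cls["attributes"]:
        lines.append("    sh:property [")
        lines.append(f"        sh:path ex:{attr['name']} ;")
        lines.append(f"        sh:datatype xsd:{attr['type']} ;")
        lines.append(f"        sh:minCount {attr['min']} ;")
        lines.append(f"        sh:maxCount {attr['max']} ;")
        lines.append("    ] ;")
    lines[-1] = lines[-1].rstrip(";") + "."  # close the final statement
    return "\n".join(lines)

dataset = {"name": "Dataset", "attributes": [
    {"name": "title", "type": "string", "min": 1, "max": 1},
]}
print(uml_to_shacl(dataset))
```

A validation service would then run shapes like this against harvested RDF and report non-conforming records, analogous to the HTML validator.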
Validation of a metadata application profile domain model [Paper]
Mariana Curado Malta, Helena Bermúdez-Sabel, Ana Alice Baptista, Elena González-Blanco
The development of Metadata Application Profiles is done in several phases. According to the Me4MAP method, one of these phases is the validation of the domain model. This paper reports the validation process of a complex domain model developed under the project POSTDATA - Poetry Standardization and Linked Open Data. The development of the domain model ran with two steps of construction and two of validation. The validation steps drew on the participation of specialists in European poetry and the use of real resources. In the first validation, we used tables with information about resource-related properties, in which the experts had to fill in certain fields such as, for example, the values. The second validation used an XML framework to control the input of values in the model. The validation process allowed us to find and fix flaws in the domain model that would otherwise have been passed on to the Description Set Profile and possibly would only have been found after implementing the application profile in a real case.
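
The value-control step can be illustrated with a simple check of property values against the model's allowed sets. The property names and values below are invented examples, not the actual POSTDATA domain model:

```python
# Allowed-value sets for two invented domain-model properties; a real setup
# would derive these from the model (e.g., from an XML schema's enumerations).
ALLOWED = {
    "metre": {"hexameter", "alexandrine", "hendecasyllable"},
    "rhymeScheme": {"abab", "abba", "aabb"},
}

def validate_resource(resource: dict) -> list:
    # Return the (property, value) pairs that violate the model's value sets.
    return [
        (prop, value)
        for prop, value in resource.items()
        if prop in ALLOWED and value not in ALLOWED[prop]
    ]

poem = {"metre": "hexameter", "rhymeScheme": "abcd"}
print(validate_resource(poem))  # [('rhymeScheme', 'abcd')]
```

Catching such violations while experts describe real resources is exactly what surfaces flaws in the model before they propagate to the Description Set Profile.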
Session: Application Profiles
Modeling and application profiles in the Art and Rare Materials BIBFRAME Ontology Extension [Presentation]
Jason Kovari, Melanie Wacker, Huda Khan, Steven Folsom
Since April 2016, the Art Libraries Society of North America's Cataloging Advisory Committee (CAC) and the RBMS Bibliographic Standards Committee (BSC) have collaborated with the Andrew W. Mellon Foundation funded Linked Data for Production project on the Art and Rare Materials BIBFRAME Ontology Extension (ARM). BIBFRAME leaves some areas underdefined that need to be expanded by specialized communities. More specifically, ARM addresses the descriptive needs of the art and rare materials communities in areas such as exhibitions, materials, measurements, physical condition and much more. Between April 2016 and February 2018, work focused on modeling. In February 2018, our focus shifted to the development of SHACL application profiles for Art resources and Rare Monographs, which we are using to define forms and display for the cataloging environment in VitroLib, an RDF-based, ontology-agnostic cataloging tool being developed as part of the Linked Data for Libraries - Labs project that was discussed at DCMI 2017. Since these application profiles are being implemented in VitroLib, catalogers will be able to test the ARM modeling in a real-world environment, providing feedback to the project for potential future development. This presentation will provide an overview of select ARM modeling components, detail the process of creating and defining SHACL application profiles for ARM, and discuss challenges and opportunities for implementing these profiles in VitroLib. Further, we will discuss our strategy for low-threshold hosting of the ontology and administrative questions regarding long-term maintenance of this BIBFRAME extension.
Developing a Metadata Application Profile for the Daily Hire Labor [Presentation]
Sangeeta Sen, Nisat Raza, Animesh Dutta, Mariana Curado Malta, Ana Alice Baptista
EMPOWER SSE is a research project financed by the Fundação para a Ciência e Tecnologia (FCT, Portugal) and the Department of Science & Technology (DST, India) that aims to use the Linked Open Data framework to empower Social and Solidarity Economy (SSE) agents. It is a collaborative project between India and Portugal, focused on defining a Semantic Web framework to consolidate players of the informal sector, enabling a paradigm shift. The Indian economy can be categorized into two sectors: formal and informal. The informal sector differs from the formal in that it is unorganized, comprising economic activities not covered by formal arrangements such as taxation, labor protections, minimum wage regulations, unemployment benefits, or documentation. A major part of the Indian economy depends on the skilled labor of this informal sector, e.g., daily labor, farmers, electricians, food production, and small-scale industries (Kalyani, 2016). The informal sector is mainly made up of skilled people who follow their family job traditions; sometimes they are not even formally trained. This sector struggles with a lack of information, data sharing needs, and interoperability issues across systems and organizational boundaries. In fact, this sector has little visibility in society and little opportunity to do business, as most of its agents do not reach the end of the chain. This blocks them from getting proper exposure and a better livelihood.
Session: Models
Research data management in the field of Ecology: an overview [Paper]
Cristiana Alves, João Aguiar Castro, João Pradinho Honrado, Angela Lomba
The diversity of research topics and resulting datasets in the field of Ecology has grown in line with developments in research data management. Based on a meta-analysis performed on 93 scientific references, this paper presents a comprehensive overview of the use of metadata models in the ecology domain through time. Overall, 40 metadata models were found to be either referenced or used by the biodiversity community from 1997 to 2018. In the same period, 50 different initiatives in ecology and biodiversity were conceptualized and implemented to promote effective data sharing in the community. A relevant concern that stems from this analysis is the need to establish simple methods to promote data interoperability and reuse, so far limited by the production of metadata according to different standards. With this study, we also highlight challenges and perspectives in research data management in the domain of Ecology towards best practice guidelines.
Metadata Models for Organizing Digital Archives on the Web: Metadata-Centric Projects at Tsukuba and Lessons Learned [Paper]
Shigeo Sugimoto, Senan Kiryakos, Chiranthi Wijesundara, Winda Monika, Tetsuya Mihara, Mitsuharu Nagamori
There are many digital collections of cultural and historical resources, referred to as digital archives in this paper. The domains of digital archives are expanding from traditional cultural heritage objects to new areas such as pop culture and intangible objects. Though it is known that metadata models and authority records, such as subject vocabularies, are essential in building digital archives, they are not yet well established in these new domains. Another crucial issue is semantic linking among resources within a digital archive and across digital archives; metadata aggregation is an essential aspect of such resource linking. This paper overviews three on-going metadata-centric research projects by the authors and discusses some lessons learned from them. The subject domains of these research projects are disaster records of the Great East Japan Earthquake of 2011, Japanese pop culture such as Manga, Anime, and Games, and cultural heritage resources in South and Southeast Asia. These domains are poorly covered by conventional digital archives at memory institutions because of the nature of the contents. The main goal of this paper is not to report these projects as completed research, but to discuss issues of metadata models and aggregation that are important in organizing digital archives in the Web-based information environment.
Posters
Linked Data Publishing and Ontology in Korea Libraries [Poster]
Mihwa Lee, Yoonkyung Choi
This poster analyzes LOD publishing and the reuse of external LOD, and suggests future directions for LOD services in Korea. For this study, literature reviews and case studies were carried out. For the case study, KERIS, NLK, and KISTI were selected, as they are the major organizations involved in library linked data in Korea. They have been publishing linked open data for bibliographic records and authority data, interlinking with external LOD such as VIAF, LDS, BNB, ISNI, WorldCat, and so on. We analyzed the characteristics of the three services in terms of (1) subject domain, (2) volumes of bibliographic, authority, and subject data, (3) bibliographic, name, and subject ontologies, (4) local ontologies, and (5) interlinking with external LOD. Comparing the three LOD services with respect to ontologies, FOAF, SKOS, DC, and BIBO are common to all, while MODS, DCTERMS, BIBFRAME, PRISM, and BibTeX differ across services. All services also have their own local ontologies, with their own properties and classes. These local properties and classes lack consistency and create potential conflicts between ontologies. Among the requirements for metadata, interoperability is especially important. The reason the institutions developed their own ontologies is the lack of classes and properties for describing the data needed to construct LOD; it is for this reason that LC BIBFRAME was developed as an ontology specific to the library sector.
Author Identifier Analysis: Name Authority Control in Two Institutional Repositories [Poster]
Marina Morgan, Naomi Eichenlaub
The aim of this poster is to analyze name authority control in two institutional repositories to determine the extent to which faculty researchers are represented in researcher identifier databases. A purposive sample of 50 faculty authors from Florida Southern College (FSC) and Ryerson University (RU) was compared against five different authority databases: Library of Congress Name Authority File (LCNAF), Scopus, Open Researcher and Contributor ID (ORCID), Virtual International Authority File (VIAF), and International Standard Name Identifier (ISNI). We first analyzed the results locally, then compared them between the two institutions. The findings show that while LCNAF and Scopus results are comparable between the two institutions, the differences in ORCID, VIAF, and ISNI are considerable. Additionally, the results show that the majority of authors at each institution are represented in two or three external databases. This has implications for enhancing local authority data by linking to external identifier authority data to augment institutional repository metadata.
Visualizing Library Metadata for Discovery [Poster]
Myung-Ja K. Han, Stephanie R. Baker, Peiyuan Zhao, Jiawei Li
The benefits of visualization have been widely discussed, and visualization has in fact been implemented in library services. However, visualization efforts have mostly focused on collection analysis to improve collection development policies and budget management, not on discovery services that exploit the library's catalog records to their maximum capacity. One of the challenges of working with library catalog records for visualization is the sheer number of elements included in a MAchine-Readable Cataloging (MARC) format record, such as control fields, data fields, subfields, and indicators, used to describe library resources. As is well known, there are more than 1,900 fields in MARC, which is simply too many to use for visualization (Moen and Benardino, 2003). Instead of showing clear relationships between resources, a visualization may muddle those relationships when so many elements are included. The question, then, is whether all information included in the library catalog record should be used for discovery and visualization services, and if not, what essential information should be included.
Building a Framework to Encourage the use of Metadata in Modern Web-Design [Poster]
Jackson Morgan
When Tim Berners-Lee published the roadmap for the semantic web in 1998, it was a promising glimpse into what could be accomplished with a standardized metadata system, but nearly 20 years later, adoption of the semantic web has been less than stellar. In those years, web technology has changed drastically, and techniques for implementing semantic web compliant sites have become relatively inaccessible. This poster outlines a JavaScript framework called Beltline.js which seeks to encourage the use of metadata by making it easy to integrate into modern web best-practices.
Analysis of user-supplied metadata in a health sciences institutional repository [Poster]
Joelen Pastva
Launched in October 2015 by the Galter Health Sciences Library, the DigitalHub repository is designed to capture and preserve the scholarly outputs of Northwestern Medicine. A major motivation to deposit in the repository is the possibility of improved citation and discovery of resources; however, one of the largest barriers hampering discovery is a lack of descriptive metadata. Because DigitalHub was designed for ease of use, very minimal metadata is required in order to successfully deposit a resource. However, many optional descriptive metadata fields are also made available to encourage the consistent and detailed entry of descriptive information. The library wanted to evaluate how users were approaching the available metadata fields and accompanying instructions prior to the library's metadata enhancement operations. In order to evaluate user-supplied metadata, an export was made of all of the metadata in DigitalHub covering a 2.5-year period. Records previously enhanced by librarians, or records initially deposited by library staff, were excluded from consideration. The metadata was then evaluated for completeness, choice of dropdown terms for resource type, inclusion of collaborators, use of controlled vocabulary fields, and any areas that indicated a clear misunderstanding of the intended use of a metadata field. This poster presents the preliminary findings of this analysis of user-supplied metadata. It is hoped that the findings will help guide future system and interface design decisions, cleanup activities, and library instruction activities. Ultimately the goal is to make the interface as usable and effective as possible, to encourage depositors to supply an optimal amount of descriptive metadata upfront and to continue using the repository in the future. These results should be of interest to repository managers who rely on users to supply initial descriptive metadata, especially in the health sciences disciplines.
Other Presentations and Special Sessions
Metadata 2020: Metadata Connections Across Scholarly Communications [Presentation]
Patricia Feeney, Head of Metadata at Crossref
Metadata 2020 is a nonprofit collaboration that advocates for richer, connected, reusable, and open metadata for all research outputs. The collaboration of over 100 individuals includes representatives from the publisher, librarian, service provider, data publisher and repository, researcher, and funder communities. In 2018, Metadata 2020 formed six cross-community, collaborative projects. These projects include activities to map between schemas; define core element terminology; create principles and share best practices; chart metadata evaluation tools; and, significantly, communicate with researchers and organizations about incentives for improving metadata. In this presentation, we will briefly outline each project and then present in more detail the 'Metadata Recommendations and Element Mappings' and 'Incentives for Improving Metadata' projects (the latter including the development of a metadata flow diagram), showing work to date and inviting participation from attendees to help progress the work to more fully represent the librarian community. While the Metadata 2020 collaboration has many highly experienced participants, we believe it is important to learn from the experience of others who have worked on similar projects in the past, and we would be grateful for input from the DCMI community.
Lightweight rights modeling and linked data publication for online cultural heritage [Special Session]
Antoine Isaac, Mark Matienzo, Michael Steidl
Institutional websites and aggregation initiatives like Europeana and DPLA seek to facilitate access and re-use of vast amounts of digitized cultural material online. Metadata about digitized content has long been identified as a key asset to facilitate these ends, and these initiatives have created metadata frameworks that enhance interoperability across information spaces and systems. Expressing the conditions for re-use that derive from intellectual property rights remains an issue, however. Published (meta)datasets still often indicate copyrights and other access conditions using ad-hoc descriptions that are specific to sectors, languages, and national contexts. Creative Commons is a great leap forward, as it provides a standardized set of licenses and public domain marks that can be used to label open digital heritage resources in an interoperable way. Its focus on full openness, however, means that it cannot be used for a significant part of the cultural collections published online. Recently, W3C published the Open Digital Rights Language (ODRL) for representing policies that combine permissions and duties. While ODRL makes it possible to express rights-related statements of arbitrary complexity, it does not provide a set of community-backed statements that can be reused out of the box to label cultural resources. RightsStatements.org is an international initiative that aims to fill these gaps by offering the cultural heritage domain the resources to label, in an interoperable way (using Linked Data technology), digitized objects that are not always in scope for fully open publication. In this special session, we will present the challenges that RightsStatements.org has had to address to provide a service useful to the digital heritage domain. After a discussion of the context and issues of expressing rights to access and re-use digital cultural material, we will present RightsStatements.org's offer as a complement to initiatives like Creative Commons.
We will then dive into the details of the implementation and use of the statements and services that RightsStatements.org provides. We will focus first on data modeling, presenting how rights statements are expressed in a lightweight and interoperable way, for both machines and humans, based on Linked Data principles and vocabularies. We will then relate our work to other relevant initiatives in the community, both in terms of (1) standardized and/or shareable sets of statements, including projects such as Wikidata, and (2) frameworks for expressing statements in more complex ways, such as W3C's ODRL. Finally, we will seek to build bridges with efforts to express rights and licenses in other domains relevant to the Dublin Core audience, such as the ongoing work on the W3C DCAT vocabulary. For every main agenda item in the session, we have planned "interaction points": not only opening the floor to questions from the audience, but also asking attendees about their experience with expressing intellectual property rights and other (non-)legal conditions, asking for feedback on the modeling choices made in RightsStatements.org, evaluating the labeling of some objects in Europeana, and discussing how the community should further organize itself to tackle rights issues better, if needed.
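To illustrate the lightweight modeling approach the session describes, the sketch below shows how a rights statement can be attached to the description of a digitized object using Linked Data vocabularies. The object URI and title are hypothetical; the edm:rights property comes from the Europeana Data Model, and the value is one of the standardized RightsStatements.org statements (here, "In Copyright").

```turtle
@prefix edm: <http://www.europeana.eu/schemas/edm/> .
@prefix dc:  <http://purl.org/dc/elements/1.1/> .

# Hypothetical digitized object; the edm:rights value is a
# machine-readable RightsStatements.org URI, resolvable to a
# human-readable explanation of the statement.
<http://example.org/object/42>
    dc:title   "Digitized manuscript (example)" ;
    edm:rights <http://rightsstatements.org/vocab/InC/1.0/> .
```

Because the statement is a dereferenceable URI rather than free text, aggregators can interpret it consistently across languages and national contexts.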
LOD-KOS: A Framework for Private Enterprise Data as well as Public Open Data [Presentation]
Dave Clarke
Linked Open Data Knowledge Organization Systems (LOD-KOS) are defined by Dr. Zeng and Dr. Mayr in their 2018 paper Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review as 'value vocabularies and lightweight ontologies within the Semantic Web framework'. The paper surveys open data examples in the sciences and humanities and describes a community movement to convert and make sharable 'thesauri, classification schemes, name authorities and lists of codes and terms, produced before the arrival of the ontology-wave' into the 'Semantic Web mainstream'. This session will review several examples of open data LOD-KOS, and then contrast them with examples of how commercial enterprises are currently using the Linked Data model to manage commercially sensitive enterprise data. The session will explore the practical challenges faced by any enterprise that must manage a mixture of public open data KOS resources and commercially sensitive KOS resources. The need emerges to support both collaboration and compartmentalization, and to do so flexibly. To achieve this, KOS management systems need to support flexible and extensible access control lists (ACLs) and to assign ACL metadata to entities, predicates, and data values. Both the public open data community and the private enterprise data community stand to benefit from a shared framework for curating KOS, and from mechanisms that easily allow the selective sharing of some resources while protecting the confidentiality of others.
Metadata for Smart Sustainable Cities [Special Session]
Claudia Sousa Monteiro, Catarina Selada, Vera Nunes, Paula Monteiro, Ana Alice Baptista, João Tremoceiro
In recent years, the Smart City concept has emerged as a way of optimizing the management of resource use, mainly in response to increasing urbanization and population growth. There are many definitions of this concept, all highlighting the central role that new digital technologies play in improving the operation of cities. In this scope, Urban Analytics has evolved as a new research field, looking to data as a means to understand and study urban systems by transforming data into information and knowledge. There is therefore significant potential to improve data collection, integration, and processing efforts concerning cities through the successful use of linked data, which depends on its ability to provide an overview of city data at multiple scales and across different dimensions. In this sense, metadata has a central role in the management, usefulness, and human/machine readability of city data. This session aims to foster collaboration between several city stakeholders and bring to the discussion the challenges and opportunities of data availability, access, and applications in developing new and innovative solutions to make cities smarter and support sustainable change.
The Use Of Persistent Identifiers In Dublin Core Metadata [Special Session]
Paul Walk, Tom Baker
This session will bring together stakeholders and metadata experts to discuss the representation of persistent identifiers (PIDs) in Dublin Core metadata, with a particular focus on the domain of scholarly communications and Open Access. This domain recognises the importance of PIDs in metadata - especially to identify scholarly outputs (using DOIs) and, increasingly, to identify authors (often using ORCIDs). The experiences and recommendations discussed here will almost certainly have wider applicability in many other domains. This session will be a working meeting. It will follow on from a project initiated by DCMI to develop some candidate recommendations. The anticipated outcome of this session will be a formal recommendation from DCMI.
Providing Access to Cultural Objects Curated in Digital Collections – Models and Profiles [Special Session]
Marcia Zeng, Shigeo Sugimoto, Chiranthi Wijesundara, Keven Liu, Cuijian Xia, Maja Žumer
This panel brings together researchers involved in the research and development (R&D) of structured data about information resources, with a focus on cultural objects (mainly non-conventional) curated in digital collections. The uniqueness of these collections is that they are not constrained by physical location or the premises of an institution; thus, aggregation and re-organization of metadata based on a common model is needed. This uniqueness is also reflected in the cases of digital collections with which the panelists have been involved, including: intangible cultural heritage in developing countries in South/Southeast Asia; Japanese pop culture (particularly Manga); disaster archive records; genealogy records; ancient Chinese books; and music resources in general. The panelists will share developments and research findings in three layers: (1) modeling for the domain in question, (2) extension and refinement of conceptual models, and (3) construction of application profiles and knowledge bases, which are used in real object descriptions, as well as platform construction built on the data models. The panelists will discuss challenges, processes, limitations, and strategies.
The times they are a changin' - implementing a modern library and information science curriculum [Presentation]
Magnus Pfeffer
In the past decade, the consensus on what competencies a graduate with a library and information science degree should have has started shifting. With the ongoing digitization of workflows and the creation of new online services, IT competencies have risen in demand. Three years ago, the school of library and information management at Stuttgart Media University started the process of overhauling its curriculum in response to this change. The presentation will cover the challenges of integrating new IT subjects such as programming, data management, database design, and web-based services into a library science curriculum. It will discuss which competencies were considered necessary for modern metadata management tasks and the didactic concepts that were developed to teach them to a heterogeneous audience. As the changes have now been in effect for two years, the results of an evaluation will also be presented.
Workshops
Domain Specific Extensions for Machine-actionable Data Management Plans [Workshop]
João Cardoso, Tomasz Miksa
The current manifestation of Data Management Plans (DMPs) only contributes to the perception that DMPs are an annoying administrative exercise. What they really are, or at least should be, is an integral part of research practice, since today most research across all disciplines involves data, code, and other digital components. There is now widespread recognition that, underneath, the DMP could have more thematic, machine-actionable richness with added value for all stakeholders: researchers, funders, repository managers, ICT providers, librarians, etc. As a result, parts of the DMP can be automatically generated and shared with other collaborators or funders. To achieve this goal we need: (1) a good understanding of research data workflows, (2) research data management infrastructure, and (3) a common data model for machine-actionable DMPs. In this workshop we will focus on the common data model for machine-actionable DMPs and will seek to identify which domain-specific extensions must be implemented to fulfill the requirements of stakeholders such as digital libraries and repositories. We will discuss which information they can provide and which information they can expect, and how existing and future systems and services can support and potentially automate this information flow. [more information]
18th European Networked Knowledge Organization Systems (NKOS) [Workshop]
The proposed joint NKOS workshop at TPDL2018 / DCMI2018 will explore the potential of KOS such as classification systems, taxonomies, thesauri, ontologies, and lexical databases, in the context of current developments and possibilities. These tools help to model the underlying semantic structure of a domain for purposes of information retrieval, knowledge discovery, language engineering, and the Semantic Web. The workshop provides an opportunity to discuss projects, research and development activities, evaluation approaches, lessons learned, and research findings. A further objective is to systematically engage in discussions in common areas of interest with selected related communities and to investigate potential cooperation. [more information]
Web Archive – An introduction to web archives for Humanities and Social Science research [Workshop]
Daniel Gomes, Jane Winters
We now have access to two decades of web archives, collected in different ways and at different times, which constitute an invaluable resource for the study of the late 20th and early 21st centuries. Researchers are only just beginning to explore the potential of these vast archives, and to develop the theoretical and methodological frameworks within which to study them, but recognition of that potential is becoming ever more widespread. This workshop seeks to explore the value of web archives for scholarly use, to highlight innovative research, to investigate the challenges and benefits of working with the archived web, to identify opportunities for incorporating web archives in learning and teaching, and to discuss and inform archival provision in all senses. [more information]
Multi-domain Research Data Management: from metadata collection to data deposit [Workshop]
Ângela Lomba, João Aguiar Castro
Framed by the many initiatives pushing for Open Science, the Research Data Management workshop at TPDL/DCMI 2018 will offer participants an informal venue to benefit from and share experience with domain experts, dealing with practical data management issues, and to explore open-source RDM tools. Researchers are expected to participate and reflect on aspects relevant to Open Science policies. The workshop is organized in two sessions. The first is dedicated to domain-specific challenges and perspectives, based on presentations and a round-table discussion with active participation. The second is a hands-on event in which participants collaborate in a field experiment, collecting metadata with LabTablet and then synchronizing it with the Dendro data organization platform.
Internet of Things Workshop: Live Repositories of Streaming Data [Workshop]
Artur Rocha, Alexandre Valente Sousa, Joaquin Del Rio Fernandez, Hylke van der Schaaf
The workshop focuses on demonstrating a set of standards and good practices used in the Internet of Things and discussing how they can be used to leverage F.A.I.R. evidence-based science. Implementations based on standards such as the OGC Sensor Observation Service or the OGC SensorThings API, and well-established open frameworks such as FIWARE, will be demonstrated. Participants will be given the opportunity to try out such tools and take part in moderated panels. [more information]
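As a flavor of the kind of standards-based access the workshop demonstrates, the sketch below builds a query URL in the style of the OGC SensorThings API, which exposes entities such as Observations through a REST interface with OData-style query options ($top, $orderby). The server base URL is hypothetical; only the entity path and query options follow the specification.

```python
from urllib.parse import quote

# Hypothetical SensorThings service endpoint (e.g. a FROST-Server instance).
BASE = "https://example.org/FROST-Server/v1.0"

def latest_observations_url(n=5):
    """Build a URL requesting the n most recent Observations,
    ordered by the time the phenomenon was observed."""
    options = "$top={}&$orderby={}".format(n, quote("phenomenonTime desc"))
    return "{}/Observations?{}".format(BASE, options)

print(latest_observations_url())
```

A client would issue a GET request against such a URL and receive a JSON collection of observations, which is what makes streaming sensor data easy to consume in a repository context.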
Metadata for Manufacturing [Workshop]
Ana Alice Baptista, João P. Mendonça, Paula Monteiro
The use of Linked Data (LD) in manufacturing holds great potential, given the diversity of products requiring technical description and interrelationship, whether within the same sector, between industrial sectors, or between manufacturing and other sectors of activity such as logistics and trade. An obvious use is, for example, in catalogs of parts or end products. The existence of this type of information in RDF potentially facilitates not only business-to-business but also business-to-consumer relationships, by making detailed searches, comparisons, and product relationships across the Web much faster and more reliable. The potential use of LD principles and technologies in manufacturing goes well beyond catalogs. Business-to-business data sharing requires interoperability and independence from proprietary formats. While in some cases there are standards that ensure this, in others standards do not exist or are not sufficient to guarantee semantic interoperability without endangering industrial property rights. DCMI, as a central worldwide entity on the topic of metadata, has a leading role in all Linked Data developments. It cannot, therefore, fail to keep up with, and in some cases even lead, developments related to metadata in manufacturing. This special session is intended as a seed for the creation of a community or Special Interest Group on metadata for manufacturing within DCMI.
Tutorials and Hands-on Sessions
Linked Data Generation from Digital Libraries [Tutorial]
Anastasia Dimou, Pieter Heyvaert, Ben Demeester
Knowledge acquisition, modeling, and publishing are important in digital libraries with large heterogeneous data sources for constructing knowledge-intensive systems for the Semantic Web. Linked Data increases data shareability, extensibility, and reusability. However, using Linked Data as a means to represent knowledge has proven to be easier said than done! During this tutorial, we will elaborate on the importance of semantically annotating data and how existing technologies facilitate the generation of the corresponding Linked Data. We will introduce the [R2]RML language(s) for generating Linked Data from heterogeneous data, and non-Semantic Web experts will annotate their data with the RMLEditor, which keeps all underlying Semantic Web technologies invisible. By the end, participants, independently of their background knowledge, will have modeled, annotated, and published some Linked Data on their own! [more information]
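To give a sense of what an RML mapping looks like, the sketch below maps rows of a CSV file to RDF triples. The input file name, column names, and target URI template are hypothetical; the vocabulary terms (rml:logicalSource, rr:subjectMap, rr:predicateObjectMap, and so on) come from the RML and R2RML vocabularies themselves.

```turtle
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:  <http://semweb.mmlab.be/ns/ql#> .
@prefix dc:  <http://purl.org/dc/elements/1.1/> .

# Hypothetical mapping: each row of books.csv (columns: id, title)
# becomes one resource with a dc:title.
<#BookMapping>
    rml:logicalSource [
        rml:source "books.csv" ;
        rml:referenceFormulation ql:CSV
    ] ;
    rr:subjectMap [
        rr:template "http://example.org/book/{id}"
    ] ;
    rr:predicateObjectMap [
        rr:predicate dc:title ;
        rr:objectMap [ rml:reference "title" ]
    ] .
```

Tools such as the RMLEditor let users build mappings like this one graphically, so the Turtle syntax itself stays out of sight.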
Research the Past Web using Web archives [Tutorial]
Daniel Gomes, Daniel Bicho, Fernando Melo
The Web is the largest source of public information ever built. However, 80% of web pages disappear or change to different content within one year. The main objectives of this tutorial, provided by the Arquivo.pt team, are to motivate the pertinence of web archiving, present use cases, and share recommendations for creating preservable websites for future access. The tutorial introduces tools to create and explore web archives and presents methods and technologies for developing web applications that automatically access and process information preserved in web archives, for instance using the Wayback Machine, the Memento Time Travel protocol, or the Arquivo.pt API. [more information]
Europeana hands-on session [Tutorial]
The Europeana REST API allows you to build applications that use the wealth of Europeana collections drawn from the major libraries, museums, archives, and galleries across Europe. The Europeana collections contain over 54 million cultural heritage items, from books and paintings to 3D objects and audiovisual material, that celebrate over 3,500 cultural institutions across Europe. Over the past couple of years, the Europeana REST API has grown beyond its initial scope as set out in September 2011, into a wide range of specialized APIs. At the moment, we offer several APIs that you can use to not only get the most out of Europeana but also to contribute back. This tutorial session will walk you through the wide range of APIs that Europeana now offers, followed by an hands-on session where you will be able to experience first hand what you can do with it. [more information]
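As a warm-up for the hands-on session, the sketch below constructs a request URL for the Europeana Search API, which takes an API key (wskey), a free-text query, and a rows parameter limiting the number of results. The endpoint and parameter names follow Europeana's API documentation; YOUR_API_KEY is a placeholder for a real key obtained from Europeana.

```python
from urllib.parse import urlencode

# Europeana Search API endpoint (see the Europeana API documentation).
BASE = "https://api.europeana.eu/record/v2/search.json"

def build_search_url(query, api_key, rows=12):
    """Build a Search API URL for the given free-text query."""
    params = {"wskey": api_key, "query": query, "rows": rows}
    return BASE + "?" + urlencode(params)

print(build_search_url("Mona Lisa", "YOUR_API_KEY"))
```

Issuing a GET request against the resulting URL returns a JSON response whose items list contains matching cultural heritage records.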
DCMI Meetings
DCMI Governing Board Meeting [Closed meeting]
This is the annual meeting of the DCMI Governing Board. This is a closed meeting.
DCMI Open Community meeting [Open meeting]
This meeting is intended to allow anyone from the DCMI community to bring ideas for discussion. All are invited. The meeting will be facilitated in a very informal, unconference style, so bring your ideas!

Twitter Stream

Using hashtag: #dcmi18