DCMI: Papers, Presentations, Sessions, Posters, Workshops, Tutorials and Hands-on sessions

Keynotes

Keynote by Young Man Ko: Extracting ontologies from terminological databases [keynote]
Young Man Ko

A structural definition-based terminology defines terms on the basis of properties that are structured by conceptual categories (classes). When a structural definition-based terminology is extracted from a relational database into RDF, inference rules can be generated for use in complex semantic search by SPARQL query. Complex SPARQL queries that leverage these rules can yield better results than simple keyword queries, reflecting the logical combination of semantically related terms. With their basis in ontologies so generated, structural definition-based terminologies can be used to index databases for retrieval and to mine informal big data through the application of well-defined semantic concepts.
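The core idea of the abstract can be shown in a small, hypothetical sketch: terms are defined by structured property values, rules are inferred from those definitions, and queries combine properties logically. Plain Python sets stand in for an RDF store and SPARQL. The sample terms, the categories `site`, `process` and `agent`, and the subset-based "narrower" rule are all illustrative assumptions, not the method presented in the keynote.

```python
from itertools import permutations

# Hypothetical terminology: each term is defined by (category, value) properties.
TERMS = {
    "pneumonia": {("site", "lung"), ("process", "inflammation")},
    "viral pneumonia": {("site", "lung"), ("process", "inflammation"), ("agent", "virus")},
    "bronchitis": {("site", "bronchus"), ("process", "inflammation")},
    "influenza": {("agent", "virus"), ("site", "respiratory tract")},
}


def to_triples(terms):
    """Flatten structural definitions into (subject, predicate, object) triples."""
    return {(term, cat, val) for term, props in terms.items() for cat, val in props}


def infer_narrower(terms):
    """Assumed rule: if A's definition strictly contains B's, A is narrower than B."""
    return {(a, "narrower", b) for a, b in permutations(terms, 2) if terms[a] > terms[b]}


def structured_query(triples, required):
    """Return terms whose definitions satisfy every (category, value) pair (logical AND)."""
    subjects = {s for s, _, _ in triples}
    return sorted(s for s in subjects if all((s, c, v) in triples for c, v in required))


def keyword_query(terms, keyword):
    """Baseline: substring match against term labels only."""
    return sorted(t for t in terms if keyword in t)


triples = to_triples(TERMS)
print(structured_query(triples, [("agent", "virus"), ("process", "inflammation")]))
print(keyword_query(TERMS, "virus"))
print(infer_narrower(TERMS))
```

The keyword search misses "viral pneumonia" because its label never contains the string "virus". The structured query finds it through its definition, which is the kind of gain over keyword matching that the abstract describes.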

Keynote by Karen Coyle: All the Books [keynote]
Karen Coyle

Since the days of the fabled Library of Alexandria there have been efforts to gather a complete collection of recorded knowledge. While various efforts have paid attention to quality, coverage and redundancy, little thought has been given to how, or whether, these warehouses of data can best serve knowledge seekers. Coyle will take a long view of the history of our "hunting and gathering" of information sources, including the difficulty of defining either ALL or BOOKS. While no easy solution to user service can be proposed, she will argue for a human-centered rather than a thing-centered approach.

Keynote by Javed Mostafa: The Coming Age of Information Inversion: When Information Searches for “would be” Seekers [keynote]
Javed Mostafa

“There are too many books. They are being produced every day in torrential abundance. Many of them are useless and stupid; their existence and their conservation is a dead weight upon humanity.” These words come from the Spanish philosopher José Ortega y Gasset’s famous speech “Mission of the Librarian,” which he gave about 85 years ago. As we all know, the situation has not changed; in fact, it has grown worse. About a dozen years ago, Jensen et al. published a study showing that even specialists in certain areas of biomedicine, after a whole lifetime’s worth of effort, may not finish reading all the relevant articles in their specific areas of research. In keeping with the dictum of the “double-edged sword”, the same information technology that accelerated information production should be looked upon as a potential source of solutions. However, if we rely on information technology only to improve how we find information, we grossly underleverage its real potential. With recent advances in IT, there is an opportunity to enhance the full scholarly information life-cycle: seeking, producing, disseminating, and using scholarly information. Discussing all the ways IT can support the full scholarly information life-cycle would take too long and may be out of scope for this forum. Hence, I will primarily focus on seeking and producing scholarly information. Specifically, four areas will be covered: 1) machine-assisted information discovery, authentication, and validation; 2) cyberinfrastructure and scientific instrument interfacing to automate document and data production; 3) documents as information agents and document-agent communities; and finally, 4) communication in next-generation scholarly ecosystems where scholars and documents engage in dialogs and even debates.
Wherever appropriate, I will point out relevant R&D activities conducted by researchers in my laboratory and other researchers around the world.

Session: Metadata analysis and assessment (1)

Toward A Metadata Activity Matrix: Conceptualizing and Grounding the Research Life-cycle and Metadata Connections [paper (short)]
Sonia Pascua, Kai Li and Jane Greenberg

Understanding how metadata is involved in data-driven scientific practice is an important means of evaluating its value, the goal underlying the concept of metadata capital. In this work-in-progress paper, we propose a research project that examines how metadata activities are embedded in research and data activities, as represented in research and data lifecycle models. As a first step, we identify the research and data lifecycle models that best fit the scope of this project and offer a higher-level mapping among research activities, data processes, and metadata activities. This work offers a solid framework for the next step of the project: better understanding the real-world value of metadata work and outputs.

Strategies and Tools for Metadata Migration Analysis and Harmonization [paper (short)]
Anne Washington, Annie Wu, Santi Thompson, Todd Crocken, Leroy Vallejo, Sean Watkins and Andrew Weidner

The University of Houston (UH) Libraries, in partnership and consultation with numerous institutions, was awarded an Institute of Museum and Library Services (IMLS) National Leadership/Project Grant to support the creation of the Bridge2Hyku (B2H) Toolkit. Research shows that institutions are inclined to switch from proprietary digital systems to open source digital solutions. However, content migration from proprietary systems to open source repositories remains a barrier for many institutions due to a lack of tools, tutorials, and documentation. The B2H Toolkit, which includes migration strategies and use cases as well as tools for transitioning from CONTENTdm to Hyku, acts as a comprehensive resource that guides migration practitioners in migration planning and in metadata analysis and harmonization, and that facilitates the repository migration process. This paper focuses on how the toolkit’s metadata guidelines and migration tools aid in migration planning, metadata analysis, metadata application profile development, metadata harmonization, and bulk ingest of digital objects into Hyku.

Using Metadata Record Graphs to Understand Digital Library Metadata [paper (full)]
Mark Phillips, Oksana Zavalina and Hannah Tarver

Cultural heritage institutions are increasingly digitizing physical items, collecting born-digital items, and making these resources available online. Metadata plays a crucial role in the discovery and management of these collections, which makes it important to identify areas for metadata improvement. A number of frameworks and associated metrics support metadata evaluation. Most of these metrics rely on record-centered information, such as counts of metadata elements and occurrences of data values within a collection. There has been little research into using traditional network analysis to understand the connections between metadata records created by shared values, such as subjects or creators. The goal of the research reported in this paper is to investigate potential uses of network analysis and to determine which metrics hold the most promise for effective assessment of metadata. We introduce the Metadata Record Graph and analyze how it can be used to better understand metadata collections of various sizes.
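A Metadata Record Graph, as described here, treats records as nodes that are linked by shared values. The sketch below builds such a graph for one field and computes a few standard network measures: edge count, density, connected components and isolated records. The record structure, the lowercase normalization of values and the choice of metrics are assumptions made for illustration. The paper's own metrics may differ.

```python
from collections import defaultdict
from itertools import combinations


def build_record_graph(records, field):
    """Link two records with an edge when they share at least one value in `field`."""
    by_value = defaultdict(set)
    for rec_id, rec in records.items():
        for value in rec.get(field, []):
            by_value[value.strip().lower()].add(rec_id)  # assumed light normalization
    adjacency = {rec_id: set() for rec_id in records}
    for ids in by_value.values():
        for a, b in combinations(sorted(ids), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def graph_metrics(adjacency):
    """Compute simple whole-graph measures for an undirected adjacency map."""
    n = len(adjacency)
    edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    density = 2 * edges / (n * (n - 1)) if n > 1 else 0.0
    seen, components = set(), 0
    for start in adjacency:  # count connected components with an iterative DFS
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adjacency[node] - seen)
    isolated = sum(1 for nbrs in adjacency.values() if not nbrs)
    return {"nodes": n, "edges": edges, "density": density,
            "components": components, "isolated": isolated}


# Hypothetical records with subject values.
records = {
    "r1": {"subject": ["Texas", "Railroads"]},
    "r2": {"subject": ["railroads"]},
    "r3": {"subject": ["Texas"]},
    "r4": {"subject": ["Cattle"]},
}
metrics = graph_metrics(build_record_graph(records, "subject"))
print(metrics)
```

Here "r4" shares no subject with any other record and shows up as isolated. Whether such isolation signals a metadata problem or a genuinely unique item is an interpretive question that the graph only helps to surface.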

Session: Metadata in Specific Contexts

Japan Search RDF Schema: a dual-layered approach to describe items from heterogeneous data sources [paper (short)]
Daichi Machiya, Tomoko Okuda and Masahide Kanzaki

The National Diet Library, Japan (NDL), with support from Xenon Limited Partners, has designed a new metadata schema based on the RDF model while developing a national platform for metadata aggregation and sharing, "Japan Search". Japan Search collects metadata from libraries, museums, archives, and research institutions across the country, and provides an integrated search service as well as APIs (SPARQL Endpoint and REST-API). The aim of this paper is to introduce the new schema, highlighting its dual-layered data model and the normalization of temporal (When), spatial (Where), and agential (Who) information provided in the source data.
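The dual-layered idea can be sketched for temporal (When) data: the value is kept as supplied by the source, and a normalized value is added alongside it. The accepted input patterns, the `(start_year, end_year)` representation and the `TemporalValue` class are hypothetical and are not part of the Japan Search schema. Real source data, such as dates given in Japanese era names, would need many more rules.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TemporalValue:
    source: str                             # layer 1: value exactly as supplied by the data provider
    normalized: Optional[Tuple[int, int]]   # layer 2: (start_year, end_year), or None if not parsed


def normalize_when(raw: str) -> TemporalValue:
    """Normalize a few illustrative date patterns while always keeping the source string."""
    text = raw.strip()
    if m := re.fullmatch(r"(\d{4})\s*[-/]\s*(\d{4})", text):       # range, e.g. "1920-1925"
        return TemporalValue(raw, (int(m[1]), int(m[2])))
    if m := re.fullmatch(r"(\d{3}0)s", text):                       # decade, e.g. "1920s"
        start = int(m[1])
        return TemporalValue(raw, (start, start + 9))
    # Single year or full date; a leading "ca." is accepted, but the uncertainty is dropped.
    if m := re.fullmatch(r"(?:ca\.\s*)?(\d{4})(?:-\d{2}(?:-\d{2})?)?", text):
        year = int(m[1])
        return TemporalValue(raw, (year, year))
    return TemporalValue(raw, None)  # unrecognized: keep only the source layer


for value in ["1920-1925", "1920s", "ca. 1923", "1923-05-01", "Meiji period"]:
    print(normalize_when(value))
```

Because the source layer is never overwritten, an unparsed value such as "Meiji period" still reaches users intact. The normalized layer is present only where a rule applies.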

Capturing Research Output in the Field of Anthropology: Metadata Design and Lesson Learned [paper (short)]
Sittisak Rungcharoensuksri and Wachiraporn Klungthanaboon

To advocate open science and knowledge development, the Princess Maha Chakri Sirindhorn Anthropology Centre (SAC) recognizes the significance of collocating research outputs funded by the SAC for public use. The SAC’s Research Database (http://www.sac.or.th/databases/sac-research/index.php) was developed and launched in March 2019 to provide free access to digital full-text research outputs under a Creative Commons license (CC-BY-NC-ND 3.0). The database was designed for SAC staff (as database administrators) and for the general public, with usability in mind. Usability and interoperability were therefore taken into consideration when selecting the metadata scheme. The Dublin Core™ Metadata Element Set was chosen, with some modified element refinements, for the SAC’s Research Database. This paper presents the lessons learned from the development of the research database. Ultimately, it may shed some light on metadata usage in the field of anthropology, from anthropology researchers to public users.

Session: Teaching Information Science (1)

Teaching Information Science [special session]
Kai Eckert, Magnus Pfeffer and Marcia Zeng

In this special session, we will discuss technological and (meta) data aspects of information science from a teacher's perspective. The topic will be addressed on different levels:

Big Picture:

Information science programs around the world adapt to new challenges and societal and technological developments. Information science is complemented in many schools by data science, computer science, or information design. If these are separate programs, how do they interact with the information science program? Otherwise, how much of these fields can/need to be incorporated in information science programs? What can be left out to make room for these new topics?

Computer Science / Data Science / Programming in IS:

Many programs now offer at least an introduction to programming, either as an optional course toward the end of the program or as a required course in the first semester. How are these courses taught? What can be expected from current and future IS students? What are the goals of these courses, and what are the results? What theoretical background (e.g., algorithms, math) is provided? What teaching materials are used?

Teaching Metadata:

What is the current state of the broad metadata discipline? From data modelling to implementation, from metadata scheme creation to cataloguing standards and best practices: how do technological advancements affect lectures on metadata? The session will not necessarily address all these topics in depth; it is also possible to shift the focus to other aspects of teaching. It all depends on the participants: the session is first and foremost about exchanging experiences and potential collaboration on future developments.

Speakers:

Sam Oh, Sungkyunkwan University
Kai Eckert, Stuttgart Media University
Dan Wu, Wuhan University
Marcia Zeng, Kent State University
Magnus Pfeffer, Stuttgart Media University
Wei Fan, Sichuan University
Ruhua Huang, Wuhan University
Tom Baker, DCMI

Session: Metadata Application Profiles: Current initiatives

Application Profiles: Discussion of Current Initiatives [panel]
Thomas Baker and Karen Coyle

This will be a highly interactive audience and panel discussion, using as its basis some key questions that have arisen among developers and users of application profiles. Attendees are encouraged to bring their own questions to the discussion. There are no prerequisites; all are welcome.

Background:

The Dublin Core™ Metadata Initiative has long promoted the notion of semantic interoperability on the basis of shared global vocabularies, or namespaces, selectively used and constrained for specific purposes in application profiles. The DCMI Application Profiles Interest Group, convened in April 2019, aims to create a core model for simple application profiles, for use in tools and workflows that help people author application profiles for the most common, straightforward use cases.

Although there is no existing standard for creating and sharing application profiles, a number of initiatives are under way in this area and many profiles are already in use. This session will be a discussion emphasizing active projects that create and implement application profiles, looking in particular at the areas that need further development to increase the utility of application profiles on the open web. We will invite participants who are actively working in this area on bibliographic data (BIBFRAME, University of Tsukuba), as well as on profiles from other communities, such as open government data (DCAT). The discussion will begin with a few key questions that the speakers will be asked to address in introductory remarks to the group. The panel will then take audience questions and comments.

Questions to be posed to the panelists include:

- Briefly describe what an "application profile" is in your community.
- What tools, if any, does your initiative have to help people create and publish application profiles?
- What have you found to be the greatest barriers to the creation of application profiles?

Session: Linked [meta]Data

Remodeling Archival Metadata Descriptions for Linked Archives [paper (full)]
Brian Dobreski, Jaihyun Park, Alicia Leathers and Jian Qin

Though archival resources may be valued for their uniqueness, they do not exist in isolation from each other, and they stand to benefit from linked data treatments capable of exposing them to a wider network of resources and potential users. To leverage these benefits, existing item-level metadata describing physical materials and their digitized surrogates must be remodeled as linked data. A number of solutions exist, but many current models in this domain are complex and may not capture all relevant aspects of larger, heterogeneous collections of media materials. This paper presents the development of the Linked Archives model, a linked data approach to making item-level metadata available for archival collections of media materials, including photographs, sound recordings, and video recordings. Developed and refined through an examination of existing collection and item metadata, alongside comparisons with established domain ontologies and vocabularies, this model takes a modular approach to remodeling archival data as linked data. Current efforts have focused on a simplified module oriented toward user discovery, intended to improve access to these materials and the incorporation of their metadata into the wider web of data. This project contributes to work exploring the representation of the range of archival and special collections and how these materials may be addressed via linked data models.

A Case Study of Japanese Textbook Linked Open Data: Publishing a Small Bibliographic Collection from a Special Library [paper (short)]
Yuka Egusa and Masao Takaku

Japanese Textbook Linked Open Data (LOD) is an LOD dataset of bibliographic and educational information that has been organized over the years by the Library of Education at the National Institute for Educational Policy Research. The dataset consists of bibliographic information for 7,548 volumes of Japanese textbooks authorized from 1992 to 2017, and provides 219,018 Resource Description Framework (RDF) triples as of April 2019. This paper reports a case study of the development and publication of Japanese Textbook LOD.

Assessing BIBFRAME 2.0: Exploratory Implementation in Metadata Maker [paper (short)]
Brinna Michael and Myung-Ja Han

As interest in linked data grows throughout the cultural heritage community, it is necessary to critically assess existing tools for the conversion and creation of linked data “records” and to explore new avenues for creating and encoding data using existing frameworks. This paper discusses the BIBFRAME 2.0 model and the current Library of Congress conversion specifications from MARC21 through the process of designing and implementing an adapted, minimal-level conversion framework in the cataloging web application Metadata Maker. In the course of this assessment, we identified three key structural issues resulting from the Library of Congress conversion specifications (duplicated data, pervasiveness of empty nodes, and prevalence of literal data values over URIs) and developed solutions for them. Additionally, we address concerns with how the BIBFRAME 2.0 model currently conceptualizes Work and linked data as a static “record.”
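Simple counts over a set of triples can surface the three structural issues named above. The sketch below is a hypothetical diagnostic, not code from Metadata Maker. It relies on an N-Triples-like string convention (`_:` for blank nodes, `<...>` for IRIs, quoted strings for literals) and a few simplified BIBFRAME-style triples. It reports the number of blank nodes, the share of literal objects, and any literal values that occur more than once.

```python
from collections import Counter


def term_kind(term):
    """Classify a term by an assumed N-Triples-like string convention."""
    if term.startswith("_:"):
        return "blank"
    if term.startswith("<") and term.endswith(">"):
        return "iri"
    return "literal"


def profile(triples):
    """Report blank nodes, object kinds and repeated literal values in a triple set."""
    nodes = {t for s, _, o in triples for t in (s, o)}
    object_kinds = Counter(term_kind(o) for _, _, o in triples)
    literal_counts = Counter(o for _, _, o in triples if term_kind(o) == "literal")
    return {
        "blank_nodes": sum(1 for t in nodes if term_kind(t) == "blank"),
        "object_kinds": dict(object_kinds),
        "literal_share": object_kinds["literal"] / len(triples) if triples else 0.0,
        "repeated_literals": sorted(v for v, c in literal_counts.items() if c > 1),
    }


# Simplified, illustrative Work/Instance data in which the title is duplicated
# and the language is a literal rather than a URI.
triples = [
    ("<http://example.org/work1>", "bf:title", "_:t1"),
    ("_:t1", "bf:mainTitle", '"Moby Dick"'),
    ("<http://example.org/instance1>", "bf:instanceOf", "<http://example.org/work1>"),
    ("<http://example.org/instance1>", "bf:title", "_:t2"),
    ("_:t2", "bf:mainTitle", '"Moby Dick"'),
    ("<http://example.org/instance1>", "bf:language", '"eng"'),
]
print(profile(triples))
```

A repeated literal is only a hint of duplication, since two different resources can legitimately share a value. The sketch flags candidates for review rather than deciding what to merge.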

Session: Metadata Supporting Digital Humanities

The Role of Metadata in Supporting Digital Humanities [panel]
Marcia Zeng, Shigeo Sugimoto, Koraljka Golub, Shu-Jiun Chen, Lala Hajibayova and Wayne de Fremery

Digital Humanities (DH) is becoming widely recognized as a mainstream academic field. Reported DH research activities include developing metadata and technology standards to model and represent humanities documents (texts, visual art, architecture, sculpture, etc.), and using such standards to develop digital scholarly editions and models of humanities documents. In this session, the panelists will demonstrate the role of metadata in supporting digital humanities through real cases across cultures, domains, resource types, historical periods, and digital applications. The panel will also speculate on the roles metadata might play in the future.

Marcia Zeng Introduction
Shigeo Sugimoto Modeling Culture – a perspective from digital archives and metadata
Sophy Shu-Jiun Chen Linked Data for Digital Scholarship: the Cases of Chinese Rare Books in Academia Sinica
Koraljka Golub Subject metadata for humanities journal articles: Indexing consistency between a local repository and an external bibliographic database
Lala Hajibayova Deconstructing User-generated Vocabularies: Reliable, Unreliable, Or …?
Wayne de Fremery, Data as Metadata—Metadata as Data. The Role of Metadata in Supporting Digital Humanities

Session: Working with Wikidata

Wikidata’s linked data for cultural heritage digital resources: An evaluation based on the Europeana Data Model [paper (full)]
Nuno Freire and Antoine Isaac

Wikidata is a data source with many potential applications, which provides its data openly in RDF. Our study aims to evaluate the usability of Wikidata as a linked data source for acquiring richer descriptions of cultural heritage digital objects within the context of Europeana, a data aggregator from the cultural domain. We want to automate such data acquisition as much as possible. Specifically, we aim to crawl and convert Wikidata using the standard approaches and operations developed for the (Semantic) Web of Data, i.e. using technologies like linked data consumption and RDF(S)/OWL ontology expression and reasoning. We also seek to re-use already developed “semantic” specifications, such as conversions to and from generic data models like Schema.org and SKOS. We have developed an experimental set-up and accompanying software to test the feasibility of this approach. We conclude that Wikidata’s linked data is able to express an interesting level of semantics for cultural heritage, but its quality can still be improved, and a human operator must still assist linked data applications in interpreting Wikidata’s RDF.

Linked Open Data for Subject Discovery: Assessing the Alignment Between Library of Congress Vocabularies and Wikidata [paper (full)]
Eunah Snyder, Lisa Lorenzo and Lucas Mak

Linked open data (LOD) has long been touted as a means of enhancing discovery of library resources through robust links between related items and concepts. Recently, libraries have begun to experiment with LOD sources such as Wikidata and DBpedia to harness user-contributed resources and enhance the information displayed in library discovery systems. The Michigan State University Libraries (MSUL) Digital Repository Team has embarked on a project to display contextual information from Wikidata and DBpedia in “knowledge cards” (informational pop-up windows) alongside subject headings, with the goal of providing users with more information on items in the digital repository. This paper briefly describes this project and outlines a quality analysis initiative meant to evaluate linkages between Library of Congress Subject Headings (LCSH) and Wikidata, along with the results of this analysis. It also addresses a number of challenges encountered in mapping between different controlled vocabularies. Finally, it concludes with possible next steps for improving the accuracy of knowledge cards and the LOD that supports them.

Using Wikidata as Work Authority for Video Games [paper (short)]
Kazufumi Fukuda

Video games have a short but rich history, and they have been gaining popularity as cultural heritage and research material. Several studies have analyzed the metadata and cataloging of video games; however, research on implementation is limited. In this study, we therefore investigate the practice of cataloging video games at the Center for Game Studies, Ritsumeikan University (RCGS) and examine the effectiveness of using data from Wikidata to construct an authority of works for video games. We accomplished this by associating distribution packages with Wikipedia and Wikidata. As a result, work records covering approximately half of the video games were created. However, these data showed problems with uniformity of granularity and completeness, stemming from Wikipedia's culture and policies. Data enrichment is therefore difficult owing to the non-uniform granularity of bibliographic description in Wikidata. On the other hand, data creation is cost-effective. Furthermore, external link IDs are highly effective in enhancing the value of the catalog as Linked Open Data (LOD). It is also evident that using published authority data is useful for data integration, but Wikidata has some problems with its features. The function and purpose of the catalog need to be considered as linked data rather than as a separate catalog, and the adoption of Wikidata for catalogs should be designed accordingly.

Using Wikidata to Provide Visibility to Women in STEM [paper (short)]
Mairelys Lemus-Rojas and Yoo Young Lee

Wikidata is an open knowledge base that stores structured linked data. Launched on October 29, 2012, Wikidata already contains over 56 million items (“Wikidata:Statistics,” n.d.), but its data reveal a noticeable and prevalent gender disparity. In an effort to contribute to the growth and enhancement of entries about women in Wikidata, the Indiana University-Purdue University Indianapolis (IUPUI) University Library and the University of Ottawa Library collaborated on pilot projects to broaden the representation and enhance the visibility of women in STEM (Science, Technology, Engineering, and Mathematics). In this article, we share the methods used at both institutions for collecting faculty data, batch ingesting data using external tools, and mapping archival data to existing Wikidata properties. We also discuss the challenges we faced during our pilot projects.

Session: Metadata Analysis and Assessment (2)

Semantic Metadata as Meaning Making: Examining #hashtags and Collection Level Metadata [paper (short)]
Hollie White, Leisa Gibbons and Eileen Horansky

Memory institutions and other organizations interested in preserving social media data are using a variety of collection level metadata to represent those materials. The aim of this paper is to start a dialogue within the metadata community about how metadata professionals can better describe social media collections to ensure that the semantic complexity of hashtags remains intact at the collection level. This paper explores how hashtags manifest semantic metadata and how that expression is formally described at the collection level. A study was conducted using two datasets. The first dataset, on hashtags as defined in the professional literature, was examined and categorized using thematic analysis. The second dataset collected metadata from a selection of Documenting the Now Twitter datasets and was categorized using Gilliland’s (2016) five categories of metadata. Findings and discussion delve into the use of collection level metadata to describe social media content, and into metadata surrogacy as a weakening of semantic meaning.

A Survey of Metadata Elements for Provenance Provision in China Open Government Data Portals [paper (short)]
Chunqiu Li, Yuhan Zhou and Kun Huang

The open government movement facilitates the transparency and sharing of government data. Provenance of open government data (OGD) describes who, how, where, when and other source information over the lifecycle of OGD. Provenance of OGD should be tracked to ensure the quality and trustworthiness of OGD. Currently, OGD portals provide provenance through general metadata elements, such as creator, provider, creation date, publication date, and issued time. In China, local OGD portals define their own metadata profiles. However, these metadata elements vary across OGD portals, and there is no clear, well-defined provenance description scheme for OGD in China. This paper therefore surveys the current provision of provenance metadata elements in 42 Chinese OGD portals and unifies the provenance elements based on the survey results. This research can facilitate the formal description of provenance information in Chinese OGD portals.

The Significant Role of Metadata for Data Marketplaces [paper (short)]
Sebastian Lawrenz, Priyanka Sharma and Andreas Rausch

With the shift to a data-driven society, data trading takes on a completely new significance. In the future, data marketplaces will be equivalent to other electronic commerce platforms such as Amazon or eBay. Like any other online marketplace, a data marketplace is a platform that enables convenient buying and selling of products, in this case “data”.

Metadata is data about data. Metadata plays a significant role in data trading, as it serves as an orientation for all involved parties in the data marketplace. A seller who wants to sell their data on the marketplace needs metadata to describe the selling offer, and the buyer can use it to search and identify relevant data.

This paper outlines the significance of metadata in data trading on a data marketplace and classifies the levels of metadata. In data trading, metadata also plays a significant role in determining data quality, so this paper also discusses the role of metadata in terms of data quality.

Session: Metadata Application Profiles: Development and Implementation

Yet Another Metadata Application Profile (YAMA): Authoring, Versioning and Publishing of Application Profiles [paper (full)]
Nishad Thalhath, Mitsuharu Nagamori, Tetsuo Sakaguchi and Shigeo Sugimoto

Metadata Application Profiles are the elementary blueprints of any Metadata Instance. Efforts like the Singapore Framework for Dublin Core™ Application Profiles define the framework for designing metadata application profiles to ensure interoperability and reusability. However, publicly accessible application profiles, especially machine-actionable ones, remain scarce. Domain experts find it difficult to create application profiles, given the technical aspects, costs and disproportionate incentives. The lack of easy-to-use tools for creating Metadata Application Profiles also limits their reach. This paper proposes Yet Another Metadata Application Profile (YAMA) as a user-friendly, interoperable preprocessor for creating, maintaining and publishing Metadata Application Profiles. YAMA helps produce various formats and standards to express Metadata Application Profiles, change logs, and different versions, with the aim of simplifying the Metadata Application Profile creation process for domain experts. YAMA includes an integrated syntax for recording application profiles as well as changes between different versions. A proof-of-concept toolkit demonstrating the capabilities of YAMA is also being developed. YAMA offers a human-readable yet machine-actionable syntax and format, which is seamlessly adaptable to modern version control workflows and extensible for specific requirements.
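One function attributed to YAMA is recording change logs between versions of an application profile. The sketch below shows the underlying idea as a plain diff between two profile versions held as Python dictionaries. This is not YAMA's syntax or implementation. The dictionary layout, the constraint keys (`min`, `max`, `lang`) and the wording of the change log are all hypothetical.

```python
def diff_profiles(old, new):
    """Produce human-readable change-log lines between two profile versions."""
    changes = [f"added {prop}" for prop in sorted(new.keys() - old.keys())]
    changes += [f"removed {prop}" for prop in sorted(old.keys() - new.keys())]
    for prop in sorted(old.keys() & new.keys()):
        # Compare every constraint that appears in either version of the property.
        for key in sorted(old[prop].keys() | new[prop].keys()):
            before, after = old[prop].get(key), new[prop].get(key)
            if before != after:
                changes.append(f"changed {prop}.{key}: {before!r} -> {after!r}")
    return changes


# Two hypothetical versions of a profile: property -> constraints.
v1 = {
    "dcterms:title": {"min": 1, "max": 1},
    "dcterms:creator": {"min": 0},
}
v2 = {
    "dcterms:title": {"min": 1, "max": 1, "lang": "required"},
    "dcterms:subject": {"min": 0},
}
for line in diff_profiles(v1, v2):
    print(line)
```

Profiles kept as plain, structured text can be compared this way. That property also lets them fit naturally into the version-control workflows the abstract mentions.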

Singapore's Moments of Life: A Metadata Application [paper (full)]
Kathy Choi and Haliza Jailani

As part of Singapore's Smart Nation initiative, Moments of Life (MOL) was created as a whole-of-government mobile application to serve citizens' needs better through technology. For this strategic project under the Smart Nation and Digital Government Office, the National Library Board Singapore (NLB) was invited to develop a metadata framework for the app. From parenting to active ageing and end-of-life needs, the app consolidates government services for important milestones in a citizen's life. E-government metadata standards and initiatives based on Dublin Core™ (DC) started as early as the 2000s. The European Committee for Standardization CEN/ISSS has provided a methodology for developing an e-government metadata element set. This paper starts with a review of DC e-government metadata standards and initiatives, and of the latest applications of metadata for digital government. It then presents how NLB applied its methodology to develop an application profile and a multi-faceted taxonomy. In a multi-cultural society with 4 official languages, a common vocabulary is important for data to be shared, re-used and searched across agencies by citizens. This will not only help citizens search for information more effectively, but will also prepare MOL content for structured data implementation for Internet discovery. The paper also discusses the challenges faced, and shows how features of the mobile app, such as profiling and filtering, global search and faceted navigation, are effectively achieved with the use of Dublin Core™ as the metadata schema supporting MOL.

Workshops and Tutorials

Networked Knowledge Organization Systems (NKOS) [workshop (full day)]
Joseph Busch and Marcia Zeng

The program for this Workshop will include:

Keynote Presentation

Jian Qin. Paradigmatic Similarities in Knowledge Representation between AI and Ontological Systems.

Submitted and reviewed presentations

Joseph Busch. Developing a Health Policy Domain Model for the Robert Wood Johnson Foundation.
Marcia Zeng and Julaine Clunis. Functional Metrics for Linked Open Data (LOD) KOS Products.
Andreas Koller. How to Extract Hidden Information and 'Aboutness' from Text Using SKOS, Ontologies, Corpus Analysis and Linked Data.
Sonia Pascua, Jane Greenberg, Peter Logan, and Joan Boone. SKOS of the 1910 Library of Congress Subject Heading for the Transformation of the Keywords to Controlled Vocabulary of the Nineteenth-Century Encyclopedia Britannica.
Minjuan Liu and Yao Lu. A Contrastive Study of Agricultural Thesaurus.

Submitted and reviewed short presentations

Vânia Mara Alves Lima, Cibele de Araújo Camargo Marques dos Santos, and Artur Simões Rozestraten. The Arquigrafia Project.
Ziyoung Park, Claudio Gnoli, and Daniele P. Morelli. The Second Edition of the Integrative Levels Classification: Evolution of a KOS.
Ziyoung Park, Hosin Lee, Seungchon Kim, Sungjae Park, Dasom Jung, Seunghee Son, and Yoonwhan Kim. Improving Archival Records of Traditional Korean Performing Arts in a Semantic Web Environment.
Hyewon Lee, Soyoung Yoon, and Ziyoung Park. A Digital Curation Model Focused on Semantic Enrichment.

See the 2019 NKOS Workshop page for more details.

Introduction to Jupyter Notebooks [tutorial (full day)]
Kai Eckert and Magnus Pfeffer

Jupyter Notebook is an open source web application for creating and sharing “live documents” that can contain code and the results of its execution alongside traditional document elements such as text or images. Originally developed as part of the IPython project, it is now independent of Python and supports a long list of programming languages through separate kernels, including JavaScript, Ruby, R and Perl.
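Behind the scenes, a notebook is a JSON file (`.ipynb`) in which text cells, code cells and the stored results of running that code are kept together. The sketch below builds a tiny notebook by hand with Python's standard `json` module to make that structure visible. It is a simplified view of the nbformat 4 layout, and `make_notebook` is a hypothetical helper; in practice the `nbformat` Python package would normally be used to create and validate notebooks.

```python
import json


def make_notebook(markdown_text, code_text, code_output):
    """Build a minimal notebook dict in the nbformat 4 JSON layout (simplified)."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},  # real notebooks also record kernel and language info here
        "cells": [
            # A text cell: its Markdown source is rendered as formatted prose.
            {"cell_type": "markdown", "metadata": {}, "source": markdown_text},
            # A code cell that stores both the code and the output of its last run.
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": 1,
                "source": code_text,
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": code_output}
                ],
            },
        ],
    }


notebook = make_notebook(
    "# A live document\nProse and code live side by side.",
    "print(6 * 7)",
    "42\n",
)
# The serialised JSON is what a .ipynb file contains on disk.
ipynb_text = json.dumps(notebook, indent=1)
print([cell["cell_type"] for cell in notebook["cells"]])  # ['markdown', 'code']
```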

These live documents are particularly well suited for creating teaching materials and interactive manuals that allow the reader to make changes to program code and see the results within the same environment: program outputs can be displayed, and visualisation graphics or data tables can be updated on-the-fly. To support traditional use cases, static non-interactive versions can be exported in PDF, HTML or LaTeX format.

For data practitioners, Jupyter Notebooks are ideal for performing data analyses or transformations, e.g., to generate Linked Open Data, where the workflow documentation is part of the implementation. Individual cells of code, down to single lines, can be added or changed and then executed without losing the results of earlier cells. Visualizations can be generated in code and are directly embedded in the document. This makes prototyping and experimenting highly efficient and actually a lot of fun.
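As an illustration of such a workflow, the cell below turns a small CSV table into Linked Open Data in N-Triples syntax using Dublin Core™ terms. It is a minimal sketch using only Python's standard library; the base IRI `http://example.org/book/`, the column names, the sample rows and the helper functions are hypothetical, and a real notebook would more likely use an RDF library such as rdflib.

```python
import csv
import io

# Hypothetical namespace for the generated resources; replace with a real one.
BASE = "http://example.org/book/"
DCTERMS = "http://purl.org/dc/terms/"

# Small sample table standing in for data loaded in an earlier notebook cell.
SAMPLE_CSV = """id,title,issued
1,A Sample Book,2019
2,Another Sample Book,2020
"""


def literal(text):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def rows_to_ntriples(csv_text):
    """Emit one dcterms:title and one dcterms:issued triple per CSV row."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"  # assumes ids are already IRI-safe
        triples.append(f"{subject} <{DCTERMS}title> {literal(row['title'])} .")
        # Plain literal for brevity; a typed xsd:gYear literal would be more precise.
        triples.append(f"{subject} <{DCTERMS}issued> {literal(row['issued'])} .")
    return "\n".join(triples)


print(rows_to_ntriples(SAMPLE_CSV))
```

Because each step lives in its own cell, the CSV source, the mapping and the generated triples can be inspected and re-run independently.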

Finally, Jupyter Notebooks are an ideal platform for beginners, who can execute code one cell at a time and immediately see how changes affect the result.

This tutorial requires no prior knowledge of Jupyter Notebooks or the Python programming language; only basic programming skills and familiarity with HTML/Markdown are needed.

Agenda:

Part I: Introduction
- Local installation of the necessary programming environment
- Using existing documents
- Creating documents with rich content
- Notebook extensions
Part II: Case studies
- Using Jupyter Notebook in teaching data integration basics
- Using Jupyter Notebook to develop, test and document a data management workflow with generation of RDF
Part III: Advanced topics
- Server installation and use
- Version control
- Using different language kernels

Professor Kai Eckert

Kai Eckert is professor for Web-based Information Services at Stuttgart Media University and co-director of the Institute for Applied Artificial Intelligence. His research involves the application of natural language processing and artificial intelligence in fields including cultural heritage, open science and smart cities. Recent projects include ConfRef.org, a collaboration with SpringerNature to create an open dataset on scientific conferences; JudaicaLink, a knowledge graph for Jewish studies; and CAIUS, a collaboration with the University of Mannheim to investigate consequences of artificial intelligence on urban societies. Kai teaches in the Information Science program of Stuttgart Media University, where he develops new courses to introduce technical concepts.

Professor Magnus Pfeffer

Magnus Pfeffer is professor for Information Management at Stuttgart Media University and the dean of studies (program manager) for the information sciences program. His research is focussed on metadata management, ontologies and automatic classification. His latest research project "Japanese Visual Media Graph" is an international collaboration between researchers from the fields of media studies, Japan studies and information science to create a comprehensive database of Japanese Visual Media using data accumulated by enthusiast communities.

Wikidata as a hub for the Linked Data cloud [tutorial (full day)]
Tom Baker, Andra Waagmeester and Joachim Neubert

This tutorial will help people use Wikidata as a "linking hub": a starting point for exploring datasets by leveraging links both to Linked Data repositories and to datasets outside of the Linked Data cloud.

1. Using and querying Wikidata (90m). Introduction to Wikidata and its data model. Presentation of Wikibase, the infrastructural platform both for Wikidata itself and for independently maintained databases. Demonstration of methods for loading data into Wikidata and keeping it up-to-date.
2. Wikidata as a hub for the Linked Data cloud (90m). Comparison of the Wikidata data model and Linked Data model. The use of identifiers for linking out to external resources and their connection to Linked Data URIs. Current state of Wikidata as a linking hub. Demonstration of integrated access to Wikidata and other datasets using federated SPARQL queries.
3. Wikidata tools and hands-on exercises - Part 1 (90m). Demonstration of how the coherence of data and precision of search can be improved by creating a semantic data model linked to commonly used vocabularies such as Dublin Core™. Presentation of applications that use Wikidata as a back-end source of data and provide interfaces for formulating queries and contributing content. Participants will split into groups of two or three people each to work on exercises.
4. Wikidata tools and hands-on exercises - Part 2 (90m). Presentation of tools for creating links to datasets outside of Wikidata. Groups will work independently on exercises with help from the tutorial presenters.
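To make the idea of identifiers as links more concrete, here is a hedged sketch (not the tutorial's own material) of a SPARQL query against the Wikidata Query Service that reads VIAF IDs (property P214) and turns each one into an external Linked Data URI. The VIAF URI pattern `http://viaf.org/viaf/<id>`, the User-Agent string and the helper names are assumptions for illustration. A federated query would add a `SERVICE` clause pointing at another endpoint; that is not shown here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Items with a VIAF ID (Wikidata property P214); each ID is turned into a
# VIAF URI. The URI pattern http://viaf.org/viaf/<id> is an assumption here.
QUERY = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item ?viaf ?viafUri WHERE {
  ?item wdt:P214 ?viaf .
  BIND(IRI(CONCAT("http://viaf.org/viaf/", ?viaf)) AS ?viafUri)
}
LIMIT 5
"""


def build_request_url(query, endpoint=WDQS_ENDPOINT):
    """Encode the query as a GET request asking for SPARQL JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


def run_query(query):
    """Send the query over the network (not called below; needs internet access)."""
    request = Request(
        build_request_url(query),
        # Wikimedia asks clients for a descriptive User-Agent; this is a placeholder.
        headers={"User-Agent": "linking-hub-sketch/0.1 (example only)"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def viaf_uris(results):
    """Pull the constructed VIAF URIs out of a SPARQL JSON results document."""
    return [
        binding["viafUri"]["value"]
        for binding in results["results"]["bindings"]
        if "viafUri" in binding
    ]


# Usage (requires network access):
# for uri in viaf_uris(run_query(QUERY)):
#     print(uri)
```

Building the external URI inside the query keeps the link-out logic in one place, so the same result set can be fed directly into other Linked Data tools.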

Posters (peer-reviewed)

Analog/Digital LP Collection: linked metadata between a library discovery and digital collection platform [poster (peer reviewed)]
Marc Stoeckle and Ingrid Reiche (Presented by: Marc Stoeckle)

Posters ("work-in-progress")

Japanese Visual Media Graph: Providing researchers with data from enthusiast communities [poster (work in progress)]
Magnus Pfeffer and Martin Roth (Presented by: Magnus Pfeffer)

Deep Text Analytics with Knowledge Graphs and Machine Learning [poster (work in progress)]
Andreas Blumauer (Presented by: Andreas Koller)

Deep Semantic Annotation of Cultural Heritage Images [poster (work in progress)]
Xiaoguang Wang, Xu Tan, Ningyuan Song, Dave Clarke and Xiaoxi Luo (Presented by: Xu Tan)

Research on Smart Government metadata mapping based on DC metadata [poster (work in progress)]
Yunkai Zhang, Jie Ma, Mo Hu, Zhiyuan Hao and Yushan Xie (Presented by: )

Sharing the Outline of World Cultures and Outline of Cultural Materials: restructuring a legacy classification system, with a Korean studies history, for the semantic web [poster (work in progress)]
Douglas Black (Presented by: Douglas Black)

Exploring User-Generated Reviews Associated with Graphic Novels: Enriching or Impoverishing Metadata? [poster (work in progress)]
Lala Hajibayova (Presented by: Lala Hajibayova)

Aggregation of Regional Cultural Heritage Information in Japan [poster (work in progress)]
Taiki Mishima (Presented by: Taiki Mishima)

Meetings

DCMI Governing Board (closed meeting) [meeting]

This is a closed meeting of the DCMI Governing Board.

Open Community Meeting [meeting]

This is an open meeting - open to any participant at the conference - to provide an opportunity to make suggestions to DCMI about its activities (including, but not limited to, the conference). The meeting is free-form, with no set agenda. Bring your ideas!


Sponsors