Metadata for Interoperability in the Global Corporate Environment: Global Corporate Circle DCMI 2004 Pre-Conference Workshop

Sunday, October 10, 2004
Shanghai, China

This report was prepared by Erin Stewart (Microsoft Corporation).

A note on the linking conventions in this report: The names of people link to a PDF version of their PowerPoint presentation the first time they are mentioned in the report. The names of companies link to the company's web site.

Introduction

The 2004 Dublin Core Metadata Initiative International Conference (http://dc2004.library.sh.cn/) featured the theme "Metadata Across Languages and Cultures." To complement this focus, the Global Corporate Circle Working Group offered a pre-conference workshop in which expert panelists explored Metadata for Interoperability in the Global Corporate Environment. The workshop addressed the metadata lifecycle (creation, management, and use) as it applies to enterprise applications and activities, with special focus on interoperability. The following report identifies key themes and insights, and suggests directions for future developments in practice and research.

Thematic Summaries

Level 1: DC for a Jumpstart: A Simple Schema that Improves Access and Compliance

Joseph Busch of Taxonomy Strategies found across the companies he surveyed for CEN (the European Committee for Standardization) that DC is useful as a foundational schema for straightforward ROIs (returns on investment) such as improved access and compliance with regulations. Jack Aaronson of The Aaronson Group pointed out that improved access can be achieved when metadata values are used in navigation design.

Siderean Software's Brad Allen, who also surveyed implementers, found the level of metadata usage to be low but growing, and that access rather than compliance is the compelling driver. Brad also noted two interesting aspects of the "access" benefit:

Level 2: DC for Integration of Content and/or Data

The difference between data and content is continuing to blur. As Brad Allen put it, the Web helped us learn how to turn anything into "content," so now we ask "what is out there in this world that I can describe this way?"

Joseph Busch noted that there is interest in integrating back office information (personnel data, etc.) traditionally stored in multiple repositories. This was affirmed by Eric Miller of the W3C, who commented that data integration and web architecture strategy were born in commercial organizations. He asked: how can we join relational database data and make this as easy as it is to share documents?

Arthur Haynes of Siemens Business Systems Media (previously BBC Technology) presented an overview of the British Broadcasting Corporation's metadata implementation using a local data model called SMEF, which is DC-aware. Their goals were re-use and the integration of media assets, independent of technology and storage. He noted that the media industry is beginning to see and treat assets as data that can be managed. The approach being taken with the BBC is to maintain connections as necessary with external standards, such as DC, and to understand how to connect to new, developing standards such as TV-Anytime. One other interesting point Arthur made was that "project" was a metadata element they used for description. (See Additional Elements to Consider.) This allowed for a practical aggregation of assets as well as a convenient access point for users.

Jack Aaronson also talked about integration, highlighting a recently-completed project for Shop NBC. The goals of this project centered around more effective marketing: exploiting a multichannel strategy, implementing personalization, and optimizing the user experience.

Level 3: DC for Describing Complex, Interaction-Based, Ever-Changing Content

Jon Mason of education.au limited described the challenge of capturing metadata for learning objects, which are very slippery and dynamic. In this type of abstract space, a common terminology is foundational. The MIT Open Knowledge Initiative is trying to model the complexity so the domain can start to describe processes effectively. The IEEE Learning Object Metadata (LOM) standard, although based on DC, is very complex. Some of the issues arising from LOM work include identification and description of content vs. process vs. context, which can ultimately plague any complex metadata realm.

Jon also raised the question: what exactly is the "Enterprise"? Interoperability traditionally assumes that repositories are institutional; now we know that people are repositories, iPods are repositories, etc. Boundaries are becoming increasingly porous among domains, settings, and purposes. Jon concluded that DC is useful as a framework for interoperability but may not be complete. Brad Allen countered that leveraging existing metadata (discussed more below) is the key to reducing complexity.

Implementation Approaches

Joseph Busch pointed out three distinguishing factors of an implementation: form-based metadata collection; distributed vs. centralized; automated vs. manual. Brad Allen added another to characterize the approaches or drivers for implementation: top-down (CEO mandate) vs. bottom-up (in collaborative environments) vs. portal-driven (for repository joining).

Refinements in Use: Document Type and Products/Services

Many corporations are extending the Core to include description of Document Type, Products and Services, and User Roles or Business Purpose according to Joseph Busch. Brad Allen added that a richer semantic for describing a business function or an executable process is more relevant than MIME type. (See Additional Elements to Consider.)

Vocabulary Sources

In the CEN survey, Joseph Busch and his partners found that controlled vocabularies for populating DC schema most commonly come from:

Eric Miller touched on the Creative Commons vocabulary and several products that are attempting to address vocabulary creation and management.

Lessons Learned

Use What You Have

Brad Allen stressed the importance of leveraging what is already available (e.g., creator, date, storage location) to describe content. He showed a way to create faceted navigation or guided filtering by use of existing metadata values, and noted an added benefit of users implicitly learning the "lingo" of a particular domain. Jack Aaronson and Brad Allen agreed that business rules (if, then) also can be applied to native metadata to create new, more useful metadata.
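The two ideas above (facets built from metadata that already exists, and if/then business rules that derive new metadata) can be illustrated with a minimal Python sketch. It is not taken from any panelist's presentation. The records, the field names, the path-to-department rule, and the helper names (derive_department, enrich, facet_counts, refine) are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical records carrying only "native" metadata: creator, date, location.
RECORDS = [
    {"title": "Q3 sales deck", "creator": "A. Lee",
     "date": "2004-07-12", "location": "/sales/emea/decks"},
    {"title": "Pricing memo", "creator": "A. Lee",
     "date": "2003-11-02", "location": "/sales/apac/memos"},
    {"title": "Onboarding guide", "creator": "R. Diaz",
     "date": "2004-01-20", "location": "/hr/guides"},
]


def derive_department(record):
    """Business rule (assumed): if the storage path starts with a known
    folder, then tag the record with the matching department."""
    rules = {"/sales/": "Sales", "/hr/": "Human Resources"}
    for prefix, department in rules.items():
        if record["location"].startswith(prefix):
            return department
    return "Unclassified"


def enrich(records):
    """Return copies of the records with derived 'department' and 'year'."""
    return [
        dict(r, department=derive_department(r), year=r["date"][:4])
        for r in records
    ]


def facet_counts(records, field):
    """Count how many records carry each value of one field (one facet)."""
    return Counter(r[field] for r in records)


def refine(records, **selected):
    """Guided filtering: keep records that match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]
```

Calling facet_counts(enrich(RECORDS), "department") yields the labels and counts a navigation menu could display, and refine(enrich(RECORDS), department="Sales", year="2004") narrows the result set the way a user would by clicking two facet values.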

Brad stated that we need to shift the focus from monolithic taxonomies to smaller vocabularies, schemas, and application profiles. Jack Aaronson echoed this and offered a real-life example in Shop NBC. The site had been burdened by an enormous, complex polyhierarchical taxonomy that described each potential combination of descriptive metadata (e.g., Jewelry – Diamonds – Rings as well as Jewelry – Rings – Diamonds). The new model is to dynamically populate the site based on a product schema that is monohierarchical (e.g., users must choose one entry point such as Shop by Brand, and then are given options to refine the navigation experience hierarchically or through other elements such as Type). This is similar to the Amazon.com model.

Know Why Your Users are Searching (also known as Context)

Jack Aaronson stressed that a good user experience, regardless of domain, requires three things:

Knowledge of your offerings

Knowledge of your users: segments, interactions with your site, what they know about other users (see Audience under Additional Elements To Consider)

Knowledge of how to connect the above two in meaningful ways through metadata

He mentioned several projects and companies that have started down the path of using artificial intelligence to go beyond collaborative filtering. That technology is shallow in that it simply states a statistical probability of affinity based on two users' behaviors. A more advanced model Jack described accounts for why users are retrieving things – not just what they retrieved.
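To make the contrast concrete, here is a small Python sketch under stated assumptions. It uses Jaccard overlap as a stand-in for a collaborative-filtering affinity score, and it models "why" as a purpose label attached to each retrieval. Neither choice comes from the workshop: the report does not specify the model Jack described, and the user logs, purpose labels, and function names below are hypothetical.

```python
def jaccard(a, b):
    """Share of distinct entries the two collections have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def item_affinity(log_a, log_b):
    """Collaborative-filtering style score: compares only WHAT was retrieved."""
    return jaccard((item for item, _ in log_a), (item for item, _ in log_b))


def purpose_affinity(log_a, log_b):
    """Compares (item, purpose) pairs, so WHY each item was retrieved counts too."""
    return jaccard(log_a, log_b)


# Hypothetical retrieval logs: (item identifier, assumed purpose label).
USER_A = [("ring-101", "gift"), ("watch-7", "self")]
USER_B = [("ring-101", "resale"), ("watch-7", "self")]
```

With these logs, item_affinity(USER_A, USER_B) treats the two users as identical because they retrieved the same items, while purpose_affinity(USER_A, USER_B) is lower because only one of the retrievals shares a purpose.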

This idea has some parallel with Brad Allen and Joseph Busch's assertion that content "purpose" should be captured to allow retrieval in context. Knowing what your users intend to do with the information they find is essential in deciding how to metatag it. (See Additional Elements To Consider.)

Additional Elements To Consider

Content (or Application) Function

Several panelists emphasized that an overriding purpose of metadata in the corporate environment often is to get the user to the right LOB (line of business) content or application. It would be more useful to treat a Content (or Application) Function element, rather than Resource ID, as pivotal. The vocabulary would be based on associated business processes.

Project

Project may be a useful element for aggregating content, providing context to aid in retrieval, and associating a funding source.

User Ratings

User Ratings help users qualify the resource, and both Brad Allen and Jack Aaronson highly recommended incorporation of this when available.

Audience

Joseph Busch heard that Role is required by companies to describe functions and tasks of people. This also is an important component of personalization. Role, referring to the user, would correspond to, and be matched with, the Audience element that describes content.
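A minimal Python sketch of that matching follows, assuming each content record lists its intended audiences and each user carries a single role drawn from the same vocabulary. The record layout, the role names, and the content_for helper are hypothetical illustrations, not something described by the panelists.

```python
# Hypothetical content records; "audience" holds values from an assumed
# controlled vocabulary that is shared with user roles.
CONTENT = [
    {"title": "Expense policy", "audience": ["employee", "manager"]},
    {"title": "Budget approval workflow", "audience": ["manager"]},
    {"title": "Press kit", "audience": ["journalist"]},
]


def content_for(role, records):
    """Return titles whose Audience values include the user's Role."""
    return [r["title"] for r in records if role in r.get("audience", [])]
```

For example, content_for("manager", CONTENT) returns both internal documents, while content_for("employee", CONTENT) returns only the expense policy. The match works only if Role and Audience draw on the same controlled vocabulary.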

Challenges

Moving the Talk Off of ROI

A comment by Eric Miller resonated with other panelists and participants: ROI arguments are only as persuasive as the management is receptive. Indicating a possible evolution from the flat world of ROI, Joseph Busch found business decision makers asking: What is the business purpose of the content? How does this fit in the budget? How does this piece of content fit into the bigger picture?

Working With Schemas

Arthur Haynes reported briefly on three challenges in managing multimedia assets: asset size, technology leading the solutions, and rights control. Joseph Busch added that encoding in "real life" also presents challenges, and that very real requirements for local schema and extensions can undermine interoperability due to registration obstacles.

The Date element can be problematic for corporate implementers according to Joseph Busch. He noted issues such as consistent granularity, selecting an appropriate point in the life cycle to capture, and formatting.
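The granularity and formatting issues can be illustrated with a short Python sketch that normalizes free-form dates to ISO 8601 style strings (YYYY, YYYY-MM, or YYYY-MM-DD) while preserving the precision of the input. The list of accepted input formats is an assumption made for this example, and normalize_date is a hypothetical helper. A real implementation would need to agree on its own formats and on which life cycle event the date records.

```python
from datetime import datetime

# Assumed input formats, each paired with the granularity it actually carries.
# Order matters: more specific patterns are tried first.
KNOWN_FORMATS = [
    ("%Y-%m-%d", "day"),
    ("%d %B %Y", "day"),    # e.g. "10 October 2004"
    ("%B %Y", "month"),     # e.g. "October 2004"
    ("%Y-%m", "month"),
    ("%Y", "year"),
]

# Output pattern per granularity, so a year-only input never gains a fake day.
OUTPUT_FORMATS = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}


def normalize_date(raw):
    """Return (normalized_string, granularity) for a recognized date string."""
    text = raw.strip()
    for fmt, granularity in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # not this format; try the next one
        return parsed.strftime(OUTPUT_FORMATS[granularity]), granularity
    raise ValueError(f"unrecognized date format: {raw!r}")
```

Returning the granularity alongside the value lets downstream systems distinguish "some time in 2004" from a specific day. Ambiguous inputs such as 10/11/2004 are deliberately rejected rather than guessed.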

Plugging the Gap Between Portals, Search Engines, and Repositories

Brad Allen summarized the current situation: metatagging processes and tools are very disconnected, and we cannot deliver vocabularies to a production environment. Eric Miller briefly discussed some niche forays in this area, namely Adobe XMP (Creative Suite), a productization of RDF (Resource Description Framework) and DC schema. He also talked about several open source applications that exploit the semantic web. These applications target expertise location, workflow improvement, personal productivity such as calendaring, interface customization (My Data My Way), and asset management. Jon Mason mentioned the E-Portfolio group as also involved in this space.

Brad also pointed out that people don't want to hardwire together their sites, back-end repositories, and search. For example, descriptive metadata to support search (or as made automatically available from repositories) should be programmatically available to leverage in portal navigation.

Conclusion

Many of the themes, lessons learned, and challenges we heard about are common to libraries, museums, governments, and corporate enterprises. Eric Miller provided a useful summation of this when he noted:

The Global Corporate Circle Working Group hopes to continue these useful dialogs among practitioners, web theorists, product developers, and business decision makers as a path to realizing the vision of DCMI.