Using Dublin Core
Is Replaced By: http://dublincore.org/specifications/dublin-core/usageguide/2003-08-26/
Status of Document: This is a DCMI Recommendation.
Description of Document: This document is intended as an entry point for users of Dublin Core™. For non-specialists, it will assist them in creating simple descriptive records for information resources (for example, electronic documents). Specialists may find the document a useful point of reference to the documentation of Dublin Core, as it changes and grows.
Table of Contents
- 1.1. What is Metadata?
- 1.2. What is the Dublin Core?
- 1.3. The Purpose and Scope of This Guide
- 2. Syntax and Related Issues
- 2.1. HTML
- 2.2. RDF/XML
- 2.3. Metadata Contained in a Resource
- 2.4. Stand-Alone Metadata
- 3. Basic Principles of Descriptive Elements
- 3.1. Element Content and Controlled Vocabularies
- 4. The Core Elements
- 5.1. Classes of Qualifiers
- 5.2. The Dumb-Down Principle
- 6.1 Generic Examples
- 6.2 Simple HTML Examples
- 6.3 Qualified HTML Examples
- 6.4 Simple RDF Examples (Forthcoming)
- 6.5 Qualified RDF Examples (Forthcoming)
- 6.6 Examples from other sources
1.1. What is Metadata?
Metadata has been with us since the first librarian made a list of the items on a shelf of handwritten scrolls. The term "meta" comes from a Greek word that denotes "alongside, with, after, next." More recent Latin and English usage would employ "meta" to denote something transcendental, or beyond nature. Metadata, then, can be thought of as data about other data. It is the Internet-age term for information that librarians traditionally have put into catalogs, and it most commonly refers to descriptive information about Web resources.
A metadata record consists of a set of attributes, or elements, necessary to describe the resource in question. For example, a metadata system common in libraries -- the library catalog -- contains a set of metadata records with elements that describe a book or other library item: author, title, date of creation or publication, subject coverage, and the call number specifying location of the item on the shelf.
The linkage between a metadata record and the resource it describes may take one of two forms:
- elements may be contained in a record separate from the item, as in the case of the library's catalog record; or
- the metadata may be embedded in the resource itself.
Examples of embedded metadata that is carried along with the resource itself include the Cataloging in Publication (CIP) data printed on the verso of a book's title page and the TEI header in an electronic text. Many metadata standards in use today, including the Dublin Core™ standard, do not prescribe either type of linkage, leaving the decision to each particular implementation.
Although the concept of metadata predates the Internet and the Web, worldwide interest in metadata standards and practices has exploded with the increase in electronic publishing and digital libraries, and the concomitant "information overload" resulting from vast quantities of undifferentiated digital data available online. Anyone who has attempted to find information online using one of today's popular Web search services has likely experienced the frustration of retrieving hundreds, if not thousands, of "hits" with limited ability to refine or make a more precise search. The wide-scale adoption of descriptive standards and practices for electronic resources will improve retrieval of relevant resources from the "Internet commons." As noted by Weibel and Lagoze, two leaders in the field of metadata development:
"The association of standardized descriptive metadata with networked objects has the potential for substantially improving resource discovery capabilities by enabling field-based (e.g., author, title) searches, permitting indexing of non-textual objects, and allowing access to the surrogate content that is distinct from access to the content of the resource itself." (Weibel and Lagoze, 1997)
It is this need for "standardized descriptive metadata" that the Dublin Core™ addresses.
1.2. What is the Dublin Core?
The Dublin Core™ metadata standard is a simple yet effective element set for describing a wide range of networked resources. The Dublin Core™ standard comprises fifteen elements, the semantics of which have been established through consensus by an international, cross-disciplinary group of professionals from librarianship, computer science, text encoding, the museum community, and other related fields of scholarship.
Another way to look at Dublin Core™ is as a "small language for making a particular class of statements about resources" (Baker, 2000). In this language, there are two classes of terms--elements (nouns) and qualifiers (adjectives)--which can be arranged into a simple pattern of statements. The resources themselves are the implied subjects in this language. In the diverse world of the Internet, Dublin Core™ can be seen as a "metadata pidgin for digital tourists": easily grasped, but not necessarily up to the task of expressing complex relationships or concepts.
The Dublin Core™ element set is outlined in Section 4. Each element is optional and may be repeated. Each element also has a limited set of qualifiers, attributes that may be used to further refine (not extend) the meaning of the element. The Dublin Core™ Metadata Initiative (DCMI) has defined standard ways to "qualify" elements with various types of qualifiers. A set of recommended qualifiers conforming to DCMI "best practice" is available, and a formal registry is in progress.
Although the Dublin Core™ favors document-like objects (because traditional text resources are fairly well understood), it can be applied to other resources as well. Its suitability for use with particular non-document resources will depend to some extent on how closely their metadata resembles typical document metadata and also what purpose the metadata is intended to serve. (Implementors interested in using Dublin Core™ for diverse resources are encouraged to browse the Dublin Core™ Projects pages for ideas on using Dublin Core™ metadata for their resources.)
The Dublin Core™ has been developed with the following goals:
Simplicity of creation and maintenance
The Dublin Core™ element set has been kept as small and simple as possible to allow a non-specialist to create simple descriptive records for information resources easily and inexpensively, while providing for effective retrieval of those resources in the networked environment.
Commonly understood semantics
Discovery of information across the vast commons of the Internet is hindered by differences in terminology and descriptive practices from one field of knowledge to the next. The Dublin Core™ can help the 'digital tourist' -- a non-specialist searcher -- find his or her way by supporting a common set of elements, the semantics of which are universally understood and supported. For example, scientists concerned with locating articles by a particular author, and art scholars interested in works by a particular artist, can agree on the importance of a "creator" element. Such convergence on a common, if slightly more generic, element set increases the visibility and accessibility of all resources, both within a given discipline and beyond.
International scope
The Dublin Core™ Element Set was originally developed in English, but versions are being created in many other languages, including Finnish, Norwegian, Thai, Japanese, French, Portuguese, German, Greek, Indonesian, and Spanish. The Special Interest Group on Dublin Core™ in Multiple Languages is coordinating efforts to link these versions in a distributed registry using the Resource Description Framework technology being developed by the World Wide Web Consortium (W3C).
Although the technical challenges of internationalization on the World Wide Web have not been directly addressed by the Dublin Core™ development community, the involvement of representatives from almost every continent has ensured that the development of the standard considers the multilingual and multicultural nature of the electronic information universe.
Extensibility
While balancing the need for simplicity in describing digital resources against the need for precise retrieval, Dublin Core™ developers have recognized the importance of providing a mechanism for extending the DC element set for additional resource discovery needs. It is expected that other communities of metadata experts will create and administer additional metadata sets. Metadata elements from these sets could be linked with Dublin Core™ metadata to meet the need for extensibility. This model allows different communities to use the DC elements for core descriptive information that will be usable across the Internet, while allowing domain-specific additions that make sense within a more limited arena. Specific instructions for implementing such a model are currently under development.
1.3. The Purpose and Scope of This Guide
This document is intended to be an entry point for users of Dublin Core™. It will help non-specialists create simple descriptive records for information resources (for example, electronic documents, JPEG images, video clips). Specialists may find the document a useful point of reference to the documentation of Dublin Core, as it changes and grows.
The guide will show in a non-technical fashion how Dublin Core™ metadata may be used by anyone to make their material more accessible. This guide discusses the layout and content of Dublin Core™ metadata elements, how to use them in composing a complete Dublin Core™ metadata record, as well as how to qualify elements to support use by a wide variety of communities.
Another important goal of this document is to promote "best practices" for describing resources using the Dublin Core™ element set. The Dublin Core™ community recognizes that consistency in creating metadata is an important key to achieving complete retrieval and intelligible display across disparate sources of descriptive records. Inconsistent metadata effectively hides desired records, resulting in uneven, unpredictable or incomplete search results.
2. Syntax Issues
In this guide, we have chosen to represent Dublin Core™ examples in several different syntaxes, including: HTML (the Web's Hypertext Markup Language format), RDF/XML (the Resource Description Framework using eXtensible Markup Language) and a generic form (Element="value"). HTML provides an easily understood format for demonstrating Dublin Core's underlying concepts, but more complex applications using qualification may find that RDF/XML makes more sense. When considering an appropriate syntax, it is important to note that Dublin Core™ concepts are equally applicable to virtually any file format, as long as the metadata is in a form suitable for interpretation both by search engines and by human beings.
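As an illustration of the generic form, here is a minimal Python sketch, assuming a record is held as an ordered list of (element, value) pairs. The function name `to_generic` is hypothetical, and escaping embedded quotes with a backslash is an assumption, since the generic form is informal.

```python
def to_generic(record):
    """Render (element, value) pairs in the generic Element="value" form."""
    lines = []
    for element, value in record:
        # Assumption: a backslash escapes any embedded double quotes.
        escaped = value.replace('"', '\\"')
        lines.append(f'{element}="{escaped}"')
    return "\n".join(lines)


record = [
    ("Title", "A Guide to Growing Roses"),
    ("Creator", "Rose Bush"),
    ("Date", "2001-01-20"),
]
print(to_generic(record))
```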
"Encoding Dublin Core™ Metadata in HTML" (Kunze, 1999) provides guidance for using HTML with unqualified Dublin Core, whether the metadata be embedded in the resource or in a separate file.
HTML can also be used to express qualified Dublin Core, although there are limitations inherent in doing so. The current thinking on how this might best be accomplished is contained in the working draft: Recording qualified Dublin Core™ metadata in HTML meta elements.
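A minimal Python sketch of the unqualified HTML encoding, assuming the `schema.DC` link and `DC.`-prefixed meta names described in Kunze's guide. The lower-case element names and the function name `to_html_meta` are choices made for this illustration, not requirements taken from that document.

```python
from html import escape

DC_NS = "http://purl.org/dc/elements/1.1/"


def to_html_meta(record):
    """Return lines for an HTML <head>: one schema link, then one <meta> per value."""
    lines = [f'<link rel="schema.DC" href="{DC_NS}">']
    for element, value in record:
        # A repeated element simply yields another <meta> tag with the same name.
        content = escape(value, quote=True)
        lines.append(f'<meta name="DC.{element.lower()}" content="{content}">')
    return "\n".join(lines)


record = [
    ("Title", "A Guide to Growing Roses"),
    ("Creator", "Rose Bush"),
    ("Date", "2001-01-20"),
]
print(to_html_meta(record))
```

Escaping the content attribute matters because titles and descriptions routinely contain characters such as `&` or `"`.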
RDF (Resource Description Framework) allows multiple metadata schemes to be read by humans as well as parsed by machines. It uses XML (eXtensible Markup Language) to express structure, thereby allowing metadata communities to define the actual semantics. This decentralized approach recognizes that no one scheme is appropriate for all situations, and further that schemes need a linking mechanism independent of a central authority to aid description, identification, understanding, usability, and/or exchange.
RDF allows multiple objects to be described without prescribing how much detail each description must contain. XML, the underlying glue, simply requires that every namespace be declared; once declared, a namespace can be used to whatever extent the provider of the metadata needs.
```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://media.example.com/audio/guide.ra">
    <dc:creator>Rose Bush</dc:creator>
    <dc:title>A Guide to Growing Roses</dc:title>
    <dc:description>Describes process for planting and nurturing different kinds of rose bushes.</dc:description>
    <dc:date>2001-01-20</dc:date>
  </rdf:Description>
</rdf:RDF>
```
This simple example uses Dublin Core™ by itself to describe an audio recording of a guide to growing rose bushes. With XML and RDF, Dublin Core™ can now be mixed with other metadata vocabularies. For example, the simple Dublin Core™ description above might be used alongside other vocabularies such as vCard, which can describe the author's affiliation and contact information, or a more specialized "rose description" vocabulary that describes the rose bushes in greater detail.
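To show how a machine might consume an RDF/XML description like the one above, here is a Python sketch using the standard library's `xml.etree.ElementTree`. It handles only the simple form shown (one `rdf:Description` per resource, with Dublin Core properties as child elements); RDF/XML allows other abbreviated forms, so a general application would use a full RDF parser. The function name `read_dc` is hypothetical, and the embedded record omits the description element for brevity.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://media.example.com/audio/guide.ra">
    <dc:creator>Rose Bush</dc:creator>
    <dc:title>A Guide to Growing Roses</dc:title>
    <dc:date>2001-01-20</dc:date>
  </rdf:Description>
</rdf:RDF>"""


def read_dc(rdf_xml):
    """Map each rdf:about URI to its (element, value) pairs, in document order."""
    root = ET.fromstring(rdf_xml)
    dc_prefix = "{" + DC_NS + "}"
    results = {}
    for desc in root.findall("{" + RDF_NS + "}Description"):
        about = desc.get("{" + RDF_NS + "}about")
        pairs = []
        for child in desc:
            # ElementTree expands prefixes, so tags look like "{namespace}local".
            if child.tag.startswith(dc_prefix):
                pairs.append((child.tag[len(dc_prefix):], (child.text or "").strip()))
        results[about] = pairs
    return results


print(read_dc(RDF_XML))
```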
2.3. Metadata Contained in a Resource
Some implementations using Dublin Core™ have chosen to embed their metadata within the resource itself. This approach is taken most often with documents encoded in HTML, but is also sometimes possible with other kinds of documents. Simple tools have been developed to make it fairly easy to provide Dublin Core™ metadata within HTML-encoded pages. One such tool, DC.dot, extracts metadata from an HTML document and formats it so that it can be edited, then cut and pasted back into the HTML header of the original document.
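DC.dot's own implementation is not described here. As a hypothetical sketch of the extraction step only, the following Python uses the standard library's `HTMLParser` to collect `DC.`-prefixed `<meta>` tags from a page head; the class and function names are illustrative.

```python
from html.parser import HTMLParser


class DCMetaExtractor(HTMLParser):
    """Collect (element, value) pairs from <meta name="DC.*"> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        # This sketch matches the "DC." prefix case-insensitively.
        if name.lower().startswith("dc."):
            self.pairs.append((name[3:].lower(), attributes.get("content") or ""))


def extract_dc(html_text):
    parser = DCMetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.pairs


page = """<html><head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.Title" content="A Guide to Growing Roses">
<meta name="DC.Creator" content="Rose Bush">
<meta name="keywords" content="roses">
</head><body></body></html>"""
print(extract_dc(page))
```

Non-Dublin Core tags such as `keywords` are skipped, so the extracted pairs can be edited and written back without disturbing other metadata.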
2.4. Stand-Alone Metadata
Stand-alone metadata can exist in any kind of database, and generally provides a link to the described resource. This approach is likely to be most practical for many non-textual resources, and is increasingly used for text as well, primarily to support easier maintenance and sharing of metadata.
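A minimal sketch of stand-alone metadata, assuming a single SQLite table of (resource, element, value) rows; the schema and function names are hypothetical. The resource URI is what links each stand-alone record back to the described resource, and `find_by` illustrates a field-based search.

```python
import sqlite3


def create_store():
    """In-memory table of statements: one row per (resource, element, value)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dc (resource TEXT NOT NULL, element TEXT NOT NULL, value TEXT NOT NULL)"
    )
    return conn


def add_record(conn, resource_uri, pairs):
    # The resource URI is the link from the stand-alone record to the resource.
    conn.executemany(
        "INSERT INTO dc (resource, element, value) VALUES (?, ?, ?)",
        [(resource_uri, element, value) for element, value in pairs],
    )


def find_by(conn, element, value):
    """Return URIs of resources having the given element value."""
    rows = conn.execute(
        "SELECT DISTINCT resource FROM dc WHERE element = ? AND value = ?",
        (element, value),
    )
    return [row[0] for row in rows]


conn = create_store()
add_record(conn, "http://media.example.com/audio/guide.ra",
           [("creator", "Rose Bush"), ("title", "A Guide to Growing Roses")])
print(find_by(conn, "creator", "Rose Bush"))
```

Storing one row per statement keeps repeated elements (several creators, say) as easy to record as single ones.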
3. Basic Principles of Descriptive Elements
Each element is optional and repeatable. Metadata elements may appear in any order. The ordering of multiple occurrences of the same element (e.g., Creator) may have a significance intended by the provider, but ordering is not guaranteed to be preserved in every user environment. For instance, RDF/XML supports ordering, but HTML does not.
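Because every element is optional and repeatable, a record is naturally a list of (element, value) pairs rather than a fixed set of fields. This Python sketch groups such pairs by element while preserving the provider's order for repeated elements; the names and sample values are hypothetical.

```python
# The fifteen DCMES 1.1 element names.
DCMES_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def group_record(pairs):
    """Group (element, value) pairs by element, keeping the provider's order."""
    grouped = {}
    for element, value in pairs:
        name = element.lower()
        if name not in DCMES_ELEMENTS:
            raise ValueError(f"not a DCMES 1.1 element: {element}")
        # Every element is optional and repeatable, so values accumulate in a list.
        grouped.setdefault(name, []).append(value)
    return grouped


record = [
    ("Creator", "Smith, Jane"),
    ("Title", "Rose Varieties"),
    ("Creator", "Doe, John"),
]
print(group_record(record))
```

Keeping the values in a list preserves creator order inside the application, even though, as noted above, some encodings (such as HTML) cannot guarantee it.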
3.2. Element Content and Controlled Vocabularies
Content data for some elements may be selected from a "controlled vocabulary," which is a limited set of consistently used and carefully defined terms. This can dramatically improve search results because computers are good at matching words character by character but weak at understanding the way people refer to one concept using different words, that is, synonyms. Without basic terminology control, inconsistent or incorrect metadata can profoundly degrade the quality of search results. For example, without a controlled vocabulary, "candy" and "sweet" might be used to refer to the same concept. Controlled vocabularies may also reduce the likelihood of spelling errors when recording metadata.
One cost of a controlled vocabulary is the need for an administrative body to review, update and disseminate the vocabulary. For example, the US Library of Congress Subject Headings (LCSH) and the US National Library of Medicine Medical Subject Headings (MeSH) are formal vocabularies, indispensable for searching rigorously cataloged collections. However, both require significant support organizations. Another cost is having to train searchers and creators of metadata so that they know, when using MeSH for example, to enter "myocardial infarction" instead of the more colloquial "heart attack."
Controlled vocabularies are most effectively applied through qualifiers, in particular encoding scheme qualifiers, which identify the vocabulary from which a value is drawn.
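Terminology control can be sketched as a lookup from entry terms to a preferred term. The vocabulary below is hypothetical (it is not LCSH or MeSH); it simply reuses the "candy"/"sweet" and "heart attack"/"myocardial infarction" pairs from the text, and the choice of preferred term in each pair is illustrative. A real application would load an authoritative vocabulary instead.

```python
# Hypothetical vocabulary: preferred term -> entry terms that should map to it.
VOCABULARY = {
    "candy": ["sweet", "sweets"],
    "myocardial infarction": ["heart attack"],
}

# Invert once so every entry term (including the preferred term) is one lookup.
ENTRY_TERMS = {
    entry: preferred
    for preferred, entries in VOCABULARY.items()
    for entry in [preferred, *entries]
}


def normalize_subject(term):
    """Return the preferred term, or None if the term is not in the vocabulary."""
    return ENTRY_TERMS.get(term.strip().lower())


for raw in ["Sweet", "heart attack", "roses"]:
    print(raw, "->", normalize_subject(raw))
```

Returning None rather than guessing leaves unrecognized terms for a cataloger to review, which is where the training cost mentioned above is partly absorbed.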
4. The Core Elements
This section lists each Core element by its full name and label. For each element there is a reference description (DCMES 1.1) and there are guidelines to assist in creating metadata content, whether it is done "from scratch" or by converting an existing record in another format. Links to examples and to recommended Dublin Core™ Qualifiers for each element are also provided.
The elements are listed in the order they were developed, but there are other useful ways to group them. In the following table, you can see that some elements relate to the content of the item, some to the item as intellectual property, still others to the particular instantiation, or version, of the item.
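The grouping described above can also be expressed as data. This Python sketch assumes the three-way grouping commonly presented in Dublin Core guides (content, intellectual property, instantiation); the assignment of each element to a group, and the names `GROUPS` and `group_of`, are part of that assumption rather than a normative definition.

```python
# The fifteen DCMES 1.1 elements, in the order they were developed.
ELEMENTS = [
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
]

# Assumed grouping (see lead-in); each element appears in exactly one group.
GROUPS = {
    "Content": ["Title", "Subject", "Description", "Type", "Source", "Relation", "Coverage"],
    "Intellectual Property": ["Creator", "Publisher", "Contributor", "Rights"],
    "Instantiation": ["Date", "Format", "Identifier", "Language"],
}


def group_of(element):
    """Return the name of the group containing the element."""
    for group, members in GROUPS.items():
        if element in members:
            return group
    raise KeyError(element)


for element in ELEMENTS:
    print(f"{element:<12} {group_of(element)}")
```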
5.1. Classes of Qualifiers
In July of 2000, the Dublin Core™ Metadata Initiative issued its list of recommended Dublin Core™ Qualifiers. At the time of the ratification of these qualifiers, the DCMI recognized two broad classes of qualifiers:
- Element Refinement. These qualifiers make the meaning of an element narrower or more specific. A refined element shares the meaning of the unqualified element, but with a more restricted scope. A client that does not understand a specific element refinement term should be able to ignore the qualifier and treat the metadata value as if it were an unqualified (broader) element. The definitions of element refinement terms for qualifiers must be publicly available.
- Encoding Scheme. These qualifiers identify schemes that aid in the interpretation of an element value. These schemes include controlled vocabularies and formal notations or parsing rules. A value expressed using an encoding scheme will thus be a token selected from a controlled vocabulary (e.g., a term from a classification system or set of subject headings) or a string formatted in accordance with a formal notation (e.g., "2000-01-01" as the standard expression of a date). If an encoding scheme is not understood by a client or agent, the value may still be useful to a human reader. The definitive description of an encoding scheme for qualifiers must be clearly identified and available for public use.
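Both classes of qualifier can be sketched in a few lines of Python. The refinement table is an illustrative subset of DCMI refinements, and the date parser accepts only the date-only forms of the W3CDTF profile (YYYY, YYYY-MM, YYYY-MM-DD), ignoring its time-of-day forms. The scheme label `"W3CDTF"` and all function names are assumptions made for the sketch.

```python
import re
from datetime import date

# Element refinements: an illustrative subset, each mapped to the element it narrows.
REFINES = {
    "alternative": "title",
    "abstract": "description",
    "created": "date",
    "modified": "date",
    "isPartOf": "relation",
}


def broader_element(term):
    """A refined term shares the meaning of its element, e.g. "created" -> "date"."""
    return REFINES.get(term, term)


# Encoding scheme: date-only forms of W3CDTF (YYYY, YYYY-MM, YYYY-MM-DD).
W3CDTF_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_w3cdtf_date(value):
    """Return (year, month, day), with None for omitted parts, or None if invalid."""
    m = W3CDTF_DATE.fullmatch(value)
    if m is None:
        return None
    year, month, day = (int(g) if g else None for g in m.groups())
    try:
        # Reject impossible dates such as month 13 or 30 February.
        date(year, 1 if month is None else month, 1 if day is None else day)
    except ValueError:
        return None
    return (year, month, day)


def interpret(scheme, value):
    # An unrecognized scheme, or a value that does not parse, stays a plain
    # string, which may still be useful to a human reader.
    if scheme == "W3CDTF":
        parsed = parse_w3cdtf_date(value)
        return parsed if parsed is not None else value
    return value


statements = [
    ("created", "W3CDTF", "2000-01-01"),
    ("alternative", None, "Growing Roses"),
]
for term, scheme, value in statements:
    print(broader_element(term), interpret(scheme, value))
```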
5.2. The Dumb-Down Principle
Using qualifiers as an additional level of detail means that a client may encounter collections of resources described with Dublin Core™ qualifiers that the client application does not recognize. This can happen either because the client does not support qualifiers while the collection does, or because the collection uses specialized qualifiers that implementors have developed for specific local or domain needs.
The useful interpretation of such descriptions will depend on the ability to ignore the unknown qualifiers and fall back on the broader meaning of the element in its unqualified form. The guiding principle for the qualification of Dublin Core™ elements, also known as the "Dumb-Down Principle," is that a client should be able to ignore any refinement and use the description as if it were unqualified. While this may result in some loss of specific meaning, the remaining element value (minus the qualifier) must continue to be generally correct.
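The Dumb-Down Principle can be sketched as a function that discards whatever qualifiers a client does not understand. In this hypothetical Python example, `localRegion` stands in for a local refinement unknown to the client, and the dotted `date.created` output form is only a display convention chosen for the sketch.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    element: str               # one of the fifteen elements, e.g. "date"
    refinement: Optional[str]  # e.g. "created", or None if unqualified
    scheme: Optional[str]      # e.g. "W3CDTF", or None
    value: str


def dumb_down(statements, known_refinements=frozenset()):
    """Keep only refinements the client knows; otherwise fall back to the element.

    The encoding scheme is not used here: the value string passes through unchanged.
    """
    result = []
    for s in statements:
        if s.refinement in known_refinements:
            result.append((f"{s.element}.{s.refinement}", s.value))
        else:
            result.append((s.element, s.value))
    return result


record = [
    Statement("date", "created", "W3CDTF", "2001-01-20"),
    Statement("title", "alternative", None, "Growing Roses"),
    Statement("coverage", "localRegion", None, "Pacific Northwest"),
]
print(dumb_down(record))                          # client knows no qualifiers
print(dumb_down(record, frozenset({"created"})))  # client knows one refinement
```

Each dumbed-down pair is still generally correct: a creation date is a date, and an alternative title is a title.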