Using Dublin Core
Diane I. Hillmann
Project Manager & Metadata Specialist
National Science Digital Library Project at Cornell
Department of Computer Science
Ithaca, New York, USA
Status of Document: This is a DCMI Working Draft.
Description of Document: This document is intended as an entry point for users of Dublin Core™. For non-specialists, it will assist them in creating simple descriptive records for information resources (for example, electronic documents). Specialists may find the document a useful point of reference to the documentation of Dublin Core, as it changes and grows.
TABLE OF CONTENTS
- 1.1. What is Metadata?
- 1.2. What is the Dublin Core?
- 1.3. The Purpose and Scope of This Guide
2. Which Syntax?
- 2.1. HTML
- 2.1.1. Using HTML Syntax
- 2.2. RDF/XML
- 2.3. Stand-Alone Metadata
- 2.4. Metadata Contained in a Resource
3. Basic Principles of Descriptive Elements
- 3.1. Element Parts and Syntax
- 3.2. Element Content and Controlled Vocabularies
4. The Core Elements
- 6.1 Generic Examples
- 6.2 Simple HTML Examples
- 6.3 Qualified HTML Examples
- 6.4 Simple RDF Examples
- 6.5 Qualified RDF Examples
- 6.6 Examples from other sources
Metadata describes an information resource. The term "meta" comes from a Greek word that denotes something of a higher or more fundamental nature. Metadata, then, is data about other data. It is the Internet-age term for information that librarians traditionally have put into catalogs, and it most commonly refers to descriptive information about Web resources. However, metadata can serve a variety of purposes, from identifying a resource that meets a particular information need, to evaluating its suitability for use, to tracking the characteristics of resources for maintenance or usage over time. Different communities of users meet such needs today with a wide variety of metadata standards.
A metadata record consists of a set of attributes, or elements, necessary to describe the resource in question. For example, a metadata system common in libraries -- the library catalog -- contains a set of metadata records with elements that describe a book or other library item: author, title, date of creation or publication, subject coverage, and the call number specifying location of the item on the shelf.
The linkage between a metadata record and the resource it describes may take one of two forms:
- elements may be contained in a record separate from the item, as in the case of the library's catalog record; or
- the metadata may be embedded in the resource itself.
Examples of embedded metadata carried along with the resource itself include the Cataloging In Publication (CIP) data printed on the verso of a book's title page and the TEI header in an electronic text. Many metadata standards in use today, including the Dublin Core™ standard, do not prescribe either type of linkage, leaving the decision to each particular implementation.
Although the concept of metadata predates the Internet and the Web, worldwide interest in metadata standards and practices has exploded with the increase in electronic publishing and digital libraries, and the concomitant "information overload" resulting from vast quantities of undifferentiated digital data available online. Anyone who has attempted to find information online using one of today's popular Web search services has likely experienced the frustration of retrieving hundreds, if not thousands, of "hits" with limited ability to refine the search or make it more precise. The wide-scale adoption of descriptive standards and practices for electronic resources will improve retrieval of relevant resources from the "Internet commons." As noted by Weibel and Lagoze, two leaders in the field of metadata development:
"The association of standardized descriptive metadata with networked objects has the potential for substantially improving resource discovery capabilities by enabling field-based (e.g., author, title) searches, permitting indexing of non-textual objects, and allowing access to the surrogate content that is distinct from access to the content of the resource itself." (Weibel and Lagoze, 1997)
It is this need for "standardized descriptive metadata" that the Dublin Core™ addresses.
The Dublin Core™ metadata standard is a simple yet effective element set for describing a wide range of networked resources. The Dublin Core™ standard comprises fifteen elements, the semantics of which have been established through consensus by an international, cross-disciplinary group of professionals from librarianship, computer science, text encoding, the museum community, and other related fields of scholarship.
The Dublin Core™ element set is outlined in Section 4. Each element is optional and may be repeated. Each element also has a limited set of qualifiers, attributes that may be used to further refine (not extend) the meaning of the element. The Dublin Core™ Metadata Initiative (DCMI) has defined standard ways to "qualify" elements with various types of qualifiers. A registry of qualifiers conforming to DCMI "best practice" is in progress.
Although the Dublin Core™ favors document-like objects (because traditional text resources are fairly well understood), it can be applied to other resources as well. Its suitability for use with particular non-document resources will depend to some extent on how closely their metadata resembles typical document metadata and also what purpose the metadata is intended to serve.
Dublin Core™ has as its goals the following characteristics:
Simplicity of creation and maintenance
The Dublin Core™ element set has been kept as small and simple as possible to allow a non-specialist to create simple descriptive records for information resources easily and inexpensively, while providing for effective retrieval of those resources in the networked environment.
Commonly understood semantics
Discovery of information across the vast commons of the Internet is hindered by differences in terminology and descriptive practices from one field of knowledge to the next. The Dublin Core™ can help the 'digital tourist' -- a non-specialist searcher -- find his or her way by supporting a common set of elements, the semantics of which are universally understood and supported. For example, scientists concerned with locating articles by a particular author, and art scholars interested in works by a particular artist, can agree on the importance of a "creator" element. Such convergence on a common, if slightly more generic, element set increases the visibility and accessibility of all resources, both within a given discipline and beyond.
International scope
The Dublin Core™ Element Set was originally developed in English, but versions are being created in many other languages. As of November 1999, there were versions in over 20 languages, including Finnish, Norwegian, Thai, Japanese, French, Portuguese, German, Greek, Indonesian, and Spanish. The Working Group on Dublin Core™ in Multiple Languages is coordinating efforts to link these versions in a distributed registry using the Resource Description Framework technology being developed by the World Wide Web Consortium (W3C).
Although the technical challenges of internationalization on the World Wide Web have not been directly addressed by the Dublin Core™ development community, the involvement of representatives from almost every continent has ensured that the development of the standard considers the multilingual and multicultural nature of the electronic information universe.
Extensibility
While balancing the needs for simplicity in describing digital resources with the need for precise retrieval, Dublin Core™ developers have recognized the importance of providing a mechanism for extending the DC element set for additional resource discovery needs. It is expected that other communities of metadata experts will create and administer additional metadata sets. Metadata elements from these sets could be linked with Dublin Core™ metadata to meet the need for extensibility. This model allows different communities to use the DC elements for core descriptive information which will be usable across the Internet, while allowing domain specific additions which make sense within a more limited arena.
This document is intended as an entry point for users of Dublin Core™. For non-specialists, it will assist them in creating simple descriptive records for information resources (for example, electronic documents). Specialists may find the document a useful point of reference to the documentation of Dublin Core, as it changes and grows.
The guide will show in a non-technical fashion how Dublin Core™ metadata may be used by anyone to make their material more accessible. This guide discusses the layout and content of Dublin Core™ metadata elements, how to use them in composing a complete Dublin Core™ metadata record, as well as how to qualify elements to support use by a wide variety of communities.
Another important goal of this document is to promote "best practices" for describing resources using the Dublin Core™ element set. The Dublin Core™ community recognizes that consistency in creating metadata is an important key to achieving complete retrieval and intelligible display across disparate sources of descriptive records. Inconsistent metadata effectively hides desired records, resulting in uneven, unpredictable or incomplete search results.
In this guide, we have chosen to represent Dublin Core™ examples in several different syntaxes: HTML (the Web's Hypertext Markup Language), RDF/XML (the Resource Description Framework expressed in eXtensible Markup Language), and a generic form (Element="value"). HTML provides an easily understood format for demonstrating Dublin Core's underlying concepts, but more complex applications using qualification may find that using RDF/XML makes more sense. When considering an appropriate syntax, it is important to note that Dublin Core™ concepts are equally applicable to virtually any file format, as long as the metadata is in a form suitable for interpretation both by search engines and by human beings.
HTML has two tags that can be used to capture metadata: the "<META>" and "<LINK>" tags. If the metadata is to be embedded in, or appear alongside, an actual document, these tags must appear within the HEAD section of the HTML document. For example:
Northern Hairy Nosed Wombats
The Northern Hairy Nosed Wombat is an animal native to Australia....
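As a minimal sketch, the title and body text above could sit in an HTML document as follows. The single DC.Title element shown here is an illustrative assumption; a real record would normally carry more elements.

```html
<HTML>
<HEAD>
<TITLE>Northern Hairy Nosed Wombats</TITLE>
<!-- Assumed metadata: a Dublin Core title that repeats the page title -->
<META NAME="DC.Title" CONTENT="Northern Hairy Nosed Wombats">
</HEAD>
<BODY>
<P>The Northern Hairy Nosed Wombat is an animal native to Australia....</P>
</BODY>
</HTML>
```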
Indexing programs understand that the metadata record starts after the "<HEAD>" line and ends before the "</HEAD>" line, and are thus able to extract metadata automatically. The metadata does not appear during normal document formatting or printing, and metadata-aware Web browsers may even be able to exploit it. A number of the current search engines have begun to include the ability to make use of the HTML <META> tag in Web documents.
In HTML, each record element definition begins with "<META" and ends with ">". Within the META tag, two attribute/value pairs (as found in other HTML tags) are used to define the metadata. The first is NAME, the second, CONTENT. These two work together to define the metadata within the META tag.
This document does not cover the use of the LINK tag.
Each descriptive element definition has a NAME attribute and a CONTENT attribute, as in:
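A minimal sketch of one such definition, using a hypothetical creator name:

```html
<!-- NAME identifies the element; CONTENT carries its value (hypothetical name) -->
<META NAME="DC.Creator" CONTENT="Doe, Jane">
```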
Any metadata element may be omitted or repeated. When repeating elements, it is recommended best practice to list each element definition separately, as in:
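For instance, a resource with two creators could be sketched with one definition per creator. Both names are hypothetical.

```html
<!-- Recommended practice: one META definition per occurrence of the element -->
<META NAME="DC.Creator" CONTENT="Doe, Jane">
<META NAME="DC.Creator" CONTENT="Roe, Richard">
```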
However, it is also valid to express repeated elements using a single NAME attribute with multiple semi-colon delimited values for the CONTENT attribute, as in:
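A sketch of the single-definition form, again with hypothetical names. The semicolon separates the values, so a comma inside a name remains part of that name.

```html
<!-- Two creators in one CONTENT attribute, delimited by a semicolon -->
<META NAME="DC.Creator" CONTENT="Doe, Jane; Roe, Richard">
```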
The document "A Proposed Convention for Embedding Metadata in HTML" sets out a convention for identifying and grouping metadata schemes in HTML. This convention relies on the use of a prefix to indicate that the elements used are from Dublin Core™ or another metadata scheme. For increased readability, the prefix "DC" should be written in upper case letters and element names should be capitalized. For example, write DC.Creator rather than DC.CREATOR, dc.CREATOR, or DC.creator.
If non-ASCII characters are required, use the same conventions as in the body of the document. For example:
[Text still needed here]
Below are some examples of how the META tag might be used in stand-alone and embedded metadata. Note that each metadata definition happens to fit on one line, but in general a definition can span several lines.
Stand-alone metadata can exist in any kind of database. This example describes a photograph in another file that has a location given by a Uniform Resource Locator (URL). The entire record file looks like this:
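A minimal sketch of such a record file follows. Every value in it, including the title, the creator, and the example.org URL, is hypothetical and serves only to show the shape of a stand-alone record whose Identifier points at the described photograph.

```html
<HTML>
<HEAD>
<!-- Hypothetical stand-alone record: the photograph lives in another file -->
<META NAME="DC.Title" CONTENT="Mountain Landscape">
<META NAME="DC.Creator" CONTENT="Doe, Jane">
<META NAME="DC.Type" CONTENT="Image">
<META NAME="DC.Format" CONTENT="image/jpeg">
<!-- The URL of the described resource (hypothetical) -->
<META NAME="DC.Identifier" CONTENT="http://www.example.org/photos/mountain.jpg">
</HEAD>
</HTML>
```

The file has no BODY content of its own: the record exists only to describe the resource named in DC.Identifier.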
The next example is of a metadata record contained in a file alongside the document that it describes. The document is a short poem expressed in HTML, the Web's Hypertext Markup Language.
I think that I shall never see
A billboard lovely as a tree.
Indeed, unless the billboards fall
I'll never see a tree at all.
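A minimal sketch of how the poem and its metadata could share one file. The title and creator shown (Ogden Nash's "Song of the Open Road") are not stated in this guide's text and are supplied here as an assumption; the choice of elements is likewise illustrative.

```html
<HTML>
<HEAD>
<TITLE>Song of the Open Road</TITLE>
<!-- Embedded record: the metadata travels in the HEAD of the resource itself -->
<!-- Title and creator are assumed; they do not appear in the guide's text -->
<META NAME="DC.Title" CONTENT="Song of the Open Road">
<META NAME="DC.Creator" CONTENT="Nash, Ogden">
<META NAME="DC.Type" CONTENT="Text">
<META NAME="DC.Format" CONTENT="text/html">
</HEAD>
<BODY>
<PRE>
I think that I shall never see
A billboard lovely as a tree.
Indeed, unless the billboards fall
I'll never see a tree at all.
</PRE>
</BODY>
</HTML>
```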
Each element is optional and repeatable. Metadata elements may appear in any order. The ordering of multiple occurrences of the same element (e.g., Creator) may have a significance intended by the provider, but ordering is not guaranteed to be preserved in every user environment. For instance, RDF supports ordering, but HTML does not.
Content data for some elements may be selected from a "controlled vocabulary," which is a limited set of consistently used and carefully defined terms. This can dramatically improve search results because computers are good at matching words character by character but weak at understanding the way people refer to one concept using different words, i.e. synonyms. Without basic terminology control, inconsistent or incorrect metadata can profoundly degrade the quality of search results. For example, without a controlled vocabulary, "candy" and "sweet" might be used to refer to the same concept. Controlled vocabularies may also reduce the likelihood of spelling errors when recording metadata.
One cost of a controlled vocabulary is in needing an administrative body to review, update and disseminate the vocabulary. For example, the US Library of Congress Subject Headings (LCSH) and the US National Library of Medicine Medical Subject Headings (MeSH) are formal vocabularies, indispensable for searching rigorously cataloged collections. However, both require significant support organizations. Another cost is having to train searchers and creators of metadata so that they know when using MeSH, for example, to enter "myocardial infarction" instead of the more colloquial "heart attack."
Controlled vocabularies are applied most effectively through qualifiers.
This section lists each Core element by its full name and label. For each element there is a reference description (taken from the RFC) and there are guidelines to assist in creating metadata content, whether it is done "from scratch" or by converting an existing record in another format. Links to examples and to recommended Dublin Core™ Qualifiers for each element are also provided.
The elements are listed in the order they were developed, but there are other useful ways to group them. In the following table, you can see that some elements relate to the content of the item, some to the item as intellectual property, still others to the particular instantiation, or version, of the item.
In July of 2000, the Dublin Core™ Metadata Initiative issued its list of recommended Dublin Core™ Qualifiers. At the time of the ratification of these qualifiers, the DCMI recognized two broad classes of qualifiers:
- Element Refinement. These qualifiers make the meaning of an element narrower or more specific. A refined element shares the meaning of the unqualified element, but with a more restricted scope. A client that does not understand a specific element refinement term should be able to ignore the qualifier and treat the metadata value as if it were an unqualified (broader) element. The definitions of element refinement terms for qualifiers must be publicly available.
- Encoding Scheme. These qualifiers identify schemes that aid in the interpretation of an element value. These schemes include controlled vocabularies and formal notations or parsing rules. A value expressed using an encoding scheme will thus be a token selected from a controlled vocabulary (e.g., a term from a classification system or set of subject headings) or a string formatted in accordance with a formal notation (e.g., "2000-01-01" as the standard expression of a date). If an encoding scheme is not understood by a client or agent, the value may still be useful to a human reader. The definitive description of an encoding scheme for qualifiers must be clearly identified and available for public use.
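Both classes can be sketched in HTML. This sketch assumes two conventions that this guide does not itself spell out: an element refinement appended to the element name after a dot, and an encoding scheme named in the SCHEME attribute that HTML 4 defines for the META tag. The exact spellings "DC.Date.Created", "W3CDTF", and "MeSH" are illustrative assumptions, as are the values.

```html
<!-- Element refinement (assumed dotted form): "Created" narrows Date.
     A client that does not know "Created" can still treat the value as a Date. -->
<META NAME="DC.Date.Created" CONTENT="2000-01-01">

<!-- Encoding scheme: the value is a string formatted per a formal date notation -->
<META NAME="DC.Date" SCHEME="W3CDTF" CONTENT="2000-01-01">

<!-- Encoding scheme: the value is a term taken from a controlled vocabulary -->
<META NAME="DC.Subject" SCHEME="MeSH" CONTENT="Myocardial Infarction">
```

In each case the CONTENT value stays readable to a person even when software does not recognize the qualifier, which is the fallback behavior both qualifier classes are meant to allow.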