Title: Using Dublin Core
Creator:
Date Issued: 2000-07-16
Identifier:
Replaces:
Is Replaced By: Not applicable
Latest Version:
Status of Document:
Description of Document: This document is intended as an
entry point for users of Dublin Core. For non-specialists,
it will assist them in creating simple descriptive records
for information resources (for example, electronic documents).
Specialists may find the document a useful point of reference
to the documentation of Dublin Core, as it changes and grows.
Document Metadata: http://purl.org/dc/documents/wd/usageguide-20000716.htm.rdf
This document is also available in French.
1. INTRODUCTION
1.1. What is Metadata?
Metadata describes an information resource. The term "meta"
comes from a Greek word that denotes something of a higher or
more fundamental nature. Metadata, then, is data about other data.
It is the Internet-age term for information that librarians traditionally
have put into catalogs, and it most commonly refers to descriptive
information about Web resources. However, metadata can serve a
variety of purposes, from identifying a resource that meets a
particular information need, to evaluating its suitability for
use, to tracking the characteristics of resources for maintenance
or usage over time. Different communities of users meet such needs
today with a wide variety of metadata standards.
A metadata record consists of a set of attributes, or elements,
necessary to describe the resource in question. For example, a
metadata system common in libraries -- the library catalog --
contains a set of metadata records with elements that describe
a book or other library item: author, title, date of creation
or publication, subject coverage, and the call number specifying
location of the item on the shelf.
The linkage between a metadata record and the resource it describes
may take one of two forms:
- elements may be contained in a record separate from the item,
as in the case of the library's catalog record; or
- the metadata may be embedded in the resource itself.
Examples of embedded metadata that is carried along with the
resource itself include the Cataloging In Publication (CIP) data
printed on the verso of a book's title page, and the TEI header
in an electronic text. Many metadata standards in use today, including
the Dublin Core standard, do not prescribe either type of linkage,
leaving the decision to each particular implementation.
Although the concept of metadata predates the Internet and the
Web, worldwide interest in metadata standards and practices has
exploded with the increase in electronic publishing and digital
libraries, and the concomitant "information overload"
resulting from vast quantities of undifferentiated digital data
available online. Anyone who has attempted to find information
online using one of today's popular Web search services has likely
experienced the frustration of retrieving hundreds, if not thousands,
of "hits" with limited ability to refine or make a more
precise search. The wide-scale adoption of descriptive standards
and practices for electronic resources will improve retrieval
of relevant resources from the "Internet commons." As
noted by Weibel and Lagoze, two leaders in the field of metadata
development:
"The association of standardized descriptive metadata with
networked objects has the potential for substantially improving
resource discovery capabilities by enabling field-based (e.g.,
author, title) searches, permitting indexing of non-textual
objects, and allowing access to the surrogate content that is
distinct from access to the content of the resource itself."
(Weibel and Lagoze, 1997)
It is this need for "standardized descriptive metadata"
that the Dublin Core addresses.
1.2. What is the Dublin Core?
The Dublin Core metadata standard is a simple yet effective
element set for describing a wide range of networked resources.
The Dublin Core standard comprises fifteen elements, the semantics
of which have been established through consensus by an international,
cross-disciplinary group of professionals from librarianship,
computer science, text encoding, the museum community, and other
related fields of scholarship.
The Dublin Core element set is outlined in Section
4. Each element is optional and may be repeated. Each element
also has a limited set of qualifiers, attributes that may be used
to further refine (not extend) the meaning of the element. The
Dublin Core Metadata Initiative (DCMI) has defined standard ways
to "qualify" elements with various types of qualifiers.
A registry of qualifiers conforming to DCMI "best practice"
is in progress.
Although the Dublin Core favors document-like objects (because
traditional text resources are fairly well understood), it can
be applied to other resources as well. Its suitability for use
with particular non-document resources will depend to some extent
on how closely their metadata resembles typical document metadata
and also what purpose the metadata is intended to serve.
The Dublin Core has the following goals:
Simplicity of creation and maintenance
The Dublin Core element set has been kept as small and simple
as possible to allow a non-specialist to create simple descriptive
records for information resources easily and inexpensively,
while providing for effective retrieval of those resources in
the networked environment.
Commonly understood semantics
Discovery of information across the vast commons of the Internet
is hindered by differences in terminology and descriptive practices
from one field of knowledge to the next. The Dublin Core can
help the 'digital tourist' -- a non-specialist searcher -- find
his or her way by supporting a common set of elements, the semantics
of which are universally understood and supported. For example,
scientists concerned with locating articles by a particular
author, and art scholars interested in works by a particular
artist, can agree on the importance of a "creator"
element. Such convergence on a common, if slightly more generic,
element set increases the visibility and accessibility of all
resources, both within a given discipline and beyond.
International scope
The Dublin Core Element Set was originally developed in English,
but versions are being created in many other languages. As of
November 1999, there were versions in over 20 languages, including
Finnish, Norwegian, Thai, Japanese, French, Portuguese, German,
Greek, Indonesian, and Spanish. The Working Group on Dublin
Core in Multiple Languages is coordinating efforts to link these
versions in a distributed registry using the Resource
Description Framework technology being developed by the
World Wide Web Consortium (W3C).
Although the technical challenges of internationalization on
the World Wide Web have not been directly addressed by the Dublin
Core development community, the involvement of representatives
from almost every continent has ensured that the development
of the standard considers the multilingual and multicultural
nature of the electronic information universe.
Extensibility
While balancing the need for simplicity in describing digital
resources with the need for precise retrieval, Dublin Core developers
have recognized the importance of providing a mechanism for
extending the DC element set for additional resource discovery
needs. It is expected that other communities of metadata experts
will create and administer additional metadata sets. Metadata
elements from these sets could be linked with Dublin Core metadata
to meet the need for extensibility. This model allows different
communities to use the DC elements for core descriptive information
which will be usable across the Internet, while allowing domain
specific additions which make sense within a more limited arena.
1.3. The Purpose and Scope of This Guide
This document is intended as an entry point for users of Dublin
Core. For non-specialists, it will assist them in creating simple
descriptive records for information resources (for example, electronic
documents). Specialists may find the document a useful point of
reference to the documentation of Dublin Core, as it changes and
grows.
The guide will show in a non-technical fashion how Dublin Core
metadata may be used by anyone to make their material more accessible.
This guide discusses the layout and content of Dublin Core metadata
elements, how to use them in composing a complete Dublin Core
metadata record, as well as how to qualify elements to support
use by a wide variety of communities.
Another important goal of this document is to promote "best
practices" for describing resources using the Dublin Core
element set. The Dublin Core community recognizes that consistency
in creating metadata is an important key to achieving complete
retrieval and intelligible display across disparate sources of
descriptive records. Inconsistent metadata effectively hides desired
records, resulting in uneven, unpredictable or incomplete search
results.
2. Which Syntax?
In this guide, we have chosen to represent Dublin Core examples
in several different syntaxes: HTML (the Web's Hypertext Markup
Language), RDF/XML (the Resource Description Framework expressed
in eXtensible Markup Language), and a generic form (Element="value").
HTML provides an easily understood format for demonstrating Dublin
Core's underlying concepts, but more complex applications using
qualification may find that using RDF/XML makes more sense. When
considering an appropriate syntax, it is important to note that
Dublin Core concepts are equally applicable to virtually any file
format, as long as the metadata is in a form suitable for interpretation
both by search engines and by human beings.
2.1. HTML
HTML has two tags that can be used to capture metadata: the
"<META>" and "<LINK>" tags. When metadata is embedded in, or
appears alongside, an actual document, these tags must appear
within the HEAD section of the HTML document. For example:
<HTML>
<HEAD>
<TITLE>Mating Habits of the Northern Hairy Nosed Wombat</TITLE>
<META NAME="DC.Creator" CONTENT="Smythe, Pearl">
</HEAD>
<BODY>
<H1>Northern Hairy Nosed Wombats</H1>
<P>The Northern Hairy Nosed Wombat is an animal native to
Australia....</P>
</BODY>
</HTML>
Indexing programs understand that the metadata record starts after
the "<HEAD>" line and ends before the "</HEAD>"
line, and are thus able to extract metadata automatically. The
metadata does not appear during normal document formatting or
printing, and metadata-aware Web browsers may even be able to
exploit it. A number of the current search engines have begun
to include the ability to make use of the HTML <META> tag
in Web documents.
In HTML, each record element definition begins with "<META"
and ends with ">". Within the META tag, two attribute/value
pairs (as found in other HTML tags) are used to define the metadata.
The first is NAME, the second, CONTENT. These two work together
to define the metadata within the META tag.
This document will not cover the use of the LINK tag.
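As a rough illustration of how an indexing program might pull Dublin
Core metadata out of the HEAD section, the following Python sketch uses
the standard library's html.parser module. The names DublinCoreExtractor
and extract_dc are invented for this example, and the decision to keep
only names beginning with "DC." is an assumption made for the sketch,
not a requirement of Dublin Core.

```python
from html.parser import HTMLParser


class DublinCoreExtractor(HTMLParser):
    """Collect (NAME, CONTENT) pairs from META tags found inside HEAD."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            fields = dict(attrs)
            name = fields.get("name") or ""
            # Assumption for this sketch: only "DC."-prefixed names are kept.
            if name.upper().startswith("DC."):
                self.records.append((name, fields.get("content") or ""))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_dc(html_text):
    """Return the Dublin Core pairs found in the HEAD of html_text."""
    parser = DublinCoreExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.records
```

Applied to the wombat document above, extract_dc would return a single
pair: the name "DC.Creator" with the content "Smythe, Pearl". META tags
that appear outside the HEAD section are ignored.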
2.1.1. Using HTML Syntax
Each descriptive element definition has a NAME attribute and
a CONTENT attribute, as in:
<META NAME="DC.Creator" CONTENT="Browning, Elizabeth">
Any metadata element may be omitted or repeated. When repeating
elements, it is recommended best practice to list each element
definition separately, as in:
<META NAME="DC.Creator" CONTENT="Marx, Karl">
<META NAME="DC.Creator" CONTENT="Engels, Friedrich">
However, it is also valid to express repeated elements using
a single NAME attribute with multiple semicolon-delimited values
for the CONTENT attribute, as in:
<META NAME="DC.Creator" CONTENT="Marx, Karl; Engels, Friedrich">
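A program that reads such records may want to treat both forms alike.
The Python sketch below is one possible way to do so; the function names
split_repeated and group_by_element are hypothetical, and the sketch
assumes that a semicolon never occurs inside a single value.

```python
def split_repeated(content):
    """Split a semicolon-delimited CONTENT value into separate values."""
    # Assumption: a semicolon never occurs inside a single value.
    return [part.strip() for part in content.split(";") if part.strip()]


def group_by_element(pairs):
    """Collect the values for each element name, expanding semicolon lists."""
    grouped = {}
    for name, content in pairs:
        grouped.setdefault(name, []).extend(split_repeated(content))
    return grouped
```

With this approach, two separate DC.Creator definitions and a single
definition holding "Marx, Karl; Engels, Friedrich" produce the same
grouped result. The order of values within an element is kept as given.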
The document "A Proposed Convention for Embedding Metadata in HTML"
records an agreed convention for identifying and grouping metadata schemes in
HTML. This convention relies on the use of a prefix to indicate
that the elements used are from Dublin Core or another metadata
scheme. For increased readability the prefix "DC" should
be written in upper case letters and element names should be capitalized.
For example:
META NAME="DC.Title"
META NAME="DC.Creator"
NOT
DC.CREATOR or dc.CREATOR or DC.creator
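A tool that accepts hand-written metadata could tidy names into the
recommended form. The following Python sketch shows one way this might
be done; normalize_name is a hypothetical helper. It adjusts only the
prefix and the element name, and leaves any further dotted components
untouched, which is a choice made for this sketch.

```python
def normalize_name(name):
    """Rewrite a META NAME such as 'dc.CREATOR' as 'DC.Creator'."""
    parts = name.split(".")
    if len(parts) < 2 or parts[0].upper() != "DC":
        return name  # not a Dublin Core name; leave it untouched
    parts[0] = "DC"
    # Only the element name is recapitalized; later components are kept as-is.
    parts[1] = parts[1].capitalize()
    return ".".join(parts)
```

Each of the discouraged spellings above (DC.CREATOR, dc.CREATOR,
DC.creator) would come out as DC.Creator, while names from other
schemes pass through unchanged.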
If non-ASCII characters are required, use the same conventions
as in the body of the document. For example:
<META NAME="DC.Title" CONTENT="Les biscuits à la banane">
2.2. RDF/XML
[Text still needed here]
Below are some examples of how the META tag might be used in
stand-alone and embedded metadata. Note that each metadata definition
happens to fit on one line, but in general a definition can span
several lines.
2.3. Stand-Alone Metadata
Stand-alone metadata can exist in any kind of database. This
example describes a photograph in another file that has a location
given by a Uniform Resource Locator (URL). The entire record file
looks like this:
<META NAME="DC.Title" CONTENT="Kita Yama (Japan)">
<META NAME="DC.Creator" CONTENT="Kertesz, Andre">
<META NAME="DC.Date" CONTENT="1968">
<META NAME="DC.Type" CONTENT="image">
<META NAME="DC.Format" CONTENT="image/gif">
<META NAME="DC.Identifier" CONTENT="http://foo.bar.zaf/kertesz/kyama">
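Because stand-alone metadata can be held in any kind of database, a
record of this kind might be stored as a simple list of element/value
pairs and written out as META definitions on demand. The Python sketch
below assumes such a list; the function name to_meta_lines and the
variable photo are invented for illustration. A list of pairs is used
rather than a dictionary so that elements can repeat.

```python
from html import escape


def to_meta_lines(record):
    """Render (element, value) pairs as HTML META definitions."""
    lines = []
    for element, value in record:
        # escape() protects quotes and angle brackets inside the value.
        content = escape(value, quote=True)
        lines.append('<META NAME="DC.%s" CONTENT="%s">' % (element, content))
    return lines


# The photograph record from the example above, held as plain pairs.
photo = [
    ("Title", "Kita Yama (Japan)"),
    ("Creator", "Kertesz, Andre"),
    ("Date", "1968"),
    ("Type", "image"),
    ("Format", "image/gif"),
    ("Identifier", "http://foo.bar.zaf/kertesz/kyama"),
]

print("\n".join(to_meta_lines(photo)))
```

Running the sketch prints the six META definitions shown in the record
above, one per line.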
2.4. Metadata Contained in a Resource
The next example is of a metadata record contained in a file
alongside the document that it describes. The document is a short
poem expressed in HTML, the Web's Hypertext Markup Language [3].
<HTML>
<HEAD>
<TITLE>Song of the Open Road</TITLE>
<META NAME="DC.Title" CONTENT="Song of the Open Road">
<META NAME="DC.Creator" CONTENT="Nash, Ogden">
<META NAME="DC.Type" CONTENT="text">
<META NAME="DC.Date" CONTENT="1939">
<META NAME="DC.Format" CONTENT="text/html">
<META NAME="DC.Identifier" CONTENT="http://www.poetry.com/nash/open.html">
</HEAD>
<BODY><PRE>
I think that I shall never see
A billboard lovely as a tree.
Indeed, unless the billboards fall
I'll never see a tree at all.
</PRE></BODY>
</HTML>
3. Basic Principles of Descriptive Elements
Each element is optional and repeatable. Metadata elements may
appear in any order. The ordering of multiple occurrences of the
same element (e.g., Creator) may have a significance intended
by the provider, but ordering is not guaranteed to be preserved
in every user environment. For instance, RDF supports ordering,
but HTML does not.
3.2. Element Content and Controlled Vocabularies
Content data for some elements may be selected from a "controlled
vocabulary," which is a limited set of consistently used
and carefully defined terms. This can dramatically improve search
results because computers are good at matching words character
by character but weak at understanding the way people refer to
one concept using different words, i.e., synonyms. Without basic
terminology control, inconsistent or incorrect metadata can profoundly
degrade the quality of search results. For example, without a
controlled vocabulary, "candy" and "sweet"
might be used to refer to the same concept. Controlled vocabularies
may also reduce the likelihood of spelling errors when recording
metadata.
One cost of a controlled vocabulary is in needing an administrative
body to review, update and disseminate the vocabulary. For example,
the US Library of Congress Subject Headings (LCSH) and the US
National Library of Medicine Medical Subject Headings (MeSH) are
formal vocabularies, indispensable for searching rigorously cataloged
collections. However, both require significant support organizations.
Another cost is having to train searchers and creators of metadata
so that they know, when using MeSH for example, to enter "myocardial
infarction" instead of the more colloquial "heart attack."
Controlled vocabularies are most effectively applied through
qualifiers.
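The idea of terminology control can be sketched in a few lines of
Python. The tiny vocabulary below is entirely hypothetical and is not
an extract of MeSH or LCSH; the choice of "candy" as the preferred term
is arbitrary. The function names build_lookup and to_preferred are
likewise invented for this example.

```python
# Hypothetical miniature vocabulary: preferred term -> known variants.
# It is NOT an extract of MeSH, LCSH, or any real vocabulary.
VOCABULARY = {
    "myocardial infarction": {"heart attack"},
    "candy": {"sweet", "sweets"},
}


def build_lookup(vocabulary):
    """Map every preferred term and variant (lower-cased) to its preferred term."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


def to_preferred(term, lookup):
    """Return the preferred term, or None if the term is not in the vocabulary."""
    return lookup.get(term.strip().lower())
```

A metadata entry tool built along these lines could replace "heart
attack" with the preferred term before the record is saved, and could
flag any term for which to_preferred returns None so that a person can
review it.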
4. The Core Elements
This section lists each Core element by its full name and label.
For each element there is a reference description (taken from
the RFC) and there are guidelines to assist in creating metadata
content, whether it is done "from scratch" or by converting
an existing record in another format. Links to examples and to
recommended Dublin Core Qualifiers for each element are also provided.
The elements are listed in the order they were developed, but
there are other useful ways to group them. In the following table,
you can see that some elements relate to the content of the item,
some to the item as intellectual property, still others to the
particular instantiation, or version, of the item.
5. Qualifiers
In July of 2000, the Dublin Core Metadata Initiative issued its
list of recommended Dublin Core Qualifiers. At the time of the
ratification of these qualifiers, the DCMI recognized two broad
classes of qualifiers:
- Element Refinement. These qualifiers make the meaning
of an element narrower or more specific. A refined element shares
the meaning of the unqualified element, but with a more restricted
scope. A client that does not understand a specific element
refinement term should be able to ignore the qualifier and treat
the metadata value as if it were an unqualified (broader) element.
The definitions of element refinement terms for qualifiers must
be publicly available.
- Encoding Scheme. These qualifiers identify schemes
that aid in the interpretation of an element value. These schemes
include controlled vocabularies and formal notations or parsing
rules. A value expressed using an encoding scheme will thus
be a token selected from a controlled vocabulary (e.g., a term
from a classification system or set of subject headings) or
a string formatted in accordance with a formal notation (e.g.,
"2000-01-01" as the standard expression of a date).
If an encoding scheme is not understood by a client or agent,
the value may still be useful to a human reader. The definitive
description of an encoding scheme for qualifiers must be clearly
identified and available for public use.
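The fallback behavior described for both classes of qualifier can be
sketched in Python as follows. Several things here are assumptions made
for illustration: the dotted naming form (for example "DC.Date.Created"),
the set KNOWN_REFINEMENTS standing in for what one particular client
understands, and the function names dumb_down and interpret_date. The
date check relies on the standard library's date.fromisoformat, which
accepts the "2000-01-01" form.

```python
from datetime import date

# Hypothetical: the (element, refinement) pairs this client understands.
KNOWN_REFINEMENTS = {("Date", "Created")}


def dumb_down(name, known=KNOWN_REFINEMENTS):
    """Split a dotted name; drop any refinement the client does not know."""
    parts = name.split(".")
    if len(parts) < 2:
        raise ValueError("expected a name such as 'DC.Title'")
    element = parts[1]
    refinement = parts[2] if len(parts) > 2 else None
    if refinement is not None and (element, refinement) in known:
        return element, refinement
    # Unknown refinement: treat the value as the unqualified element.
    return element, None


def interpret_date(value):
    """Parse a formally encoded date, or keep the raw string for a human reader."""
    try:
        return date.fromisoformat(value)
    except ValueError:
        return value
```

In this sketch, a name carrying an unrecognized refinement is simply
treated as the broader element, and a date value that does not follow
the expected notation is passed through unchanged so that it can still
be displayed.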