Notes on DCMI specifications for Dublin Core metadata in HTML/XHTML meta and link elements

Creator: Pete Johnston
Eduserv Foundation, UK
Date Issued: 2008-08-04
Identifier: http://dublincore.org/documents/2008/08/04/dc-html-notes/
Replaces: Not applicable
Is Replaced By: Not applicable
Latest Version: http://dublincore.org/documents/dc-html-notes/
Status of Document: This is a DCMI Recommended Resource
Description of Document: This document guides implementers through the differences between the previous HTML/XHTML meta data profile for expressing DC metadata in HTML/XHTML meta and link elements and the new profile provided by DCMI for the same purpose.

Table of contents

  1. Introduction
  2. Background
  3. Comparison between the DC-HTML-2003 and DC-HTML-2008 HTML/XHTML meta data profiles
  4. Recommendations
  Appendix A: DC-HTML-2003 and the DCAM
  Appendix B: DC-HTML-2008 and the DCAM
  References
  Acknowledgements

1. Introduction

In July 2008, DCMI published the document Expressing Dublin Core using HTML/XHTML meta and link elements [DC-HTML-2008] as a DCMI Recommendation. It supersedes the previous Recommendation Expressing Dublin Core in HTML/XHTML meta and link elements [DC-HTML-2003].

This document discusses the reasons for this change and the differences between the two documents, and makes some recommendations for their use.

2. Background

One of the primary mechanisms for expressing metadata within an HTML/XHTML document is the use of the meta and link elements and their attributes within the HTML/XHTML head element. The conventions used for defining the values of the attributes of meta and link elements are described in what the HTML specification calls a "meta data profile" [HTML-PROFILE]. A "meta data profile" is identified by a URI and the use of a profile within an HTML document is disclosed in the value of the profile attribute of the head element, e.g.

<head profile="http://dublincore.org/documents/2008/08/04/dc-html/">

The presence of this URI in the profile attribute value indicates that the meta data profile should be applied in order to interpret the given HTML/XHTML instance. A single HTML/XHTML instance may declare the use of multiple meta data profiles by providing a whitespace-separated list of URIs as the attribute value:

<head profile="http://dublincore.org/documents/2008/08/04/dc-html/ http://purl.org/NET/erdf/profile">
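To make the mechanism concrete, here is a minimal Python sketch, using only the standard library, that reads the profile attribute of the head element and splits it into individual profile URIs. The class and function names are illustrative and not part of any DCMI specification; the sketch ignores everything in the document other than head/@profile.

```python
from html.parser import HTMLParser
from typing import List


class HeadProfileParser(HTMLParser):
    """Collects the URIs listed in the profile attribute of the head element."""

    def __init__(self) -> None:
        super().__init__()
        self.profiles: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "head":
            return
        for name, value in attrs:
            if name == "profile" and value:
                # The value is a whitespace-separated list of profile URIs.
                self.profiles.extend(value.split())


def profile_uris(document: str) -> List[str]:
    """Return the meta data profile URIs declared by an HTML/XHTML document."""
    parser = HeadProfileParser()
    parser.feed(document)
    parser.close()
    return parser.profiles


example = (
    '<html><head profile="http://dublincore.org/documents/2008/08/04/dc-html/ '
    'http://purl.org/NET/erdf/profile"><title>Example</title></head>'
    "<body></body></html>"
)
print(profile_uris(example))
# ['http://dublincore.org/documents/2008/08/04/dc-html/', 'http://purl.org/NET/erdf/profile']
```

A document with no profile attribute simply yields an empty list, which corresponds to the case where no meta data profile has been declared.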

Both of the DCMI specifications for encoding DC metadata in HTML/XHTML referred to in the introduction to this document [DC-HTML-2003, DC-HTML-2008] are HTML meta data profiles, identified by the URIs http://dublincore.org/documents/dcq-html/, and http://dublincore.org/documents/2008/08/04/dc-html/ respectively.

For brevity, in the remainder of this document these profiles are referred to as "the DC-HTML-2003 profile" and "the DC-HTML-2008 profile" respectively.

2.1 The DCMI Abstract Model

Since 2003, DCMI has sought to formalise its model for Dublin Core metadata, and this has resulted in the publication of the DCMI Abstract Model [ABSTRACT-MODEL], the second version of which was given the status of DCMI Recommendation in June 2007.

The Abstract Model defines an abstract information structure called a description set. In order for applications to store or exchange DC metadata description sets, instances of those information structures must be represented in some concrete digital form according to the rules of a format or syntax. The DCMI Abstract Model itself does not define any such concrete formats or syntaxes for representing a DC metadata description set; DCMI defers that role to the family of specifications it refers to as "encoding guidelines".

Such a specification performs three functions:

The role of "encoding guidelines" and their relationship to the DCAM is illustrated graphically in the introduction to the tutorial on "Basic Syntax" presented at the DC-2007 conference [SYNTAXTUT].

2.2 Domains & Ranges for DCMI Properties

In January 2008, the DCMI Usage Board completed the process of making assertions about the domains and ranges associated with the DCMI-owned properties [REV-TERMS, DOMAINS]. These assertions clarify the formal semantics of the DCMI properties by making information that is implicit in the natural-language definitions of the properties available in a machine-processable form.

One consequence of the publication of these assertions is that metadata creators should ensure that their use of any DCMI property is consistent with the domain and range specified for that property by the DCMI Usage Board. In particular, for the purposes of this discussion, properties whose range is specified to be the class of "literals" should be used with literal values, and properties whose range is specified to be some class of "non-literal" resources should be used with appropriate non-literal values.
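The following Python sketch illustrates such a consistency check. It assumes, for illustration, that a value supplied in a meta element's content attribute is a literal and that a value supplied in a link element's href attribute is a non-literal resource. The range table is a small hand-copied excerpt, not a complete or authoritative copy of [DOMAINS], and the function name is hypothetical.

```python
from typing import Dict, Optional

DCTERMS = "http://purl.org/dc/terms/"

# Illustrative excerpt only: whether each property's range is the class of
# literals or some class of non-literal resources. Check [DOMAINS] for the
# authoritative assertions before reusing any entry.
RANGE_KIND: Dict[str, str] = {
    DCTERMS + "title": "literal",          # range rdfs:Literal
    DCTERMS + "modified": "literal",       # range rdfs:Literal
    DCTERMS + "creator": "non-literal",    # range dcterms:Agent
    DCTERMS + "language": "non-literal",   # range dcterms:LinguisticSystem
}


def check_value_kind(property_uri: str, value_is_literal: bool) -> Optional[str]:
    """Return a warning if the kind of value contradicts the property's range."""
    expected = RANGE_KIND.get(property_uri)
    if expected is None:
        return None  # no range recorded in this excerpt, so nothing to check
    actual = "literal" if value_is_literal else "non-literal"
    if actual != expected:
        return f"{property_uri} expects a {expected} value, got a {actual} value"
    return None


# e.g. <meta name="DCTERMS.creator" content="Pete Johnston"> supplies a literal
print(check_value_kind(DCTERMS + "creator", value_is_literal=True))
```

Returning None for unknown properties keeps the check conservative: it only reports conflicts with ranges it actually knows about.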

2.3 Expressing Dublin Core using RDF

Also in January 2008, DCMI published the document Expressing Dublin Core using the Resource Description Framework (RDF) [DC-RDF] as a DCMI Recommendation. This document describes how the features of the DCMI Abstract Model description set model are represented using the RDF model, and it replaced earlier DCMI specifications for expressing DC metadata in RDF.

2.4 Gleaning Resource Descriptions from Dialects of Languages (GRDDL)

Gleaning Resource Descriptions from Dialects of Languages (GRDDL) [GRDDL] is a W3C Recommendation which describes a set of conventions for associating an XML document with an algorithm for extracting a set of RDF triples from that document. One of the mechanisms defined by GRDDL is the association of what it calls a Profile Transformation with an XHTML meta data profile, so that the transformation can be applied to extract RDF triples from any document which references that profile (using the profile attribute as described above).
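A rough Python sketch of the dispatch step: each profile URI declared by a document selects a transformation, and the extracted graph is the union of the triples that every applicable transformation produces. In an actual GRDDL processor the transformation (typically XSLT) is discovered by dereferencing the profile URI and following a profileTransformation link in the profile document; this sketch replaces that lookup with a local dictionary, and the transformation function shown is a hypothetical stand-in.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Transformation = Callable[[str], Set[Triple]]


def glean(profile_uris: List[str], document: str,
          registry: Dict[str, Transformation]) -> Set[Triple]:
    """Apply the transformation associated with each declared profile URI.

    Profile URIs with no known transformation are skipped; the result is
    the union of the triples produced by every transformation that applies.
    """
    graph: Set[Triple] = set()
    for uri in profile_uris:
        transform = registry.get(uri)
        if transform is not None:
            graph |= transform(document)
    return graph


# Hypothetical stand-in for a real profile transformation.
def example_dc_html_2008_transform(document: str) -> Set[Triple]:
    return {("<http://example.org/doc>", "<http://purl.org/dc/terms/title>", '"Example"')}


registry = {"http://dublincore.org/documents/2008/08/04/dc-html/": example_dc_html_2008_transform}
print(glean(["http://dublincore.org/documents/2008/08/04/dc-html/",
             "http://purl.org/NET/erdf/profile"], "<html></html>", registry))
```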

2.5 Interoperability Levels for Dublin Core Metadata

The DCMI Architecture Community is currently developing a draft document titled Interoperability levels for Dublin Core metadata [DC-LEVELS].

It describes several different categories or "levels" of interoperability that may be enabled using DC metadata, and specifies for each level the requirements that a metadata provider should meet (and, correspondingly, what a metadata consumer can expect).

2.6 The DC-HTML-2008 meta data profile

The DC-HTML-2008 profile emerges from, and is directly shaped by, several of the developments listed above. Its primary purpose is to enable what the Interoperability Levels document [DC-LEVELS] calls "DCAM-based syntactic interoperability" ("Level 3" interoperability), a prerequisite for which is support for "Semantic interoperability" ("Level 2" interoperability), based on the RDF model. The profile provides a mapping both to a DC description set and to an RDF Graph, using the conventions specified in the DCMI Recommendation for representing DC metadata in RDF [DC-RDF]. Further, it provides an algorithm that implements the mapping to an RDF Graph in the form of a GRDDL Profile Transformation.

3. Comparison between the DC-HTML-2003 and DC-HTML-2008 HTML/XHTML meta data profiles

The DC-HTML-2003 profile and the DC-HTML-2008 profile are two different HTML meta data profiles.

The DC-HTML-2008 profile is specified in terms of the DCAM description set model and all features of the profile have a well-defined mapping to the constructs of an RDF Graph and of a DCAM description set.

The DC-HTML-2003 profile was not defined in terms of the DCAM description set model, which in 2003 did not exist in today's form, or of an RDF Graph. Although retrospective mappings to the constructs of an RDF Graph and of a DCAM description set can be constructed, such mappings can be made only for some features of the profile. (For a full explanation of how the DCAM interpretation of the DC-HTML-2003 profile is constructed, see Appendix A.)

The features of the DCAM description set model supported by the two meta data profiles are summarised in the following table:

DCAM Description Set Model feature | Supported in DCAM Description Set Model | Supported in DC-HTML-2003 | Supported in DC-HTML-2008
description set | One description set | One description set | One description set
description | One-to-many descriptions | One description | One description
described resource URI | One per description; any URI | One per description; Document URI/Base URI | One per description; Document URI/Base URI
statement | One-to-many statements per description | One-to-many statements per description | One-to-many statements per description
property URI | One per statement; any URI | One per statement; any URI | One per statement; any URI
literal value surrogate | One per statement | One per statement; partial support | One per statement; partial support
literal value surrogate / value string | One per literal value surrogate | One per literal value surrogate; partial support | One per literal value surrogate; partial support
literal value surrogate / value string language | Zero-to-one per value string | Zero-to-one per value string | Zero-to-one per value string
literal value surrogate / SES URI | Zero-to-one per value string | Not supported | Zero-to-one per value string; XML Literal datatype not supported
non-literal value surrogate | One per statement | One per statement; partial support | One per statement; partial support
non-literal value surrogate / value string | Zero-to-many per non-literal value surrogate | Not supported | Zero-to-one per non-literal value surrogate
non-literal value surrogate / value string language | Zero-to-one per value string | Not supported | Zero-to-one per value string
non-literal value surrogate / SES URI | Zero-to-one per value string | Not supported | Not supported
non-literal value surrogate / value URI | Zero-to-one per non-literal value surrogate | One per non-literal value surrogate | One per non-literal value surrogate
non-literal value surrogate / VES URI | Zero-to-one per non-literal value surrogate | Not supported | Not supported
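One way to read the table is as a set of constraints on a data structure. The following Python sketch models the statement-level features of the first (DCAM) column as dataclasses and flags the features that the DC-HTML-2008 column marks as unsupported or restricts. It covers only the "Not supported" and cardinality cells; it does not capture the "partial support" entries or the description-level constraints (a single description whose described resource is the document). All names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


@dataclass
class ValueString:
    text: str
    language: Optional[str] = None   # zero-to-one per value string
    ses_uri: Optional[str] = None    # syntax encoding scheme, zero-to-one


@dataclass
class LiteralValueSurrogate:
    value_string: ValueString        # exactly one


@dataclass
class NonLiteralValueSurrogate:
    value_uri: Optional[str] = None  # zero-to-one in the DCAM
    ves_uri: Optional[str] = None    # vocabulary encoding scheme, zero-to-one
    value_strings: List[ValueString] = field(default_factory=list)  # zero-to-many


@dataclass
class Statement:
    property_uri: str
    value: Union[LiteralValueSurrogate, NonLiteralValueSurrogate]


def dc_html_2008_gaps(statement: Statement) -> List[str]:
    """List statement features that the DC-HTML-2008 column cannot encode."""
    gaps: List[str] = []
    value = statement.value
    if isinstance(value, LiteralValueSurrogate):
        if value.value_string.ses_uri == XML_LITERAL:
            gaps.append("XML Literal datatype")
    else:
        if value.value_uri is None:
            gaps.append("non-literal value without a value URI")
        if value.ves_uri is not None:
            gaps.append("VES URI")
        if len(value.value_strings) > 1:
            gaps.append("more than one value string")
        if any(vs.ses_uri is not None for vs in value.value_strings):
            gaps.append("SES URI on a non-literal value string")
    return gaps


subject = Statement(
    "http://purl.org/dc/terms/subject",
    NonLiteralValueSurrogate(value_uri="http://example.org/terms/42",
                             ves_uri="http://example.org/scheme"),
)
print(dc_html_2008_gaps(subject))  # ['VES URI']
```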

In terms of the features of the DCAM description set model supported, the differences between them are:

Note that neither the DC-HTML-2003 profile nor the DC-HTML-2008 profile supports the encoding of vocabulary encoding scheme URIs, nor the use of XML Literals as value strings.

There are also differences in the syntactic features themselves:

In any HTML/XHTML document, the value of the profile attribute of the head element specifies which meta data profiles are used in that document. A document with a profile value of http://dublincore.org/documents/dcq-html/ is intended to be interpreted using the DC-HTML-2003 profile; and a document with a profile value of http://dublincore.org/documents/2008/08/04/dc-html/ is intended to be interpreted using the DC-HTML-2008 profile. Note that the presence of the URI of a profile licenses the interpretation of the document in accordance with the rules of that profile. This is reflected in the fact that it is through the use of the profile URI that a GRDDL processor obtains access to a transformation algorithm which is specific to a profile.

It is important to note that some of the conventions used in the DC-HTML-2003 profile will generate quite different sets of statements when interpreted using the DC-HTML-2008 profile. There are two areas where this is the case:

3.1 Prefixed Names

Consider the following example:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head profile="xxx yyy">

<title>My Document</title>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" >
<meta name="DC.date.modified" content="2007-07-22" >
</head>
<body>
</body>
</html>

According to the DC-HTML-2003 profile, this should be interpreted as encoding a single statement in which the "composite prefixed name" DC.date.modified is used as an abbreviation for the property URI http://purl.org/dc/terms/modified.

Interpreted according to the DC-HTML-2008 profile, it generates a single statement with the property URI http://purl.org/dc/elements/1.1/date.modified, which is not the URI of a DCMI-owned property.
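The two readings of the name attribute in the example above can be sketched in Python as follows. The DC-HTML-2008 function reproduces the behaviour described in the preceding paragraph: the namespace URI declared for the prefix, followed by the rest of the name. The DC-HTML-2003 function is a simplification which assumes, as in this example, that the refinement part of a composite prefixed name is a DCMI term in the http://purl.org/dc/terms/ namespace. Both function names are hypothetical.

```python
from typing import Dict

DC_ELEMENTS = "http://purl.org/dc/elements/1.1/"
DC_TERMS = "http://purl.org/dc/terms/"

# Prefix declarations, as given by <link rel="schema.DC" href="...">.
SCHEMAS: Dict[str, str] = {"DC": DC_ELEMENTS}


def property_uri_2008(name: str, schemas: Dict[str, str]) -> str:
    """DC-HTML-2008 reading: append everything after the first '.' to the
    namespace URI declared for the prefix."""
    prefix, _, local_name = name.partition(".")
    return schemas[prefix] + local_name


def property_uri_2003(name: str, schemas: Dict[str, str]) -> str:
    """Simplified DC-HTML-2003 reading of 'PREFIX.element[.refinement]'.

    Assumption: a refinement is a DCMI term in the dc/terms/ namespace.
    """
    prefix, _, rest = name.partition(".")
    element, _, refinement = rest.partition(".")
    if refinement:
        return DC_TERMS + refinement
    return schemas[prefix] + element


print(property_uri_2003("DC.date.modified", SCHEMAS))  # http://purl.org/dc/terms/modified
print(property_uri_2008("DC.date.modified", SCHEMAS))  # http://purl.org/dc/elements/1.1/date.modified
```

The same meta element thus yields two different property URIs, which is why a document should not simply switch profile URIs without revising its content.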

3.2 The meta/@scheme attribute

The DC-HTML-2003 profile does not make a distinction between vocabulary encoding scheme URIs and syntax encoding scheme URIs and for this reason it is not possible to specify an unambiguous interpretation of the use of the scheme attribute of the meta element in terms of an RDF Graph or DC description set.

The DC-HTML-2008 profile specifies that the scheme attribute of the meta element is to be interpreted as providing the syntax encoding scheme URI for a literal value surrogate (in terms of the RDF Graph, the datatype of a literal object).
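The following Python sketch shows this reading of meta/@scheme by emitting one N-Triples line per meta element, with the scheme becoming the datatype of the literal object. It assumes that the scheme value is a prefixed name resolved through the same schema.PREFIX link declarations as the name attribute, and, when a scheme is present, it drops any language tag because an RDF typed literal does not carry one. Consult [DC-HTML-2008] for the profile's exact rules; the function names are illustrative.

```python
from typing import Dict, Optional


def ntriples_escape(text: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def literal_triple(subject: str, property_uri: str, content: str,
                   schemas: Dict[str, str], lang: Optional[str] = None,
                   scheme: Optional[str] = None) -> str:
    """Return one N-Triples line for a meta element with a literal value."""
    literal = f'"{ntriples_escape(content)}"'
    if scheme is not None:
        # Assumption: scheme holds a prefixed name resolved like the name
        # attribute; the resulting URI becomes the literal's datatype.
        prefix, _, local_name = scheme.partition(".")
        obj = f"{literal}^^<{schemas[prefix] + local_name}>"
    elif lang:
        # No scheme: a language-tagged literal.
        obj = f"{literal}@{lang}"
    else:
        obj = literal
    return f"<{subject}> <{property_uri}> {obj} ."


schemas = {"DCTERMS": "http://purl.org/dc/terms/"}
print(literal_triple("http://example.org/doc", "http://purl.org/dc/terms/modified",
                     "2007-07-22", schemas, scheme="DCTERMS.W3CDTF"))
# <http://example.org/doc> <http://purl.org/dc/terms/modified> "2007-07-22"^^<http://purl.org/dc/terms/W3CDTF> .
```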

3.3 Choosing a meta data profile

The choice of profile depends on the requirements of the application: as the table above indicates, the DC-HTML-2008 profile supports some features of the DCAM description set model which are not supported by the DC-HTML-2003 profile (syntax encoding scheme URIs for literal value surrogates and value strings for non-literal value surrogates). It also provides a simpler and more consistent mechanism for the encoding of property URIs.

The use of the profile attribute removes any ambiguity about how the provider of a document intends it to be processed.

If both DCMI profile URIs are present, then a processor may apply both interpretations. However, metadata providers should use this combination with caution. Bearing in mind the differences in interpretation noted above, if a document signals the use of both profiles, or if the value of the profile attribute is simply changed from http://dublincore.org/documents/dcq-html/ to http://dublincore.org/documents/2008/08/04/dc-html/ without changing the content of the data, then, depending on the content of the data and the features of the profiles used, there is a risk that different interpretations of the data will result.

If neither DCMI profile URI is present, then no interpretation is licensed by DCMI specifications. An application may apply an interpretation of such a document as a DC description set, either as the result of the use of another profile defined by an agency other than DCMI, or as the result of some other agreement between provider and consumer.
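As a rough illustration of these rules, the following Python sketch classifies the profile URIs declared by a document (for example, the result of splitting head/@profile on whitespace). It compares URIs by exact string match against the two profile URIs given in this document; the function names and messages are illustrative only.

```python
from typing import List, Set

DC_HTML_2003 = "http://dublincore.org/documents/dcq-html/"
DC_HTML_2008 = "http://dublincore.org/documents/2008/08/04/dc-html/"


def dcmi_interpretations(profile_uris: List[str]) -> Set[str]:
    """Return the DCMI profiles whose interpretation a document licenses."""
    found: Set[str] = set()
    if DC_HTML_2003 in profile_uris:
        found.add("DC-HTML-2003")
    if DC_HTML_2008 in profile_uris:
        found.add("DC-HTML-2008")
    return found


def advice(profile_uris: List[str]) -> str:
    """Summarise which DCMI interpretation(s) a consumer may apply."""
    found = dcmi_interpretations(profile_uris)
    if not found:
        return "no interpretation licensed by DCMI specifications"
    if len(found) == 2:
        return "both interpretations licensed; check that they agree"
    return f"{found.pop()} interpretation licensed"


print(advice([DC_HTML_2003, DC_HTML_2008]))
# both interpretations licensed; check that they agree
```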

4. Recommendations

4.1 For providers of DC metadata

New providers of DC metadata encoded in the header of an HTML/XHTML document SHOULD make use of the DC-HTML-2008 profile.

A provider of DC metadata encoded in the header of an HTML/XHTML document:

To enable a mapping to an RDF Graph (level 2 interoperability) or to a DC description set (level 3 interoperability), a provider of DC metadata using the DC-HTML-2003 profile:

To enable a mapping to an RDF Graph (level 2 interoperability) or to a DC description set (level 3 interoperability), a provider of DC metadata using the DC-HTML-2008 profile:

A provider of DC metadata migrating from the use of the DC-HTML-2003 profile to the use of the DC-HTML-2008 profile:

4.2 For consumers of DC metadata

A consumer of DC metadata encoded in the header of an HTML/XHTML document:

A consumer of data using the DC-HTML-2003 profile and making a mapping to an RDF Graph (level 2 interoperability) or to a DC description set (level 3 interoperability):

A consumer of data using the DC-HTML-2008 profile and making a mapping to an RDF Graph (level 2 interoperability) or to a DC description set (level 3 interoperability):

Appendix A: DC-HTML-2003 and the DCAM

A.1 Expressing Dublin Core in HTML/XHTML meta and link elements (2003)

The DCMI Recommendation, Expressing Dublin Core in HTML/XHTML meta and link elements [DC-HTML-2003] pre-dates the development of the DCAM, so it does not perform the functions of "encoding guidelines" described in section 2.1: it describes neither how components of (a subset of) the DCAM description set model are to be "encoded", nor how features of the format are to be interpreted as representing a DC metadata description set.

However, DC-HTML-2003 does broadly follow the general approach described above of distinguishing between an information structure (which it calls a "DC record") and the way that record is represented. Essentially, it defines its own "description model", based on the concept of the "DC record", and describes how instances of that information structure are to be represented in HTML/XHTML documents. The DC-HTML-2003 concept of the "DC record" is not based on the DCAM description set model, and although DC-HTML-2003 uses some of the same terminology as the DCAM, it uses that terminology with different meanings.

So any attempt to provide an interpretation of the DC-HTML-2003 recommendation in terms of the DCAM description set is, and must be, a retrospective exercise. It depends on a two-stage process:

If the first step reveals that some components of a "DC record" cannot be mapped to components of the DCAM description set, then there will be aspects of the syntax which, while they do have an interpretation as representing components of a "DC record", do not have an interpretation as representing components of the DCAM description set. Similarly, the first step may show that some constructs and components of the DCAM description set have no counterpart in the "DC record", in which case those constructs and components have no syntactic representation in the current (DC-HTML-2003) meta data profile.

A.2 Mapping the "DC record" to the description set

Two approaches might be taken to constructing such a mapping.

The first thing to note is that the concept of the "DC record" in the DC-HTML-2003 document is rather underspecified. The introduction refers to a "record" as

some structured metadata about a resource, comprising one or more properties and their associated values.

In the context of DC-HTML-2003, the term "value" is used to refer to a literal. However, the document goes on to discuss concepts such as "element", "element refinement", "encoding scheme" and "language", and how instances of these concepts should be represented using the HTML/XHTML profile, without explaining how these concepts relate to the "record". For the purpose of this discussion, we assume that (using these terms as they are used in DC-HTML-2003, not as they are used in the DCAM):

Such an interpretation seems consistent with the use of those terms in the DCMI Recommendation Guidelines for implementing Dublin Core in XML [DC-XML-2003], which provides more explicit "abstract models" for the data being represented.

A.3 The "informal" approach

The following table is an attempt to specify a mapping between the "DC record" described by DC-HTML-2003 and the description set described by the DCAM, such that the assertions made by the description set correspond to, or at least do not contradict, the assertions made by the "DC record".

DC-HTML-2003 model | DCAM description set model
"DC record" | description set containing a single description
"Property + Value" | statement
"URI of Property" | property URI
"Value" | literal value surrogate/value string or non-literal value surrogate/value URI
"Language" | value string language

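The mapping in the table can be read as a simple transformation. The following Python sketch assumes that the caller already knows whether each "Value" is to be treated as a literal or as a value URI (the hypothetical value_is_uri flag), since the table leaves that choice open, and that the described resource is the document itself, as in the comparison table in section 3. All class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RecordEntry:
    """One "Property + Value" pair of a DC-HTML-2003 "DC record"."""
    property_uri: str
    value: str
    language: Optional[str] = None
    value_is_uri: bool = False  # hypothetical flag: the caller decides


@dataclass
class DcamStatement:
    property_uri: str
    value_string: Optional[str]  # literal value surrogate / value string
    value_uri: Optional[str]     # non-literal value surrogate / value URI
    language: Optional[str]      # value string language (literals only here)


def record_to_description_set(described_uri: str,
                              record: List[RecordEntry]) -> Dict[str, list]:
    """Map a "DC record" to a description set holding a single description."""
    statements = []
    for entry in record:
        if entry.value_is_uri:
            # Non-literal value: DC-HTML-2003 offers no value string, so no language.
            statements.append(DcamStatement(entry.property_uri, None, entry.value, None))
        else:
            statements.append(DcamStatement(entry.property_uri, entry.value,
                                            None, entry.language))
    return {"descriptions": [{"resource_uri": described_uri,
                              "statements": statements}]}


ds = record_to_description_set("http://example.org/doc", [
    RecordEntry("http://purl.org/dc/terms/title", "My Document", "en"),
    RecordEntry("http://purl.org/dc/terms/creator", "http://example.org/people/1",
                value_is_uri=True),
])
print(len(ds["descriptions"]))  # 1
```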
There are several points worth noting:

Using this mapping in conjunction with the DC-HTML-2003 profile, the following DCAM interpretation for DC-HTML-2003 might be inferred.

An X/HTML document using the DC-HTML-2003 profile encodes a description set containing

A.4 dc-extract.xsl

Dan Connolly of W3C produced an XSLT stylesheet, dc-extract.xsl [DC-EXTRACT], which generates an RDF/XML representation of the encoded metadata from an XHTML document using the DC-HTML-2003 profile. In terms of the Interoperability Levels document, it supports "Level 2" "semantic interoperability" for the DC-HTML-2003 profile. It uses the following conventions:

If the resulting RDF graph is interpreted as a DCAM description set using the conventions of the DC-RDF recommendation [DC-RDF], then this would correspond to a DCAM interpretation for DC-HTML-2003 as follows.

An X/HTML document using the DC-HTML-2003 profile encodes a description set containing

A.5 Embedded RDF

Embedded RDF [ERDF], designed by Ian Davis (Talis), is a set of conventions for embedding RDF triples into HTML/XHTML. There is no formal association between Embedded RDF and the DC-HTML-2003 profile, but the documentation for Embedded RDF notes that it was designed to be compatible with the DC-HTML-2003 profile, so an Embedded RDF interpretation can be made for an instance of the DC-HTML-2003 profile. Again, in the terms of the Interoperability Levels document, it supports "Level 2" "semantic interoperability" for the DC-HTML-2003 profile. It uses the following conventions, which are a subset of those used by dc-extract.xsl:

If the resulting RDF graph is interpreted as a DCAM description set using the conventions of the DC-RDF recommendation [DC-RDF], then this would correspond to a DCAM interpretation for DC-HTML-2003 as follows.

An X/HTML document using the DC-HTML-2003 profile encodes a description set containing

A.6 A DCAM interpretation of DC-HTML-2003

The following is a "conservative" DCAM interpretation of the DC-HTML-2003 profile which is supported by all three of the approaches above. Note that this interpretation does not provide a mapping for the scheme attribute.

An X/HTML document using the DC-HTML-2003 profile encodes a description set containing

Appendix B: DC-HTML-2008 and the DCAM

In contrast to the case of DC-HTML-2003, the new DCMI Recommendation, Expressing Dublin Core using HTML/XHTML meta and link elements [DC-HTML-2008] is designed to support the encoding of a DC description set and the document describes explicitly a mapping between a subset of the features of the DCAM description set model and the X/HTML meta and link elements.

An X/HTML document using the DC-HTML-2008 profile encodes a description set containing

References

[ABSTRACT-MODEL]
DCMI Abstract Model. DCMI Recommendation, 2007-06-04.
http://dublincore.org/documents/2007/06/04/abstract-model/

[DC-EXTRACT]
Dublin Core Extraction Service.
http://www.w3.org/2000/06/dc-extract/form.html

[DC-HTML-2003]
Expressing Dublin Core in HTML/XHTML meta and link elements. DCMI Recommendation, 2003-11-30.
http://dublincore.org/documents/2003/11/30/dcq-html/

[DC-HTML-2008]
Expressing Dublin Core using HTML/XHTML meta and link elements. DCMI Recommendation, 2008-08-04.
http://dublincore.org/documents/2008/08/04/dc-html/

[DC-LEVELS]
Interoperability levels for Dublin Core metadata.
http://dublincore.org/architecturewiki/InteroperabilityLevels

[DC-RDF]
Expressing Dublin Core metadata using the Resource Description Framework (RDF). DCMI Recommendation, 2008-01-14.
http://dublincore.org/documents/2008/01/14/dc-rdf/

[DC-TEXT]
Expressing Dublin Core metadata using the DC-Text format. DCMI Recommended Resource, 2007-12-03.
http://dublincore.org/documents/2007/12/03/dc-text/

[DC-XML-2003]
Guidelines for implementing Dublin Core in XML. DCMI Recommendation, 2003-04-02.
http://dublincore.org/documents/2003/04/02/dc-xml-guidelines/

[DOMAINS]
Domains and Ranges for DCMI Properties.
http://dublincore.org/documents/2008/01/14/domain-range/

[ERDF]
Embedded RDF.
http://purl.org/NET/erdf/profile

[GRDDL]
Gleaning Resource Descriptions from Dialects of Languages (GRDDL). W3C Recommendation, 11 September 2007.
http://www.w3.org/TR/2007/REC-grddl-20070911/

[HTML-PROFILE]
Meta data profiles. In HTML 4.01 Specification, W3C Recommendation, 24 December 1999.
http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#h-7.4.4.3

[REV-TERMS]
Revisions to DCMI Metadata Terms.
http://dublincore.org/usage/decisions/2008/dcterms-changes/

[RFC3986]
Berners-Lee, T., R. Fielding, L. Masinter. RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. Internet Engineering Task Force (IETF), January 2005.
http://www.ietf.org/rfc/rfc3986.txt

[SYNTAXTUT]
DCMI Basic Syntaxes Tutorial. DC-2007, Singapore.
http://www.dc2007.sg/T2-BasicSyntaxes.pdf

Acknowledgements

xxxx

Copyright © 1995-2014 DCMI. All Rights Reserved.