Notes on the DC-DS-XML XML Format

Creators: Pete Johnston
Eduserv Foundation, UK
Date Issued: 2008-09-01
Latest Version: https://dublincore.org/specifications/dublin-core/dc-ds-xml-notes/
Release History: https://dublincore.org/specifications/dublin-core/dc-ds-xml-notes/release_history/
Description: This document describes the background to the development of Expressing Dublin Core Description Sets using XML (DC-DS-XML), and its relationship to the document Guidelines for implementing Dublin Core in XML (2003-04-02).

Table of contents

  1. Introduction
  2. Background
  3. The DC-DS-XML XML Format
  4. DC-DS-XML, XML Namespaces and the DCMI Namespace Policy
  5. DC-DS-XML and DC-XML-2003

1. Introduction

In September 2008, DCMI published the document Expressing Dublin Core™ Description Sets using XML (DC-DS-XML) [DC-DS-XML] as a DCMI Proposed Recommendation. It replaces the previously circulated Working Draft Expressing Dublin Core™ metadata using XML [DC-XML-2006].

This document describes the background to the development of the Proposed Recommendation DC-DS-XML, outlines the functionality it provides, and describes its relationship to other DCMI specifications.

2. Background

2.1 The DCMI Abstract Model

Since 2003, DCMI has sought to formalize its model for Dublin Core™ metadata, and this has resulted in the publication of the DCMI Abstract Model [ABSTRACT-MODEL], the second version of which was given the status of DCMI Recommendation in June 2007.

The Abstract Model defines an abstract information structure called a description set. In order for applications to store or exchange DC metadata description sets, instances of those information structures must be represented in some concrete digital form according to the rules of a format or syntax. The DCMI Abstract Model itself does not define any such concrete formats or syntaxes for representing a DC metadata description set; DCMI defers that role to the family of specifications it refers to as "encoding guidelines".

Such a specification performs three functions:

  • it defines the subset of the features of the DCAM description set model which the syntax supports

  • it describes how each of the supported constructs and components of the DCAM description set are "encoded" in the concrete format

  • (conversely) it describes how features of the format are to be interpreted or "decoded" as representing constructs and components of the DCAM description set

The role of "encoding guidelines" and their relationship to the DCMI Abstract Model is illustrated graphically in the introduction to the tutorial on "Basic Syntax" presented at the DC-2007 conference [SYNTAXTUT].

2.2 Expressing Dublin Core™ using XML

In order to represent a DC metadata description set in an XML document, the constructs and components of the description set must be represented as components in the XML document, i.e. as XML elements and XML attributes, XML element names and XML attribute names, and as XML element content and XML attribute values.
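The following minimal Python sketch makes the three functions of an encoding guideline (Section 2.1) concrete. It covers a deliberately tiny subset of the description set model: a single description whose statements all have literal value surrogates. Each DCAM component is encoded as an XML component and then decoded back. The property URI becomes an attribute value, the value string becomes element content, and its language becomes `xml:lang`. The element names `record`, `statement` and `value` and the attribute name `property` are hypothetical. They are not taken from DC-DS-XML or any other DCMI specification.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# ElementTree serializes this expanded name with the reserved "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


@dataclass(frozen=True)
class LiteralStatement:
    """The supported subset: a statement with a literal value surrogate."""
    property_uri: str
    value_string: str
    language: Optional[str] = None


def encode(statements: list[LiteralStatement]) -> str:
    """Encode: each DCAM component becomes an element, attribute or content."""
    root = ET.Element("record")  # hypothetical element name
    for st in statements:
        # property URI -> attribute value
        el = ET.SubElement(root, "statement", {"property": st.property_uri})
        # value string -> element content; its language -> xml:lang
        value = ET.SubElement(el, "value")
        value.text = st.value_string
        if st.language is not None:
            value.set(XML_LANG, st.language)
    return ET.tostring(root, encoding="unicode")


def decode(xml_text: str) -> list[LiteralStatement]:
    """Decode: interpret the same XML components as DCAM statements."""
    root = ET.fromstring(xml_text)
    statements = []
    for el in root.findall("statement"):
        value = el.find("value")
        statements.append(
            LiteralStatement(el.get("property"), value.text or "", value.get(XML_LANG))
        )
    return statements
```

This toy format has no place for a described resource URI, a non-literal value surrogate or a syntax encoding scheme URI. That is what it means for an encoding guideline to define the subset of the description set model that it supports.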

The current DCMI Recommendation for expressing DC metadata using XML, Guidelines for implementing Dublin Core™ in XML [DC-XML-2003], does not define such a representation of a DCAM description set, because it pre-dated the development of the DCMI Abstract Model. (The relationship of the DC-XML-2003 format to the DCMI Abstract Model is discussed further in Section 5 and Appendix A below.)

In June 2006, the Working Draft Expressing Dublin Core™ metadata using XML [DC-XML-2006] was released for public comment. As a result of comments received and subsequent discussions within the DCMI Architecture Forum, work continued in parallel on drafts for two different XML formats, one supporting the full description set model of the Abstract Model, known as "DC-XML-Full", and the other supporting only a subset of that model, known as "DC-XML-Min". The drafts for both formats were updated in 2007 to reflect changes made to the DCMI Abstract Model.

Following discussions at the meeting of the DCMI Architecture Forum at the DC-2007 conference and in subsequent telecons, it was decided to put forward a modified version of the "DC-XML-Full" format as a Proposed Recommendation. This is the format now known as "DC-DS-XML" (where "DS" is an abbreviation for "Description Set"), which is described in the document Expressing Dublin Core™ Description Sets using XML (DC-DS-XML) [DC-DS-XML].

Work continues within the DCMI Architecture Forum on identifying the requirements for other XML format(s) for representing DC metadata in XML.

2.3 Expressing Dublin Core™ using RDF

In January 2008, DCMI published the document Expressing Dublin Core™ metadata using the Resource Description Framework (RDF) [DC-RDF] as a DCMI Recommendation. This document describes how the features of the DCMI Abstract Model description set model are represented using the RDF model, and it replaced earlier DCMI specifications for expressing DC metadata in RDF.

2.4 Gleaning Resource Descriptions from Dialects of Languages (GRDDL)

Gleaning Resource Descriptions from Dialects of Languages (GRDDL) [GRDDL] is a W3C Recommendation which describes a set of conventions for associating an XML document with an algorithm for the extraction of a set of RDF triples from that document. One of the mechanisms defined by GRDDL is the association of what it calls a Namespace Transformation with an XML Namespace Name, so that the transformation can be applied to extract RDF triples from any document which uses that XML Namespace Name in the name of its root element.

2.5 Interoperability Levels for Dublin Core™ Metadata

The DCMI Architecture Forum is currently developing a draft document titled Interoperability levels for Dublin Core™ metadata [DC-LEVELS].

It describes several different categories or "levels" of interoperability that may be enabled using DC metadata. For each level, it specifies the requirements that should be met by a metadata provider (and, correspondingly, what a metadata consumer can expect).

3. The DC-DS-XML Format (2008)

The current DC-DS-XML format described in the Proposed Recommendation emerges from, and is directly shaped by, several of the developments listed above.

The primary purpose of the DC-DS-XML format is to enable what the "levels" document calls "DCAM-based syntactic interoperability" ("Level 3" interoperability), by providing rules for interpreting an instance of the format as a DC description set.

A prerequisite for this is support for "Semantic interoperability" ("Level 2" interoperability), based on the RDF model. So the format also provides rules for interpreting an instance of the format as an RDF Graph, using the conventions specified in the DCMI Recommendation for representing DC metadata in RDF [DC-RDF]. Further, it provides an algorithm that implements this mapping to an RDF Graph in the form of a GRDDL Namespace Transformation.

The principles applied to the design of the DC-DS-XML format are described in the introduction to the document:

  • The format should provide a serialization of all the features of the "Description Set Model" of the Abstract Model, i.e. it should be possible to represent all the constructs that make up a DC metadata description set.

  • The format is not required to address the features of the "Vocabulary Model" of the DCAM. For example, it is not required to express subproperty relationships between properties, subclass relationships between classes, etc.

  • The format should be easily usable with XML-based specifications such as XPath, XPointer and XQuery, i.e. for each construct in the DCAM there should be a mapping to exactly one construct in the XML syntax.

  • The format should not be dependent on features of a single XML Schema language.

  • It should be possible to describe the format using W3C XML Schema [XMLSCHEMA]. When the format is used to serialize description sets conforming to a DC Application Profile [DCAP], it is however not a requirement that all structural constraints expressed in the corresponding Description Set Profile [DSP] be captured using W3C XML Schema.
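The XPath principle can be illustrated with the limited XPath subset in Python's `xml.etree.ElementTree`. The sketch below pulls out the literal value strings for a given property with a single path expression. The XML Namespace Name is the one given in Sections 3.2 and 4. The element and attribute names (`descriptionSet`, `description`, `statement`, `literalValueString`, `resourceURI`, `propertyURI`, `valueURI`) are assumed here for illustration and should be checked against [DC-DS-XML], which is the normative source.

```python
import xml.etree.ElementTree as ET

# Namespace name from the Proposed Recommendation; the element and
# attribute names below are assumptions for illustration.
DCDS = "http://purl.org/dc/xmlns/2008/09/01/dc-ds-xml/"
NS = {"dcds": DCDS}

SAMPLE = f"""<dcds:descriptionSet xmlns:dcds="{DCDS}">
  <dcds:description dcds:resourceURI="http://example.org/doc/1">
    <dcds:statement dcds:propertyURI="http://purl.org/dc/terms/title">
      <dcds:literalValueString xml:lang="en">Notes on DC-DS-XML</dcds:literalValueString>
    </dcds:statement>
    <dcds:statement dcds:propertyURI="http://purl.org/dc/terms/creator"
                    dcds:valueURI="http://example.org/person/1"/>
  </dcds:description>
</dcds:descriptionSet>"""


def literal_values(xml_text: str, property_uri: str) -> list[str]:
    """Literal value strings of every statement that uses property_uri."""
    root = ET.fromstring(xml_text)
    # One XML construct per DCAM construct: a single path expression suffices.
    path = (f".//dcds:statement[@dcds:propertyURI='{property_uri}']"
            "/dcds:literalValueString")
    return [el.text for el in root.findall(path, NS)]


def described_resources(xml_text: str) -> list[str]:
    """Described resource URIs of all descriptions in the description set."""
    root = ET.fromstring(xml_text)
    return [d.get(f"{{{DCDS}}}resourceURI")
            for d in root.findall("dcds:description", NS)]
```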

The DC-DS-XML format provides a "base-line" XML format for serializing DC description sets: it meets the basic requirement of providing an XML serialization which supports the full description set model, and can be processed straightforwardly using XML technologies. It is not intended to be the only XML format for DC metadata, nor is it proposed as a replacement for the current DCMI Recommendation, Guidelines for implementing Dublin Core™ in XML [DC-XML-2003]. In addition to the DC-DS-XML format, the DCMI Architecture Forum continues to work on identifying requirements for other XML format(s) for serializing description sets.

3.1 XML Schema for DC-DS-XML

A draft W3C XML Schema for the DC-DS-XML format is available. The URI of the current version of the schema is http://dublincore.org/schemas/xmls/2008/09/01/dc-ds-xml/dcds.xsd

3.2 GRDDL Namespace Transformation for DC-DS-XML

A draft GRDDL Namespace Transformation for the DC-DS-XML format is available, in the form of an XSLT stylesheet which transforms a DC-DS-XML instance into RDF/XML. The URI of the current version of the transform is http://purl.org/dc/transform/2008/09/01/dc-ds-xml-20080901-grddl/dcds2rdfxml.xsl

The transform is accessible to a GRDDL application via the "namespace document" obtained by dereferencing the XML Namespace Name http://purl.org/dc/xmlns/2008/09/01/dc-ds-xml/

N.B. The XSLT transform is currently a work in progress, and some aspects of the DC-DS-XML format are not supported in the current version.
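The dispatch step of the GRDDL Namespace Transformation mechanism (Section 2.4) can be sketched in a few lines: read the XML Namespace Name of the root element, then look up the transformation associated with it. A real GRDDL processor finds that association by dereferencing the namespace name and reading the `grddl:namespaceTransformation` property from the namespace document. Here a hypothetical in-memory `TRANSFORMS` table stands in for that step. Fetching and running the XSLT are left out.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical stand-in for the namespace-document lookup: a real GRDDL
# processor would dereference the namespace name and read the
# grddl:namespaceTransformation property from the document it retrieves.
TRANSFORMS = {
    "http://purl.org/dc/xmlns/2008/09/01/dc-ds-xml/":
        "http://purl.org/dc/transform/2008/09/01/dc-ds-xml-20080901-grddl/dcds2rdfxml.xsl",
}


def root_namespace(xml_text: str) -> Optional[str]:
    """Return the XML Namespace Name of the root element, if it has one."""
    tag = ET.fromstring(xml_text).tag  # e.g. "{http://...}descriptionSet"
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def namespace_transformation(xml_text: str) -> Optional[str]:
    """Pick the transformation associated with the root element's namespace."""
    ns = root_namespace(xml_text)
    return TRANSFORMS.get(ns) if ns is not None else None
```

The lookup depends only on the root element. DC-DS-XML markup nested inside a document whose root element belongs to some other namespace would therefore not be picked up by this mechanism.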

4. DC-DS-XML, XML Namespaces and the DCMI Namespace Policy

The DCMI Recommendation DCMI Namespace Policy [DCMI-NAMESPACE] describes the conventions that DCMI uses to assign URIs to metadata terms it owns and manages, and to collections of those term URIs (referred to as DCMI Namespaces). It also describes the policy commitments DCMI associates with those DCMI term URIs and DCMI namespace URIs, particularly with respect to their persistence.

The Namespace Policy document does not currently address the case of XML Namespace Names. The Proposed Recommendation for the DC-DS-XML format defines an XML Namespace with the name (URI) http://purl.org/dc/xmlns/2008/09/01/dc-ds-xml/

Several points are worth noting:

  • The current choice of XML Namespace Name is provisional, and may change in subsequent versions of the specification.

  • A PURL is suggested rather than a dublincore.org URI; this may be changed to make use of a dublincore.org URI if that is considered appropriate.

  • The URI is "date-stamped". This allows the XML Namespace Name to be changed if a later change to the XML format requires it, though not every change to the format will necessarily require a new XML Namespace Name.

The conventions and policies for DCMI-owned XML Namespace Names are a topic for discussion for the DCMI Architecture Forum, with a view to extending the Namespace Policy document to address this area.

5. DC-DS-XML and DC-XML-2003

The current DCMI Recommendation for expressing DC metadata using XML, Guidelines for implementing Dublin Core™ in XML [DC-XML-2003] was not defined in terms of the DCAM description set model, or of an RDF Graph. It provides its own "abstract models" for a "simple DC record" and a "qualified DC record", and specifies an XML format for the representation of instances of those two models.

Any interpretation of the DC-XML-2003 format in terms of mapping to the constructs of an RDF Graph and of a DCAM description set must be constructed retrospectively. The limitations of the design of the DC-XML-2003 format mean that any such mapping is at best approximate as it relies on assumptions that may not accurately reflect the intent of metadata creators, and can be made only for some features of the format. Appendix A describes such a suggested mapping for the DC-XML-2003 format.

The features of the DCAM description set model supported by the two XML formats -- DC-XML-2003 (on the basis of the interpretation suggested in Appendix A) and DC-DS-XML -- are summarised in the following table:

DCAM Description Set Model feature | Supported in DCAM Description Set Model | Supported in DC-XML-2003 | Supported in DC-DS-XML
--- | --- | --- | ---
description set | One description set | One description set | One description set
description | One-to-many descriptions | One description | One-to-many descriptions
described resource URI | One per description; any URI | Not supported | One per description; any URI
statement | One-to-many statements per description | One-to-many statements per description | One-to-many statements per description
property URI | One per statement; any URI | One per statement; any URI | One per statement; any URI
literal value surrogate | One per statement | One per statement; partial support | One per statement
literal value surrogate / value string | One per literal value surrogate | One per literal value surrogate; partial support | One per literal value surrogate
literal value surrogate / value string language | Zero-to-one per value string | Zero-to-one per value string | Zero-to-one per value string
literal value surrogate / SES URI | Zero-to-one per value string | Not supported | Zero-to-one per value string
non-literal value surrogate | One per statement | Not supported | One per statement
non-literal value surrogate / value string | Zero-to-many per non-literal value surrogate | Not supported | Zero-to-many per non-literal value surrogate
non-literal value surrogate / value string language | Zero-to-one per value string | Not supported | Zero-to-one per value string
non-literal value surrogate / SES URI | Zero-to-one per value string | Not supported | Zero-to-one per value string
non-literal value surrogate / value URI | Zero-to-one per non-literal value surrogate | Not supported | Zero-to-one per non-literal value surrogate
non-literal value surrogate / VES URI | Zero-to-one per non-literal value surrogate | Not supported | Zero-to-one per non-literal value surrogate
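Read as a checklist, the feature comparison of DC-XML-2003 and DC-DS-XML shows which constructs would be lost if a description set were written out in DC-XML-2003. The Python sketch below works over a minimal, hypothetical in-memory representation of a description set. It reports the features in use that the DC-XML-2003 column marks as "Not supported" (or, for descriptions, limits to one). The rows marked "partial support" are not checked.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ValueString:
    text: str
    language: Optional[str] = None
    ses_uri: Optional[str] = None


@dataclass
class Statement:
    property_uri: str
    literal: bool  # True: literal value surrogate; False: non-literal
    value_strings: list = field(default_factory=list)
    value_uri: Optional[str] = None
    ves_uri: Optional[str] = None


@dataclass
class Description:
    statements: list
    resource_uri: Optional[str] = None


def unsupported_in_dc_xml_2003(descriptions: list) -> set:
    """Features in use that the DC-XML-2003 column rules out."""
    issues = set()
    if len(descriptions) > 1:
        issues.add("more than one description")
    for d in descriptions:
        if d.resource_uri is not None:
            issues.add("described resource URI")
        for st in d.statements:
            if not st.literal:
                issues.add("non-literal value surrogate")
            if st.value_uri is not None:
                issues.add("value URI")
            if st.ves_uri is not None:
                issues.add("VES URI")
            if any(vs.ses_uri is not None for vs in st.value_strings):
                issues.add("SES URI")
    return issues
```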

Appendix A: Guidelines for implementing Dublin Core™ in XML (DC-XML-2003) and the DCMI Abstract Model

The current DCMI Recommendation for expressing DC metadata using XML, Guidelines for implementing Dublin Core™ in XML [DC-XML-2003], pre-dated the development of the DCAM. That document provides its own "abstract models" for a "simple Dublin Core™ metadata record" and a "qualified Dublin Core™ metadata record", and specifies an XML format for the representation of instances of those two models.

However, the two models described by that document differ from the description set model provided by the DCAM: they use some different types of construct from those used by the DCAM, and also use different labels for constructs which are essentially similar to those used by the DCAM.

A.1 Simple Dublin Core™ (DC-XML-2003)

The "abstract model" for a "simple Dublin Core™ record" provided by DC-XML-2003 is:

  • A simple Dublin Core™ record is made up of one or more properties and their associated values.

  • Each property is an attribute of the resource being described.

  • Each property must be one of the 15 DCMES [DCMES] elements.

  • Properties may be repeated.

  • Each value is a literal string.

  • Each literal string value may have an associated language (e.g. en-GB).

Note that this is a much simpler model than that of the description set defined by the DCMI Abstract Model. In particular:

  • It has no construct analogous to that of the description set

  • It has no construct analogous to that of the described resource URI

  • It limits property URIs to a fixed set of URIs

  • It makes no distinction analogous to that between a non-literal value surrogate and a literal value surrogate

  • It has no construct analogous to that of the syntax encoding scheme URI

  • It has no construct analogous to that of the value URI

  • It has no construct analogous to that of the vocabulary encoding scheme URI

  • It has no concept analogous to the DCAM rule that a non-literal value surrogate may include multiple value strings

On the basis of the description of the "simple DC record" model alone, it is not possible to determine whether a (simple DC record) "value" corresponds to:

  • A literal value surrogate containing a value string

  • A non-literal value surrogate containing a value string

To construct a mapping from the "simple DC record" model to (a subset of) the DCAM description set model, it is necessary to make a choice between those two options.

If one assumes that the intent of the "simple DC record" model is to capture (in terms of the Abstract Model) statements containing literal value surrogates, then the following table specifies a mapping between the "simple DC record" model and the description set model, such that the assertions made by the description set correspond to the assertions made by the "simple DC record".

DC-XML-2003 | DCAM
--- | ---
"Simple DC record" | description set containing a single description
"Property + Value" | statement
"URI of Property" | property URI
"Value" | literal value surrogate/value string
"Language" | value string language
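Under the literal-value assumption, the simple-record mapping is mechanical. The following sketch parses a DC-XML-2003 simple record and yields one statement per child element in the DCMES namespace. It forms the property URI by concatenating the namespace name and the element's local name. The sketch does not interpret the container element, so `metadata` in the test data is only an example name. The `LiteralStatement` type is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

DC = "http://purl.org/dc/elements/1.1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


class LiteralStatement(NamedTuple):
    """A DCAM statement with a literal value surrogate (the Appendix A assumption)."""
    property_uri: str
    value_string: str
    language: Optional[str]


def simple_record_to_statements(xml_text: str) -> list[LiteralStatement]:
    """Map a 'simple DC record' to the statements of a single description."""
    container = ET.fromstring(xml_text)  # container element is not interpreted
    statements = []
    for el in container:
        if not el.tag.startswith("{" + DC + "}"):
            continue  # not a DCMES element: outside this mapping
        local_name = el.tag[len(DC) + 2:]  # strip "{namespace}"
        # "URI of Property" -> property URI; "Value" -> value string;
        # "Language" -> value string language
        statements.append(LiteralStatement(DC + local_name, el.text or "", el.get(XML_LANG)))
    return statements
```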

A.2 Qualified Dublin Core™ (DC-XML-2003)

The "abstract model" for a "qualified DC record" provided by DC-XML-2003 is:

  • A qualified DC record is made up of one or more properties and their associated values.

  • Each property is an attribute of the resource being described.

  • Each property must be either:

    • one of the 15 DC elements,

    • one of the other elements recommended by the DCMI (e.g. audience) [DCTERMS],

    • one of the element refinements listed in the DCMI Metadata Terms recommendation [DCTERMS].

  • Properties may be repeated.

  • Each value is a literal string.

  • Each value may have an associated encoding scheme.

  • Each encoding scheme has a name.

  • Each literal string value may have an associated language (e.g. en-GB).

Again, this is a simpler model than that of the description set defined by the DCMI Abstract Model. As above:

  • It has no construct analogous to that of the description set

  • It has no construct analogous to that of the described resource URI

  • It limits property URIs to a fixed set of URIs

  • It makes no distinction analogous to that between a non-literal value surrogate and a literal value surrogate

  • It has no construct analogous to that of the syntax encoding scheme URI

  • It has no construct analogous to that of the value URI

  • It has no construct analogous to that of the vocabulary encoding scheme URI

  • It has no concept analogous to the DCAM rule that a non-literal value surrogate may include multiple value strings

For the "qualified DC record" model, the construction of a mapping to the DCAM description set model is more problematic.

As for the "simple DC record" case, there is no distinction between literal value surrogate and non-literal value surrogate. So, as above, on the basis of the description of the "qualified DC record" model alone, it is not possible to determine whether a (qualified DC record) "value" corresponds to:

  • A literal value surrogate containing a value string

  • A non-literal value surrogate containing a value string

Further, the "qualified DC record" model introduces a concept of "encoding scheme" but does not distinguish vocabulary encoding scheme URIs from syntax encoding scheme URIs, so it is not possible to determine whether a combination of (qualified DC record) "value" and "encoding scheme" corresponds to:

  • A literal value surrogate containing a value string plus syntax encoding scheme URI

  • A non-literal value surrogate containing a value string plus syntax encoding scheme URI

  • A non-literal value surrogate containing a value string plus vocabulary encoding scheme URI

If one makes the same assumption as for the "simple DC record" case, that the intent of the "qualified DC record" model is to capture (in terms of the Abstract Model) statements containing literal value surrogates, then only the first of the three options is possible for the interpretation of the "encoding scheme". However, examples in the DC-XML-2003 specification include references to "encoding schemes" which are vocabulary encoding schemes, so a mapping of "encoding scheme" to syntax encoding scheme would not be correct in all cases. The only "safe" option would appear to be not to define a mapping for DC-XML-2003 "encoding schemes".

On that basis, the following table specifies a mapping between the "qualified DC record" model and the description set model, such that the assertions made by the description set correspond to the assertions made by the "qualified DC record".

DC-XML-2003 | DCAM
--- | ---
"Qualified DC record" | description set containing a single description
"Property + Value" | statement
"URI of Property" | property URI
"Value" | literal value surrogate/value string
"URI of Encoding Scheme" | no mapping
"Language" | value string language
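Under the same literal-value assumption, a qualified record can be mapped in two steps. First, accept elements from both the DCMES and the DCMI Metadata Terms namespaces. Second, deliberately discard the `xsi:type` attribute that DC-XML-2003 uses to name an encoding scheme, since the mapping provides nothing for "URI of Encoding Scheme". This is a hypothetical sketch. The container element is not interpreted, and the `LiteralStatement` type is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


class LiteralStatement(NamedTuple):
    property_uri: str
    value_string: str
    language: Optional[str]


def qualified_record_to_statements(xml_text: str) -> list[LiteralStatement]:
    """Map a 'qualified DC record' to literal-valued statements."""
    container = ET.fromstring(xml_text)
    statements = []
    for el in container:
        if not el.tag.startswith("{"):
            continue  # unqualified element name: outside this mapping
        ns, _, local = el.tag[1:].partition("}")
        if ns not in (DC, DCTERMS):
            continue
        # el.get(XSI_TYPE) is deliberately ignored: encoding schemes have no mapping.
        statements.append(LiteralStatement(ns + local, el.text or "", el.get(XML_LANG)))
    return statements
```

In the test data, `dcterms:W3CDTF` is a syntax encoding scheme and `dcterms:LCSH` a vocabulary encoding scheme. Both are dropped in exactly the same way, which reflects the ambiguity discussed above.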

Some points to note:

  • The retrospective creation of such a mapping is necessarily approximate

  • No mapping is provided here for what DC-XML-2003 calls "encoding schemes".

  • The mapping of "value" to a literal value surrogate/value string may not be compatible with the intent behind the original model, where the intent seems to be to support either a literal value surrogate/value string or a non-literal value surrogate/value string or a non-literal value surrogate/value URI.

  • The mapping of "value" to a literal value surrogate/value string may introduce contradictions arising from the range of the property.

References

[ABSTRACT-MODEL]
DCMI Abstract Model. DCMI Recommendation. 2007-06-04.
http://dublincore.org/specifications/dublin-core/abstract-model/2007-06-04/

[DC-DS-XML]
Expressing Dublin Core™ Description Sets using XML (DC-DS-XML). DCMI Proposed Recommendation. 2008-09-01.
http://dublincore.org/specifications/dublin-core/dc-ds-xml/2008-09-01/

[DC-HTML]
Expressing Dublin Core™ using HTML/XHTML meta and link elements. DCMI Recommendation. 2008-08-04.
http://dublincore.org/specifications/dublin-core/dc-html/2008-08-04/

[DC-LEVELS]
Interoperability levels for Dublin Core™ metadata
http://dublincore.org/architecturewiki/InteroperabilityLevels

[DC-RDF]
Expressing Dublin Core™ metadata using the Resource Description Framework (RDF). DCMI Recommendation. 2008-01-14.
http://dublincore.org/specifications/dublin-core/dc-rdf/2008-01-14/

[DC-TEXT]
Expressing Dublin Core™ metadata using the DC-Text format. DCMI Recommended Resource. 2007-12-03.
http://dublincore.org/specifications/dublin-core/dc-text/2007-12-03/

[DC-XML-2003]
Guidelines for implementing Dublin Core™ in XML. DCMI Recommendation. 2003-04-02.
http://dublincore.org/specifications/dublin-core/dc-xml-guidelines/2003-04-02/

[DC-XML-2006]
Expressing Dublin Core™ metadata using XML. DCMI Working Draft. 2006-05-29.
http://dublincore.org/specifications/dublin-core/dc-xml/2006-05-29/

[DCAP]
The Singapore Framework for Dublin Core™ Application Profiles. DCMI Recommended Resource. 2008-01-14.
http://dublincore.org/specifications/dublin-core/singapore-framework/2008-01-14/

[DCMES]
Dublin Core™ Metadata Element Set, Version 1.1. DCMI Recommendation. 2008-01-14.
http://dublincore.org/specifications/dublin-core/dces/2008-01-14/

[DCTERMS]
DCMI Metadata Terms. DCMI Recommendation. 2008-01-14.
http://dublincore.org/specifications/dublin-core/dcmi-terms/2008-01-14/

[DCMI-NAMESPACE]
Namespace Policy for the Dublin Core™ Metadata Initiative (DCMI). DCMI Recommendation. 2007-07-02.
http://dublincore.org/specifications/dublin-core/dcmi-namespace/2007-07-02/

[DOMAINS]
Domains and Ranges for DCMI Properties
http://dublincore.org/specifications/dublin-core/domain-range/2008-01-14/

[DSP]
Description Set Profiles: A constraint language for Dublin Core™ Application Profiles. DCMI Working Draft. 2008-03-31.
http://dublincore.org/specifications/dublin-core/dc-dsp/2008-03-31/

[GRDDL]
Gleaning Resource Descriptions from Dialects of Languages (GRDDL). W3C Recommendation. 2007-09-11.
http://www.w3.org/TR/2007/REC-grddl-20070911/

[REV-TERMS]
Revisions to DCMI Metadata Terms
http://dublincore.org/usage/decisions/2008/dcterms-changes/

[RFC3986]
Berners-Lee, T., R. Fielding, L. Masinter. RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. Internet Engineering Task Force (IETF). January 2005.
http://www.ietf.org/rfc/rfc3986.txt

[SYNTAXTUT]
DCMI Basic Syntaxes Tutorial. DC-2007, Singapore.
http://www.dc2007.sg/T2-BasicSyntaxes.pdf

[XMLSCHEMA]
XML Schema Part 0: Primer Second Edition. W3C Recommendation. 2004-10-28.
http://www.w3.org/TR/2004/REC-xmlschema-0-20041028/