Notes on the W3C XML Schemas for Qualified Dublin Core

Creator: Pete Johnston
Eduserv Foundation, UK
Contributor: Tim Cole
University of Illinois at Urbana-Champaign
Contributor: Thomas Habing
University of Illinois at Urbana-Champaign
Contributor: Jane Hunter
DSTC, University of Queensland
Contributor: Carl Lagoze
Cornell University
Contributor: Andy Powell
Eduserv Foundation, UK
Date Issued: 2008-02-11
Identifier: http://dublincore.org/schemas/xmls/qdc/2008/02/11/notes/
Replaces: http://dublincore.org/schemas/xmls/qdc/2006/01/06/notes/
Is Replaced By: Not applicable
Latest version: http://dublincore.org/schemas/xmls/qdc/notes/
Description of document: This document provides a brief description of a set of W3C XML Schemas which implement the XML encoding conventions described in the Guidelines for implementing Dublin Core™ in XML.

1. Introduction

The schemas presented in this document conform to the W3C XML Schema 1.0 recommendation [XMLSCHEMA]. They are designed to support the conventions for representing Dublin Core™ metadata in XML that are described in the DCMI recommendation, Guidelines for implementing Dublin Core™ in XML [DCXMLGUIDELINES]. These schemas are suggested rather than prescribed and may co-exist with other schemas for exchanging Dublin Core™ metadata. XML schemas are interoperability vehicles: the more applications that agree on a single schema, the easier it is to share Dublin Core™ metadata. It is hoped that these schemas will be useful to a broad range of applications, but it is recognized that some applications may require different functionality, provided by different schemas.

Qualified Dublin Core™

The functionality these schemas support is congruent with the Dublin Core™ model of "qualification" [DCPRINCIPLES]. Applications that employ other schemas that express additional functionality should recognize that doing so potentially compromises interoperability with applications that use these schemas.

The three schemas for the DCMI namespaces declare XML elements to represent the Dublin Core™ elements and their refinements. The container schemas provided here restrict the elements in a valid instance document to

  1. the 15 Dublin Core™ elements [DCMES],
  2. the additional elements listed in the DCMI Metadata Terms recommendation (e.g., "audience") [DCTERMS],
  3. the element refinements listed in the DCMI Metadata Terms recommendation [DCTERMS].

The value of a DC element or refinement - the XML element content - may be associated with one of the named encoding schemes, also listed in the DCMI Metadata Terms recommendation [DCTERMS].

Application profiles

This restriction means that so-called application profiles that mix in elements from other namespaces or metadata vocabularies are not valid according to these container schemas. An application profile schema may import one or more of the base schemas listed here and use them in association with schemas for other non-DCMI namespaces. However, implementers adopting that approach should consider the implications for interoperability with applications based on the schemas which specify that only Dublin Core™ elements and element refinements are valid.

"Structured values"

According to the schemas, the XML elements representing Dublin Core™ elements and element refinements may only have simple "string" values (which may be further restricted in the manner described below), defined by the type dc:SimpleLiteral. The xml:lang attribute permits the recording of the language of the string that is the element value. Complex or structured values - i.e., the use of additional XML elements nested within the XML elements representing Dublin Core™ elements and element refinements - are not valid. By exploiting features of the XML Schema specification, the proposed schemas are designed so that it is possible to import them into an extension schema that does allow additional nested elements as values for the Dublin Core™ elements. Such extensions will not be valid according to the container schemas listed here and are therefore not interoperable except by translation methods (not yet defined here or by the DCMI).
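
As an illustration of the distinction drawn above, the following instance fragment contrasts a simple string value with a nested, structured value. It is a sketch only: it assumes validation against a container schema such as simpledc.xsd, and the ex prefix, its namespace name and the ex:name child element are hypothetical.

```xml
<!-- Sketch only: the ex prefix, its namespace and ex:name are hypothetical. -->
<simpledc xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:ex="http://example.org/ex/">

  <!-- Valid: a simple string value, with its language recorded in xml:lang -->
  <dc:title xml:lang="en">Notes on the W3C XML Schemas for Qualified Dublin Core</dc:title>

  <!-- Not valid: a child element nested inside a Dublin Core element -->
  <dc:creator>
    <ex:name>Johnston, Pete</ex:name>
  </dc:creator>

</simpledc>
```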

2. The Schemas and their Use

The schemas were created jointly by an ad hoc working group consisting of Tim Cole (University of Illinois at Urbana-Champaign), Thomas Habing (University of Illinois at Urbana-Champaign), Jane Hunter (DSTC, University of Queensland), Pete Johnston (UKOLN, University of Bath), Carl Lagoze (Cornell University) and Andy Powell (UKOLN, University of Bath).

Base schemas

These three schemas declare XML elements to represent the Dublin Core™ elements and element refinements and a number of complexTypes to represent encoding schemes:

  • Schema: dc.xsd
    Target XML Namespace: http://purl.org/dc/elements/1.1/
  • Schema: dcterms.xsd
    Target XML Namespace: http://purl.org/dc/terms/
  • Schema: dcmitype.xsd
    Target XML Namespace: http://purl.org/dc/dcmitype/

Container schemas

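These schemas declare XML elements to act as containers for specified subsets of the Dublin Core™ elements and element refinements declared in the base schemas:

  • Schema: simpledc.xsd
    Target XML Namespace: none (the schema does not use a targetNamespace)
  • Schema: qualifieddc.xsd
    Target XML Namespace: none (the schema does not use a targetNamespace)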

Sample application schemas

These schemas provide examples of how a container schema might be used in an application:

  • Schema: appsimpledc.xsd
  • Schema: appqualifieddc.xsd

Schema: dc.xsd

Target XML Namespace: http://purl.org/dc/elements/1.1/

The schema dc.xsd defines a complexType called SimpleLiteral:

<xs:complexType name="SimpleLiteral">
 <xs:complexContent mixed="true">
  <xs:restriction base="xs:anyType">
   <xs:sequence>
    <xs:any processContents="lax" minOccurs="0" maxOccurs="0"/>
   </xs:sequence>
   <xs:attribute ref="xml:lang" use="optional"/>
  </xs:restriction>
 </xs:complexContent>
</xs:complexType>

The SimpleLiteral complexType makes the xml:lang attribute available. The type is defined in terms of mixed complexContent. However, the cardinality attributes on the xs:any element (minOccurs="0" and maxOccurs="0") dictate that this complexType does not permit child elements.

The fifteen Dublin Core™ elements in this namespace are represented as XML elements. The schema declares an abstract element any with a type of SimpleLiteral. Because it is declared as abstract, this element cannot be used in an instance document. Each XML element representing a Dublin Core™ element is declared as a non-abstract element which is substitutable for the any element, e.g.

<xs:element name="title" substitutionGroup="any"/>

Finally, the schema defines a group elementsGroup and a complexType elementContainer. With the dc:any element, these two constructs provide mechanisms by which external schemas can reference the set of elements declared in this schema without referencing each element individually - though it is still possible for an external schema to reference individual elements if desired.

For example, a schema can simply import the dc.xsd schema and use the elementContainer complexType as the type of an element, and this would make the DC elements available as child elements.

<xs:import namespace="http://purl.org/dc/elements/1.1/"
              schemaLocation="dc.xsd"/>

<xs:element name="simpledc" type="dc:elementContainer"/>

Such a schema is provided as simpledc.xsd.
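
The other mechanisms mentioned above, the elementsGroup group and references to individual elements, might be used along the lines of the following sketch. The record and citation element names and the namespace name http://example.org/myapp/ are hypothetical, and the sketch makes no assumption about the content model of elementsGroup beyond what dc.xsd defines.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: element names and the example.org namespace are hypothetical. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns="http://example.org/myapp/"
           targetNamespace="http://example.org/myapp/"
           elementFormDefault="qualified">

  <xs:import namespace="http://purl.org/dc/elements/1.1/"
             schemaLocation="dc.xsd"/>

  <!-- References the whole set of DC elements through the group -->
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>
        <xs:group ref="dc:elementsGroup"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- References two DC elements individually -->
  <xs:element name="citation">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="dc:title"/>
        <xs:element ref="dc:creator" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```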

The simpledc.xsd schema does not use a targetNamespace. It is possible to validate an instance directly against this schema. DCMI makes no recommendation for the XML Namespace with which this simpledc container element is associated. Where an application wishes to specify a namespace for the container element, it can be assigned when this schema is included in an application schema.

An example of such an application schema is provided as appsimpledc.xsd.
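
The general shape of such an application schema might resemble the following sketch. It is not a copy of appsimpledc.xsd: the namespace name http://example.org/myapp/ is a placeholder chosen for illustration, and the sketch assumes that simpledc.xsd is available in the same directory.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: http://example.org/myapp/ is a placeholder namespace name
     and must be replaced with one the implementer controls. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.org/myapp/"
           targetNamespace="http://example.org/myapp/"
           elementFormDefault="qualified">

  <!-- simpledc.xsd has no targetNamespace, so the simpledc element it
       declares takes on the targetNamespace of this including schema. -->
  <xs:include schemaLocation="simpledc.xsd"/>

</xs:schema>
```

With a schema of this kind, the container element in an instance document is written in the application's namespace, while its children remain in the http://purl.org/dc/elements/1.1/ namespace.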

An example of an instance document which validates against that application schema is provided as testsimpledc.xml.

An example of an instance document which fails to validate against that application schema is provided as testsimpledc2.xml. (dcterms:modified not permitted.)

Note: You can reference the simpledc.xsd schema in your application if you wish. The appsimpledc.xsd schema, however, is provided as an example only. It uses an XML Namespace name based on a reserved DNS name (example.org). You must create your own version of this schema.

Schema: dcterms.xsd

Target XML Namespace: http://purl.org/dc/terms/

The schema dcterms.xsd imports the schema dc.xsd. The Dublin Core™ elements and element refinements in this namespace are all represented as XML elements, and importing the dc.xsd schema makes the any abstract element and the SimpleLiteral complexType available for use. Importing the dc.xsd schema also enables the indication of relationships between DC element refinements and the elements that they refine, using substitutionGroups.

An XML element which represents a DC element in this namespace is declared as substitutable for the any abstract element:

<xs:element name="audience" substitutionGroup="dc:any"/>

And an XML element which represents a DC element refinement is declared as substitutable for the element it refines. This includes the XML elements corresponding to the fifteen new properties added to the DCTERMS namespace in January 2008:

<xs:element name="title" substitutionGroup="dc:title"/>

<xs:element name="alternative" substitutionGroup="title"/>

Encoding schemes are mechanisms for constraining the "value spaces" of DC elements and element refinements. In this schema, they are represented as named complexTypes derived from the SimpleLiteral complexType. For example, the complexType corresponding to the encoding scheme for "W3CDTF" is as follows:

<xs:complexType name="W3CDTF">
 <xs:simpleContent>
  <xs:restriction base="dc:SimpleLiteral">
      <xs:simpleType>
         <xs:union memberTypes="xs:gYear xs:gYearMonth xs:date xs:dateTime"/>
      </xs:simpleType>
      <xs:attribute ref="xml:lang" use="prohibited"/>
  </xs:restriction>
 </xs:simpleContent>
</xs:complexType>

N.B. Some schema-validating XML parsers may not support this construct. See Appendix A.

The use of one of these complexTypes is specified by the use of the xsi:type attribute in the instance document. The value of the xsi:type attribute is a QName corresponding to the name of the complexType:

<dc:date xsi:type="dcterms:W3CDTF">2002-07-09</dc:date>

Use of this datatype means that a validating parser will check that the element content conforms to one of the built-in date/time types named in the union (xs:gYear, xs:gYearMonth, xs:date or xs:dateTime).
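
The effect of that check can be illustrated with the following instance sketch, which assumes validation directly against a container schema such as qualifieddc.xsd. The values are invented for illustration, and the comments state what the W3CDTF definition shown above implies for each line.

```xml
<!-- Sketch only: values are illustrative. -->
<qualifieddc xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:dcterms="http://purl.org/dc/terms/"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

  <!-- Accepted: each value matches one member type of the union -->
  <dc:date xsi:type="dcterms:W3CDTF">2002</dc:date>                  <!-- xs:gYear -->
  <dc:date xsi:type="dcterms:W3CDTF">2002-07</dc:date>               <!-- xs:gYearMonth -->
  <dc:date xsi:type="dcterms:W3CDTF">2002-07-09</dc:date>            <!-- xs:date -->
  <dc:date xsi:type="dcterms:W3CDTF">2002-07-09T12:30:00Z</dc:date>  <!-- xs:dateTime -->

  <!-- Rejected: not a lexical form of any member type -->
  <dc:date xsi:type="dcterms:W3CDTF">9 July 2002</dc:date>

  <!-- Rejected: the W3CDTF type declares xml:lang as prohibited -->
  <dc:date xsi:type="dcterms:W3CDTF" xml:lang="en">2002-07-09</dc:date>

</qualifieddc>
```

Without the xsi:type attribute, the same dc:date elements are validated only as SimpleLiteral, so any character string is accepted.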

Not all of the complexTypes associated with encoding schemes impose such "tight" validation. For example, the complexType for "LCSH" prescribes only that the element content is a character string:

<xs:complexType name="LCSH">
 <xs:simpleContent>
  <xs:restriction base="dc:SimpleLiteral">
      <xs:simpleType>
        <xs:restriction base="xs:string"/>
      </xs:simpleType>
      <xs:attribute ref="xml:lang" use="prohibited"/>
  </xs:restriction>
 </xs:simpleContent>
</xs:complexType>

In theory at least, it is possible to define a complexType which enumerates all the possible values of a Library of Congress Subject Heading, but it would be impractical to validate against such a list. However, the principle of validating against an enumerated list of values is illustrated in the schema dcmitype.xsd for the DCMI Type Vocabulary (see next section).

An example schema which takes this approach for ISO639-2 language codes is available at http://dli.grainger.uiuc.edu/publications/metadatacasestudy/dc_schemas/iso639-2.xsd.

Like the dc.xsd schema, the dcterms.xsd schema defines a group, elementsAndRefinementsGroup, as a means of referring to all the elements and element refinements. A complexType elementOrRefinementContainer is also defined.

A schema can simply import the dcterms.xsd schema and use the elementOrRefinementContainer complexType as the type of an element, and this would make the DC elements and element refinements available as child elements.

<xs:import namespace="http://purl.org/dc/terms/"
              schemaLocation="dcterms.xsd"/>

<xs:element name="qualifieddc" type="dcterms:elementOrRefinementContainer"/>

An example of such a schema is provided as qualifieddc.xsd.
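
An instance document that such a container is intended to accept might look like the following sketch. It assumes validation directly against qualifieddc.xsd, so the container element is in no namespace; the alternative title and audience values are invented for illustration.

```xml
<!-- Sketch only: some values are invented for illustration. -->
<qualifieddc xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:dcterms="http://purl.org/dc/terms/"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

  <!-- A Dublin Core element from the dc namespace -->
  <dc:title xml:lang="en">Notes on the W3C XML Schemas for Qualified Dublin Core</dc:title>

  <!-- An element refinement, substitutable for the title element -->
  <dcterms:alternative xml:lang="en">QDC schema notes</dcterms:alternative>

  <!-- An additional element from the dcterms namespace -->
  <dcterms:audience>Implementers</dcterms:audience>

  <!-- An element refinement of date, with an encoding scheme -->
  <dcterms:issued xsi:type="dcterms:W3CDTF">2008-02-11</dcterms:issued>

</qualifieddc>
```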

Like the simpledc.xsd schema, the qualifieddc.xsd schema does not use a targetNamespace. An implementation may validate directly against this schema or it may specify a namespace for the container element by including this schema in an application schema.

An example of such an application schema is provided as appqualifieddc.xsd.

An example of an instance document which validates against that application schema is provided as testqualifieddc.xml.

An example of an instance document which fails to validate against that application schema is provided as testqualifieddc2.xml. (dcterms:mdified not permitted)

Note: As in the case of the simpledc.xsd schema, you can reference the qualifieddc.xsd schema in your application if you wish. The appqualifieddc.xsd schema, however, is provided as an example only. It uses an XML Namespace name based on a reserved DNS name (example.org). You must create your own version of this schema.

Schema: dcmitype.xsd

Target XML Namespace: http://purl.org/dc/dcmitype/

The dcmitype.xsd schema includes only a named simpleType, which defines an enumerated list of values for the DCMI Type Vocabulary.

This simpleType is referenced in a complexType in the dcterms.xsd schema.
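
The principle of an enumerated simpleType can be sketched as follows. This is not the definition in dcmitype.xsd: the type name ExampleTypeList and the namespace name http://example.org/types/ are hypothetical, and only three terms of the DCMI Type Vocabulary are listed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the type name and namespace are hypothetical. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.org/types/"
           targetNamespace="http://example.org/types/">

  <xs:simpleType name="ExampleTypeList">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Collection"/>
      <xs:enumeration value="Dataset"/>
      <xs:enumeration value="Text"/>
      <!-- remaining DCMI Type Vocabulary terms omitted from this sketch -->
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

Element content validated against a list of this kind would be accepted for the value Text but rejected for text, because enumeration values are compared case-sensitively.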

Appendix A: Parser Behaviour

The following parsers/validators were tested:

  • XSV 2.2-1 of 2002/12/01 21:59:33
  • Xerces Xerces-J 2.2.1 2002/11/11 17:40
  • MSXML4 Microsoft XML Core Services 4.0 SP1

Results

testsimpledc.xml

  Parser   Result                                    Messages
  XSV      Schema and instance accepted as valid     -
  Xerces   Schema and instance accepted as valid     -
  MSXML4   Schema and instance accepted as valid     -

testqualifieddc.xml

  Parser   Result                                    Messages
  XSV      Schema and instance accepted as valid     -
  Xerces   Schema dcterms.xsd rejected as invalid    See the message below
  MSXML4   Schema and instance accepted as valid     -

Message reported by Xerces for testqualifieddc.xml:

[Error] dcterms.xsd:nnn:nn: src-ct.2: Complex Type Definition Representation Error for type 'xxxx'. When simpleContent is used, the base type must be a complexType whose content type is simple, or, only if extension is specified, a simple type.

(where 'xxxx' is the name of a complexType corresponding to one of the encoding schemes.)

The "dc:SimpleLiteral" problem

The schema dc.xsd defines a base complexType called SimpleLiteral:

<xs:complexType name="SimpleLiteral">
 <xs:complexContent mixed="true">
  <xs:restriction base="xs:anyType">
   <xs:sequence>
    <xs:any processContents="lax" minOccurs="0" maxOccurs="0"/>
   </xs:sequence>
   <xs:attribute ref="xml:lang" use="optional"/>
  </xs:restriction>
 </xs:complexContent>
</xs:complexType>

Encoding schemes are represented as complexTypes derived from the SimpleLiteral complexType. For example, the complexType corresponding to the encoding scheme for "W3CDTF" is as follows:

<xs:complexType name="W3CDTF">
 <xs:simpleContent>
  <xs:restriction base="dc:SimpleLiteral">
      <xs:simpleType>
         <xs:union memberTypes="xs:gYear xs:gYearMonth xs:date xs:dateTime"/>
      </xs:simpleType>
      <xs:attribute ref="xml:lang" use="prohibited"/>
  </xs:restriction>
 </xs:simpleContent>
</xs:complexType>

This derivation of a complexType with simpleContent by restriction of a base complexType with complexContent is valid under section 3.4.6 of XML Schema Part 1: Structures, specifically item 5.1.2 of the section "Schema Component Constraint: Derivation Valid (Restriction, Complex)", because the base complexContent is mixed and emptiable.

This was confirmed by Henry Thompson; see, e.g.:
http://www.w3.org/2001/05/xmlschema-rec-comments#pfiSimpleContent
http://lists.w3.org/Archives/Public/xmlschema-dev/2002Oct/0005.html
http://lists.w3.org/Archives/Public/xmlschema-dev/2002Oct/0008.html

Conclusion: Xerces appears to be behaving incorrectly in rejecting this derivation.

References

[XMLSCHEMA] XML Schema
http://www.w3.org/XML/Schema

[DCXMLGUIDELINES] Guidelines for implementing Dublin Core™ in XML
http://dublincore.org/specifications/dublin-core/dc-xml-guidelines/

[DCPRINCIPLES] DCMI Grammatical Principles
http://dublincore.org/specifications/dublin-core/grammatical-principles/

[DCMES] Dublin Core™ Metadata Element Set, Version 1.1: Reference Description
http://dublincore.org/specifications/dublin-core/dces/

[DCTERMS] DCMI Metadata Terms
http://dublincore.org/specifications/dublin-core/dcmi-terms/