Notes on DCMI specifications for expressing Dublin Core™ metadata in RDF
KMR Group, NADA, KTH (Royal Institute of Technology), Sweden
Description: This document serves as a guide for implementers to the changes introduced with the 2008-01-14 DCMI Recommendation "Expressing Dublin Core metadata using the Resource Description Framework (RDF)" (and subsequent revisions) with respect to legacy DCMI specifications.
Table of contents
- Support for domains and ranges
- Support for value strings
- Deprecated constructs
- Other changes
In January 2008, DCMI released "Expressing Dublin Core™ metadata in the Resource Description Framework (RDF)" as a DCMI Recommendation [DC-RDF]. This Recommendation replaces two legacy DCMI documents:
- Expressing Simple Dublin Core™ in RDF/XML [DCMES-XML], a DCMI Recommendation from July 2002;
- Expressing Qualified Dublin Core™ in RDF/XML [DCQ-RDF-XML], a DCMI Proposed Recommendation from May 2002.
This document provides a guide to the changes introduced with the 2008 Recommendation.
Since 1997, the "Dublin Core™ data model" has evolved alongside W3C's Resource Description Framework (RDF). This process has resulted in the DCMI Abstract Model [ABSTRACT-MODEL], which was published in March 2005 as a DCMI Recommendation. The DCMI Abstract Model (DCAM) now provides a reference model on the basis of which particular Dublin Core™ expressions can be defined.
Since the publication of the DCAM, the Architecture Working Group has been preparing a new expression of Dublin Core™ in RDF. In March 2006, Mikael Nilsson (Royal Institute of Technology, Sweden) finalized an early draft of the 2008 Recommendation for a DCMI Public Comment period in June 2006. On the basis of feedback received in the Public Comment period and of an updated version of the DCMI Abstract Model, an updated version of the new RDF expression was prepared for Public Comment in April 2007 and finalized as a DCMI Recommendation in January 2008.
The new specification represents a significant step in the evolution of the Dublin Core™ RDF expressions compared to the two earlier specifications. Historically, Dublin Core™ metadata expressed in RDF has suffered from a number of problems, including:
- the existence of two legacy expressions of Dublin Core™ metadata in RDF [DCMES-XML, DCQ-RDF-XML];
- conflicting recommendations in the above documents regarding the use of literal strings in Dublin Core™ metadata;
- implementation complexity due to the use of idiosyncratic constructs;
- difficulties in providing formal ranges and domains for Dublin Core™ properties.
The legacy RDF expression specifications, which predate the DCAM, contain constructs that are incompatible with concepts in the DCAM.
The January 2008 specification addresses these problems in ways described below.
Support for domains and ranges
The most significant change introduced by the January 2008 Recommendation is the addition of support for domains and ranges of properties in general, and of DCMI-defined properties in particular. DCMI metadata terms have hitherto been defined exclusively in natural language. The RDF expression of the DCMI term set (e.g., http://dublincore.org/2003/03/24/dces) served essentially to convey these English-language definitions in a form ingestible by RDF applications. As part of the process of clarifying the RDF expression for Dublin Core™ metadata, it became evident that DCMI would benefit from supplementing these English-language definitions with machine-understandable declarations of domains and ranges. Such additional, machine-understandable precision is necessary as Dublin Core™ is deployed in the context of inference engines and ontology-based applications. In January 2008, the DCMI Usage Board finalized the assignment of formal domains and ranges, which make explicit the meanings intended in the natural-language definitions [DOMAIN-RANGE]; see also the document "DCMI Metadata Terms" [DCMI-TERMS], which has been updated to reflect these changes, and the decision document "Revisions to DCMI Metadata Terms" [DCTERMS-CHANGES], which contains a detailed summary of all changes to the DCMI metadata terms introduced in the January 2008 specifications.
Literal values of properties without Literal ranges
For most DCMI metadata terms, the process of expressing domains and ranges in machine-understandable form is straightforward and unambiguous. However, one problem with regard to legacy metadata usage is serious enough to bear closer scrutiny.
The Dublin Core™ community has long distinguished between Simple and Qualified Dublin Core™, a distinction reflected in the difference between the specifications "Expressing Simple Dublin Core™ in RDF/XML" [DCMES-XML] and "Expressing Qualified Dublin Core™ in RDF/XML" [DCQ-RDF-XML]. The two legacy specifications differ with regard to whether properties such as dc:date have values that are non-literal resources (e.g., a Person or a Date, seen as entities), or literals representing the resources (i.e., a value string). In "Expressing Simple Dublin Core™ in RDF/XML", a dc:creator is a name, that is, a literal value string such as "John Smith".
In "Expressing Qualified Dublin Core™ in RDF/XML", in contrast, a dc:creator is an entity, as in:
<http://www.example.com> dc:creator <http://www.example.org/person32> .
or, using a blank node, as in:
<http://www.example.com> dc:creator _:xxx .
_:xxx rdf:type foaf:Person .
_:xxx rdf:value "John Smith" .
The new RDF encoding specification supports both of these constructs but bases the choice of one form over the other on the range of a property. A property with a "literal" range will follow the former pattern, while a property with a "non-literal" range will follow the latter.
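The range-driven choice described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are modelled as plain tuples of strings, the `RANGE_KIND` table and the `encode` helper are hypothetical names invented here, and the table lists just two properties rather than the full DCMI term declarations.

```python
# Illustrative range table; the names and the two "kinds" are assumptions
# made for this sketch, not a copy of the DCMI term declarations.
LITERAL, NON_LITERAL = "literal", "non-literal"
RANGE_KIND = {
    "dcterms:title": LITERAL,
    "dcterms:creator": NON_LITERAL,  # range "Agent" in the text
}


def encode(subject, prop, value_string, node="_:v1"):
    """Return the triples that express one value string for `prop`."""
    # Note: no escaping of quotes inside value_string is attempted here.
    literal = f'"{value_string}"'
    if RANGE_KIND[prop] == LITERAL:
        # Literal range: the value string is the direct object.
        return [(subject, prop, literal)]
    # Non-literal range: an intervening node carries the string via rdf:value.
    return [(subject, prop, node), (node, "rdf:value", literal)]


doc = "<http://www.example.com>"
title_triples = encode(doc, "dcterms:title", "An Example Page")
creator_triples = encode(doc, "dcterms:creator", "John Smith")
```

With these inputs, the title is emitted as a single triple with a literal object, while the creator is emitted as two triples joined by the blank node `_:v1`.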
In accordance with this approach, the DCMI Usage Board has assigned appropriate ranges to the DCMI properties. A range of "Agent" has been given to dcterms:contributor, where "Agent" is defined as "A resource that acts or has the power to act". Similarly, appropriate ranges have been specified for the other DCMI terms. The range "Literal" applies only to metadata terms which are typically associated with a single value string.
Using this approach, dcterms:creator refers to an entity which can be identified (e.g., assigned an identifier in an authority file) and described in its own right (e.g., with a name, an affiliation, and a birth date). The English-language definitions of these terms bear out this interpretation: dcterms:creator is "An entity primarily responsible for making the resource", examples being "a person, an organization, or a service". However, the legacy usage comment associated with this definition reflects the ambiguity: "Typically, the name of a Creator should be used to indicate the entity".
In most cases, the appropriate range of a term has become reasonably obvious through a decade of implementation practice. In some cases, such as dc:contributor, that usage has been ambiguous, so the assignment of any specific range would make one or another part of the legacy metadata appear invalid in the context of machine processing. Declaring "Agent" as the range of dc:creator would mean that inferencing applications would expect to treat the value of the dc:creator property as a non-literal entity. Where legacy metadata represents names as literal values for dc:creator, applications would need to treat these as "special cases" in order to merge them with metadata in which those names were associated with the expected non-literal entity constructs. The legacy specifications did not properly address these ambiguities, with the result that an unknown amount of Dublin Core-based RDF data is inconsistent with the intended semantics of the Dublin Core™ properties.
The clarification of these ambiguities through the assignment of domains and ranges is currently considered to be a desirable step towards ensuring the long-term viability of Dublin Core™ in RDF. However, one important compromise has been reached: domains and ranges will only be asserted for properties in the http://purl.org/dc/terms/ namespace, which includes copies of the properties of the http://purl.org/dc/elements/1.1/ namespace. Thus, dc:creator will still have an unspecified range and can be used with both literal and non-literal values, while dcterms:creator will have a (non-literal) range of Agent.
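As a rough illustration of this compromise, the following Python sketch looks up the declared range of a property by its full URI. The `DECLARED_RANGES` table and both helper functions are hypothetical and contain only the single range mentioned in the text; a real application would presumably load the declarations from the DCMI RDF schemas instead.

```python
DC_ELEMENTS = "http://purl.org/dc/elements/1.1/"
DC_TERMS = "http://purl.org/dc/terms/"

# Only the one range mentioned in the text is listed here (an assumption of
# this sketch); properties in the elements/1.1 namespace get no entry at all.
DECLARED_RANGES = {DC_TERMS + "creator": DC_TERMS + "Agent"}


def declared_range(property_uri):
    """Return the declared range of a property, or None if unspecified."""
    return DECLARED_RANGES.get(property_uri)


def accepts_literal(property_uri):
    """True unless a range has been declared for the property.

    Simplification: this sketch lists no properties with a Literal range,
    so any declared range is treated as non-literal.
    """
    return declared_range(property_uri) is None


legacy_ok = accepts_literal(DC_ELEMENTS + "creator")
terms_ok = accepts_literal(DC_TERMS + "creator")
```

Here `legacy_ok` is true because dc:creator has no declared range, while `terms_ok` is false because dcterms:creator is declared with the range Agent.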
Impact on legacy Dublin Core™ metadata
The declaration of domains and ranges for DCMI properties has important implications for the interpretation of legacy Dublin Core™ metadata in RDF. However, the interpretation of Dublin Core™ metadata in other formats, such as HTML [DCQ-HTML] and XML [DC-XML-GUIDELINES, DC-XML], is not negatively affected by these developments. The rules for interpreting metadata in these syntaxes in terms of the DCAM are simpler than for RDF, as these other syntaxes are not bound by the semantics of RDF.
The declaration of domains and ranges helps clarify the formal semantics of DCMI properties. Metadata creators need to use syntactic constructs to ensure that RDF-consuming applications correctly interpret any value strings. The generation of Dublin Core™ metadata in RDF becomes slightly more complex for anyone producing metadata by hand. However, these measures eliminate the current ambiguity, enabling metadata that is mappable more consistently to the DCAM. Support by tools is improved by the machine-processable restrictions. In order to process legacy metadata, metadata consumers might need to "special-case" any metadata containing value strings associated directly with the affected Dublin Core™ properties (i.e., without intervening non-literal nodes).
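One way a consumer might "special-case" such legacy metadata is to rewrite each directly attached value string into the non-literal pattern before merging. The Python sketch below shows that idea only; it is not a procedure prescribed by the Recommendation. The function name, the `_:legacyN` node labels, and the convention that literals are strings starting with a double quote are all assumptions of this example.

```python
from itertools import count

# Properties assumed, for this sketch, to have a non-literal range.
NON_LITERAL_RANGE = {"dcterms:creator"}


def lift_legacy_literals(triples):
    """Insert an intervening node wherever a literal is attached directly.

    Literals are represented here as strings starting with a double quote.
    """
    fresh = count(1)  # source of new blank-node labels
    result = []
    for s, p, o in triples:
        if p in NON_LITERAL_RANGE and o.startswith('"'):
            node = f"_:legacy{next(fresh)}"
            result.append((s, p, node))
            result.append((node, "rdf:value", o))
        else:
            # Already non-literal, or a property without a non-literal range.
            result.append((s, p, o))
    return result


legacy = [
    ("<http://www.example.com>", "dcterms:creator", '"John Smith"'),
    ("<http://www.example.com>", "dcterms:creator", "<http://www.example.org/person32>"),
]
normalized = lift_legacy_literals(legacy)
```

The first statement gains an intervening node carrying the name via rdf:value; the second, which already points at a non-literal resource, passes through unchanged.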
Support for value strings
The January 2008 Recommendation differs from the legacy RDF specifications in its handling of value strings.
Support for multiple value strings
The DCAM specifies that each value can be represented in a DCAM statement by multiple value strings. The new RDF expression supports this construct, using the rdf:value property. This allows value strings in different languages or using different syntax encoding schemes to be used as representations of a single value.
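A minimal Python sketch of this construct follows, again modelling triples as tuples of strings. The helper name, the blank-node label, and the choice of dcterms:spatial with an English and a Swedish place name are illustrative assumptions, not examples taken from the Recommendation.

```python
def value_with_strings(subject, prop, strings, node="_:place"):
    """Attach several value strings to a single value node.

    `strings` is a list of (text, language_tag) pairs; a tag of None
    produces a plain literal without a language tag.
    """
    triples = [(subject, prop, node)]
    for text, lang in strings:
        literal = f'"{text}"@{lang}' if lang else f'"{text}"'
        triples.append((node, "rdf:value", literal))
    return triples


triples = value_with_strings(
    "<http://www.example.com>",
    "dcterms:spatial",
    [("Sweden", "en"), ("Sverige", "sv")],
)
```

Both language-tagged strings hang off the same node, so they are read as two representations of one value rather than as two separate values.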
Deprecated use of rdfs:label
Value strings are now expressed using rdf:value. The use of rdfs:label for expressing value strings is no longer supported, as its definition does not clearly fit this purpose. Of course, the use of rdfs:label is not forbidden, but the property is not considered to have any special interpretation in terms of the DCAM.
Support for RDF datatypes
RDF datatypes can now be used with value strings, corresponding to the DCAM concept of Syntax Encoding Schemes.
For value strings occurring as the object of an rdf:value property, this is a simple matter.
The new specification also allows the use of datatyped or plain literals as direct values of properties when the value is a literal.
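Both placements of a datatyped value string can be sketched as follows. The `typed_literal` helper is hypothetical and does no escaping; the pairing of dcterms:date with the W3CDTF scheme and of dcterms:language with the RFC4646 scheme is chosen purely for illustration.

```python
W3CDTF = "http://purl.org/dc/terms/W3CDTF"
RFC4646 = "http://purl.org/dc/terms/RFC4646"


def typed_literal(lexical_form, datatype_uri):
    """Format a datatyped literal in N-Triples style (no escaping done)."""
    return f'"{lexical_form}"^^<{datatype_uri}>'


doc = "<http://www.example.com>"

# Literal range: the datatyped literal is the direct value of the property.
direct = [(doc, "dcterms:date", typed_literal("2008-01-14", W3CDTF))]

# Non-literal range: the datatyped literal is the object of rdf:value.
via_node = [
    (doc, "dcterms:language", "_:lang"),
    ("_:lang", "rdf:value", typed_literal("en", RFC4646)),
]
```

In both cases the datatype URI plays the role of the DCAM Syntax Encoding Scheme for the value string.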
Deprecated constructs
The January 2008 Recommendation deprecates several constructs described in the May 2002 specification [DCQ-RDF-XML].
Deprecated use of RDF Containers
The RDF Container constructs, such as rdf:Seq, are no longer provided as an alternative for constructing ordered and unordered sets. They have no correspondence in the DCAM and should no longer be used, except where the range of a property includes one of these classes.
Deprecated construct "poor-man's structured values"
The recursive use of rdf:value for structured values has been deprecated. It has no correspondence in the DCAM and does not lend itself well to automated processing. The use of this construct is therefore no longer supported.
Deprecated construct "poor-man's language qualification"
The use of "poor-man's language qualification" in the 2002 specification does not fit the DCAM and does not take into account the language tagging of plain literals in RDF. It is no longer supported.
Other changes
Removal of references to "dumb-down"
The dumb-down algorithm is independent of any particular expression of Dublin Core™ metadata (such as Dublin Core™ metadata in RDF) and is therefore out of scope for this specification. References to dumb-down have therefore been removed.
Removal of reification from the January 2008 Recommendation
The use of reification is now considered to fall outside the scope of the specification and is therefore no longer part of the January 2008 Recommendation. As it does not interfere with the metadata itself, however, reification can still be used in accordance with RDF specifications.
Removal of RDF schemas from the Working Draft
The RDF schemas for DCMI properties and classes are part of the definitions of these terms and do not belong specifically to the RDF expression of Dublin Core™ metadata. They have been removed from the draft specification itself and can be accessed at http://dublincore.org/schemas/rdfs/. Human readable documentation matching the RDF Schemas is available at [DCMI-TERMS].
- DCMI Abstract Model
- Expressing Simple Dublin Core™ in RDF/XML
- Expressing Qualified Dublin Core™ in RDF/XML
- Expressing Dublin Core™ in HTML/XHTML meta and link elements
- DCMI Architecture Working Group
- DCMI Architecture Working Group mailing list
- Expressing Dublin Core™ metadata using the Resource Description Framework (RDF)
- Expressing Dublin Core™ metadata using XML
- Guidelines for implementing Dublin Core™ in XML
- Domains and Ranges for DCMI Properties
- DCMI Metadata Terms
- Revisions to DCMI Metadata Terms