Notes on DCMI specifications for expressing Dublin Core™ metadata in RDF
KMR Group, NADA, KTH (Royal Institute of Technology), Sweden
Description of Document: This document serves as a guide for implementers to the changes introduced with the 2006-05-29 Working Draft "Expressing Dublin Core™ metadata using the Resource Description Framework (RDF)" (and subsequent revisions) with respect to legacy DCMI specifications.
Table of contents
- Support for domains and ranges
- Support for value strings
- Deprecated constructs
- Other changes
In April 2007, DCMI released for public comment the Working Draft "Expressing Dublin Core™ metadata in the Resource Description Framework (RDF)" [DC-RDF]. Subject to public review and discussion in the context of DCMI process, the April 2007 Working Draft is intended eventually to replace two legacy DCMI documents:
- Expressing Simple Dublin Core™ in RDF/XML [DCMES-XML], a DCMI Recommendation from July 2002;
- Expressing Qualified Dublin Core™ in RDF/XML [DCQ-RDF-XML], a DCMI Proposed Recommendation from May 2002.
This document provides a guide to the changes introduced with the Working Drafts of May 2006 and April 2007. DCMI is seeking comments from communities affected by these differences. The content of any future DCMI Recommendation based on the April 2007 Working Draft will depend on feedback received from these communities.
Since 1997, the "Dublin Core™ data model" has evolved alongside W3C's Resource Description Framework (RDF). This process has resulted in the DCMI Abstract Model [ABSTRACT-MODEL], which was published in March 2005 as a DCMI Recommendation. The DCMI Abstract Model (DCAM) now provides a reference model on the basis of which particular Dublin Core™ expressions can be defined.
Since the publication of the DCAM, the Architecture Working Group has been preparing a new expression of Dublin Core™ in RDF. In March 2006, Mikael Nilsson (Royal Institute of Technology, Sweden) finalized the existing draft for a DCMI Public Comment period in June 2006 [COMMENT2006]. On the basis of feedback received in the Public Comment period and on the basis of an updated version of the DCMI Abstract Model, an updated version of the new RDF expression has now been prepared for Public Comment in April 2007.
The new specification represents a significant step in the evolution of the Dublin Core™ RDF expressions compared to the two earlier specifications. Historically, Dublin Core™ metadata expressed in RDF has suffered from a number of problems, including:
- the existence of two legacy expressions of Dublin Core™ metadata in RDF [DCMES-XML, DCQ-RDF-XML];
- conflicting recommendations in the above documents regarding the use of literal strings in Dublin Core™ metadata;
- implementation complexity due to the use of idiosyncratic constructs;
- difficulties in providing formal ranges and domains for Dublin Core™ properties.
The legacy RDF expression specifications, which predate the DCAM, contain constructs that are incompatible with concepts in the DCAM.
The April 2007 specification addresses these problems in ways described below.
The most significant change introduced by the April 2007 Working Draft is the addition of support for domains and ranges of properties in general, and of DCMI-defined properties in particular. DCMI metadata terms have hitherto been defined exclusively in natural language; the RDF expression of the DCMI term set (e.g., http://dublincore.org/2003/03/24/dces) served essentially to convey these English-language definitions in a form ingestible by RDF applications. As part of the process of clarifying the RDF expression for Dublin Core™ metadata, it has become evident that DCMI would benefit from supplementing these English-language definitions with machine-understandable declarations of domains and ranges. Such additional, machine-understandable precision is necessary as Dublin Core™ is deployed in the context of inference engines and ontology-based solutions. As of the time of writing, the DCMI Usage Board is considering the assignment of formal domains and ranges which make explicit the meanings intended in natural-language definitions [DOMAINS].
Literal values of properties without Literal ranges
For most DCMI metadata terms, the process of declaring domains and ranges in machine-understandable form is straightforward and unambiguous. However, one problem with regard to legacy metadata usage is serious enough to bear closer scrutiny.
The Dublin Core™ community has long distinguished between Simple and Qualified Dublin Core™ -- a distinction reflected in the difference between the specifications "Expressing Simple Dublin Core™ in RDF/XML" [DCMES-XML] and the "Expressing Qualified Dublin Core™ in RDF/XML" [DCQ-RDF-XML]. The two legacy specifications differ with regard to whether properties such as dc:creator and dc:date have values that are non-literal resources (e.g., a Person or a Date, seen as entities), or literals representing the resources (i.e., a value string). In "Expressing Simple Dublin Core™ in RDF/XML", a dc:creator is a name:
<http://www.example.com> dc:creator "John Smith".
In "Expressing Qualified Dublin Core™ in RDF/XML", in contrast, a dc:creator is an entity, as in:
<http://www.example.com> dc:creator <http://www.example.org/person32> .
or, with a blank node:
<http://www.example.com> dc:creator _:xxx .
_:xxx rdf:type foaf:Person .
_:xxx rdf:value "John Smith" .
The new RDF encoding specification supports both of these constructs, but bases the choice of one form over the other on the range of a property. A property with a "literal" range will follow the former pattern, while a property with a "non-literal" range will follow the latter.
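The range-based choice can be sketched in a few lines of Python. This is only an illustration, not part of the Working Draft: the ASSUMED_RANGES table, the function name statement_triples, and the blank node label _:v1 are hypothetical, and the actual range assignments were still under consideration by the Usage Board. The output uses Turtle-style prefixed names and assumes the prefixes are declared elsewhere.

```python
# Hypothetical range table for illustration only; the real assignments
# were still being considered by the DCMI Usage Board.
LITERAL = "literal"
NON_LITERAL = "non-literal"

ASSUMED_RANGES = {
    "dcterms:title": LITERAL,
    "dcterms:creator": NON_LITERAL,
}


def statement_triples(subject, prop, value_string, bnode="_:v1"):
    """Return Turtle-style lines for one statement with one value string."""
    # Escape backslashes and double quotes inside the literal.
    literal = '"%s"' % value_string.replace("\\", "\\\\").replace('"', '\\"')
    if ASSUMED_RANGES[prop] == LITERAL:
        # Literal range: the value string is the direct object.
        return ["%s %s %s ." % (subject, prop, literal)]
    # Non-literal range: the object is a node standing for the value,
    # and the value string is attached to that node with rdf:value.
    return [
        "%s %s %s ." % (subject, prop, bnode),
        "%s rdf:value %s ." % (bnode, literal),
    ]


for line in statement_triples(
    "<http://www.example.com>", "dcterms:creator", "John Smith"
):
    print(line)
```

Calling the same function with dcterms:title would yield a single triple with the string as direct object, which corresponds to the former pattern described above.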
In accordance with this approach, the DCMI Usage Board is considering the assignment of appropriate ranges to the DCMI properties. A range of "Agent" will be given to dcterms:creator and dcterms:contributor, where "Agent" would be defined as "the class of all things that are a Person, Organization, or Service". Similarly, appropriate ranges would be specified for the other DCMI terms. The range "Literal" would apply only to metadata terms which are typically associated with value strings, such as dcterms:title or dcterms:identifier.
Using this approach, dcterms:creator refers to an entity which can be identified (e.g., in an authority file) and described in its own right (e.g., with a name, an affiliation, and a birth date). The English-language definitions of these terms bear out this interpretation: dcterms:creator is "an entity primarily responsible for making the content of the resource", examples being "a person, an organization, or a service". However, the usage comments associated with these definitions also reflect the ambiguity: "Typically, the name of a Creator should be used to indicate the entity".
In most cases, the appropriate range of a term has become reasonably obvious through a decade of implementation practice. In some cases, such as dc:creator and dc:contributor, however, that usage has been ambiguous, so the assignment of any specific range would make one or another part of the legacy metadata appear invalid in the context of machine processing. Declaring "Agent" as the range of dc:creator would mean that inferencing applications would expect to treat the value of the dc:creator property as a non-literal entity. Where legacy metadata represents names as literal values for dc:creator, applications would need to treat these as "special cases" in order to merge them with metadata in which those names were associated with the expected non-literal entity constructs. The legacy specifications did not properly address these ambiguities, with the result that an unknown amount of Dublin Core-based RDF data is inconsistent with the intended semantics of the Dublin Core™ properties.
The clarification of these ambiguities through the assignment of domains and ranges is currently considered to be a desirable step towards ensuring the long-term viability of Dublin Core™ in RDF. However, one important compromise has been reached: domains and ranges will only be asserted for properties in the http://purl.org/dc/terms/ namespace, including copies of the properties in the http://purl.org/dc/elements/1.1/ namespace. Thus, dc:creator will still have an unspecified range and can be used with both literal and non-literal values, while dcterms:creator will have a (non-literal) range of Agent.
Impact on legacy Dublin Core™ metadata
The declaration of domains and ranges for DCMI properties has important implications for the interpretation of legacy Dublin Core™ metadata in RDF. However, the interpretation of Dublin Core™ metadata in other formats, such as HTML [DCQ-HTML] and XML [DC-XML-GUIDELINES, DC-XML], would not be negatively affected by these developments. The rules for interpreting metadata in these syntaxes in terms of the DCAM are simpler than for RDF, as these other syntaxes are not bound by the semantics of RDF.
The declaration of domains and ranges would help clarify the formal semantics of DCMI properties. Metadata creators would need to use syntactic constructs to ensure that RDF-consuming applications correctly interpret any value strings. The generation of Dublin Core™ metadata in RDF would become slightly more complex for anyone producing metadata by hand. However, these measures would eliminate the current ambiguity, enabling metadata that is mappable more consistently to the DCAM. Support by tools would be improved by the machine-processable restrictions. In order to process legacy metadata, metadata consumers might need to "special-case" any metadata containing value strings associated directly with the affected Dublin Core™ properties (i.e., without intervening non-literal nodes).
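One way a consumer might "special-case" such legacy statements is to rewrite each directly attached value string as an intervening node carrying an rdf:value. The following Python sketch assumes a simple in-memory representation (subject string, property string, and an object given as a (kind, text) pair); the NON_LITERAL_PROPS set, the function name, and the _:lifted labels are hypothetical and chosen only for illustration.

```python
from itertools import count

# Assumption for illustration: properties expected to take non-literal values.
NON_LITERAL_PROPS = {"dcterms:creator", "dcterms:contributor"}


def lift_legacy_literals(triples, non_literal_props=NON_LITERAL_PROPS):
    """Rewrite literal objects of non-literal-range properties as blank nodes.

    Each triple is (subject, property, (kind, text)) where kind is
    "literal" or "node". Other triples pass through unchanged.
    """
    fresh = count(1)
    out = []
    for s, p, o in triples:
        kind, _text = o
        if p in non_literal_props and kind == "literal":
            node_id = "_:lifted%d" % next(fresh)
            # The property now points at a node standing for the value ...
            out.append((s, p, ("node", node_id)))
            # ... and the original string becomes that node's value string.
            out.append((node_id, "rdf:value", o))
        else:
            out.append((s, p, o))
    return out


legacy = [
    ("<http://www.example.com>", "dcterms:creator", ("literal", "John Smith")),
    ("<http://www.example.com>", "dcterms:title", ("literal", "Home")),
]
for triple in lift_legacy_literals(legacy):
    print(triple)
```

The rewrite only changes the shape of the data; deciding whether two lifted nodes with the same name denote the same entity is a separate problem that this sketch does not attempt.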
The April 2007 Working Draft differs from the legacy RDF specifications in its handling of value strings.
Support for multiple value strings
The DCAM specifies that each value can be represented in a DCAM statement by multiple value strings. The new RDF expression supports this construct, using the rdf:value property. This allows value strings in different languages or using different syntax encoding schemes to be used as representations of a single value.
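As a rough illustration, a single value node can carry several rdf:value triples, one per value string. The helper below is a sketch under stated assumptions: the function name, the choice of dcterms:coverage, the example strings, and the blank node label _:v1 are not taken from the specification.

```python
def value_with_strings(subject, prop, strings, bnode="_:v1"):
    """Return Turtle-style lines for one value with several value strings.

    strings is a list of (text, language_tag_or_None) pairs.
    """
    lines = ["%s %s %s ." % (subject, prop, bnode)]
    for text, lang in strings:
        # Escape backslashes and double quotes inside the literal.
        literal = '"%s"' % text.replace("\\", "\\\\").replace('"', '\\"')
        if lang:
            # Language-tagged plain literal, e.g. "Sweden"@en.
            literal += "@" + lang
        lines.append("%s rdf:value %s ." % (bnode, literal))
    return lines


for line in value_with_strings(
    "<http://www.example.com>",
    "dcterms:coverage",
    [("Sweden", "en"), ("Sverige", "sv")],
):
    print(line)
```

Both strings hang off the same node, so a consumer can treat them as alternative representations of one value rather than as two separate values.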
Deprecated use of rdfs:label
Value strings are now expressed using rdf:value. The use of rdfs:label for expressing value strings is no longer supported, as its definition does not clearly fit this purpose. Of course, the use of rdfs:label is not forbidden, but it is not considered to have any special interpretation in terms of the DCAM.
Support for RDF datatypes
RDF datatypes can now be used with value strings, corresponding to the DCAM concept of Syntax Encoding Schemes.
For value strings occurring as the object of an rdf:value property, this is a simple matter.
The new specification also allows the use of datatyped or plain literals as direct values of properties when the value is a literal.
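The two placements of a datatyped literal can be sketched as follows. The helper name, the example date, the choice of dcterms:created and dcterms:title, and the xsd:date datatype are assumptions made for illustration; whether a given property takes a direct literal depends on its declared range, which this sketch does not model.

```python
def literal(text, datatype=None):
    """Format a Turtle-style literal, optionally with a datatype."""
    quoted = '"%s"' % text.replace("\\", "\\\\").replace('"', '\\"')
    if datatype:
        # The datatype plays the role of a DCAM syntax encoding scheme.
        return quoted + "^^" + datatype
    return quoted


# Datatyped value string as the object of rdf:value on a value node.
via_node = [
    "<http://www.example.com> dcterms:created _:d1 .",
    "_:d1 rdf:value %s ." % literal("2007-04-02", "xsd:date"),
]

# Plain literal as the direct value of a property (assumed literal range).
direct = "<http://www.example.com> dcterms:title %s ." % literal("Home")

for line in via_node + [direct]:
    print(line)
```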
The April 2007 Working Draft deprecates several constructs described in the May 2002 specification [DCQ-RDF-XML].
Deprecated use of RDF Containers
The RDF Container constructs rdf:Bag, rdf:Alt and rdf:Seq are no longer provided as an alternative for constructing ordered and unordered sets. They have no correspondence in the DCAM and should no longer be used, except where the range of a property includes one of these classes.
Deprecated construct "poor-man's structured values"
The recursive use of rdf:value for structured values has been deprecated. It has no correspondence in the DCAM and does not lend itself well to automated processing. The use of this construct is therefore no longer supported.
Deprecated construct "poor-man's language qualification"
The use of "poor-man's language qualification" in the 2002 specification does not fit the DCAM and does not take into account the language tagging of plain literals in RDF. It is no longer supported.
Removal of references to "dumb-down"
The dumb-down algorithm is independent of any particular expression of Dublin Core™ metadata (such as Dublin Core™ metadata in RDF) and is therefore out of scope for this specification. References to dumb-down have therefore been removed.
Removal of reification from the Working Draft
The use of reification is now considered to fall outside the scope of the specification and is therefore no longer part of the April 2007 Working Draft. As it does not interfere with the metadata itself, however, reification can still be used in accordance with RDF specifications.
Removal of RDF schemas from the Working Draft
The RDF schemas for DCMI properties and classes are part of the definitions of these terms and do not belong specifically to the RDF expression of Dublin Core™ metadata. They have been removed from the draft specification itself and can be accessed at http://dublincore.org/schemas/rdfs/.
- DCMI Abstract Model
- Report from public comment period on Dublin Core™ RDF expression
<http://dublincore.org/architecturewiki/DCRDFTaskforce/PublicCommentJune2006 - archived as
- Expressing Simple Dublin Core™ in RDF/XML
- Expressing Qualified Dublin Core™ in RDF/XML
- Expressing Dublin Core™ in HTML/XHTML meta and link elements
- DCMI Architecture Working Group
- DCMI Architecture Working Group mailing list
- Expressing Dublin Core™ metadata using the Resource Description Framework (RDF)
- Expressing Dublin Core™ metadata using XML
- Guidelines for implementing Dublin Core™ in XML
- DC property domains and ranges