DCMI Localization and Internationalization Community
Meeting Report: Working Group on Dublin Core in Multiple Languages
- German National Research Center for Information Technology (GMD), Bonn, Germany
- Date of meeting:
- Thomas Baker (Thomas.Baker@gmd.de), Chair/Rapporteur
- 1998-06-28 (final, revised version)
- Thomas Baker (Thomas.Baker@gmd.de), Chair/Rapporteur
Jose Borbinha (Jose.Borbinha@ip.pt)
Diann Rusch-Feja (firstname.lastname@example.org)
Reginald Ferber (email@example.com)
Sarantos Kapidakis (firstname.lastname@example.org)
Ulrich Thiel (email@example.com)
Anne-Marie Vercoustre (Anne-Marie.Vercoustre@inria.fr)
The Working Group on Dublin Core in Multiple Languages grew out of a break-out group at the Canberra workshop of March 1997 at which we agreed that versions of the Dublin Core in languages other than English should share a set of globally valid, machine-readable tokens. [DC4] Informal follow-up discussions were held at the Dublin Core workshop in Helsinki (October 1997), the International Symposium on Digital Libraries in Tsukuba, Japan (November 1997), the EU-NSF Working Group on Metadata in Washington DC (February 1998), and the Eleventh Digital Library Workshop in Tsukuba (March 1998). Tom Baker presented papers related to Dublin Core in multiple languages at the two meetings in Japan. [ISDL97]
At the March workshop, Eric Miller (head of the RDF Working Group), Shigeo Sugimoto, and Tom Baker discussed in detail how the emerging Resource Description Framework could be used to create a distributed registry of Dublin Core in multiple languages, starting with versions of Simple Dublin Core (the fifteen unqualified elements) in several languages. The Working Draft issued by the RDF Schema Working Group on 9 April 1998 [RDF] included the schema of Simple Dublin Core in English but left open key questions related to schemas of Dublin Core in multiple languages. An email exchange followed, in which members of the RDF Working Group, in particular Charles Wicksteed, helped define the technical decisions that needed to be made in order to move forward on this. These technical issues were the focus of discussion at the meeting in Bonn.
The participants at the Bonn meeting agreed on the following:
- The reference language of the international Dublin Core community is English, inasmuch as all of its outputs are discussed and approved in English. Accordingly, the English version of the Dublin Core has a special status as the canonical result of an international process.
- However, the semantics of Dublin Core elements are in principle expressible equally well in any modern language. We hesitate to call versions of the Dublin Core in languages other than English "translations." Rather, these versions could function as reference definitions in their own right, as objects of discussion, adaptation, and extension in locally specific ways. Like any good translation, moreover, such versions need not be word-for-word literal as long as they convey the essence of the canonical definitions in ways that make sense to their readers.
- People currently associate "the Dublin Core" with a standard in English. We would like to call the sum of instantiations of the Dublin Core in various languages "Multilingual Dublin Core," or "DC-Multilingual." One might think of the metadata semantics shared within DC-Multilingual as being, in some sense, independent of any particular language, hence universal.
- The versions of Dublin Core elements in various languages should share a single namespace. As of May 1998, one likely candidate for the namespace name is "http://purl.org/metadata/dublin_core_elements" (without a trailing period). This will need to be discussed and ratified by the Dublin Core community as a whole.
- As defined in the RFC for Simple Dublin Core [RFC], Dublin Core elements consist of a descriptive name (eg, "Author or Creator"), a single-word label or "token" for use in encoding schemes (eg, "Creator"), and element definitions. The descriptive names and element definitions are meant to be read primarily by humans, the tokens primarily by machines. The tokens look like English words but stand for universal elements. Universal elements can have interchangeable names and definitions in multiple languages.
- A version of the Dublin Core in French (DC-French) at http://www.inria.fr could have universal elements -- canonical elements with translated definitions and names -- and elements that are specific to DC-French. Universal elements would use the Dublin Core namespace (here, http://purl.org/metadata/dublin_core_elements), while local elements would use a namespace such as http://www.inria.fr/metadata/dublin_core_elements. In the RDF schema of DC-French at http://www.inria.fr, then, the definition of the Title element would look like this:
    <RDF:Description RDF:href="http://purl.org/metadata/dublin_core_elements#Title">
      <RDF:instanceOf RDF:href="http://www.w3.org/TR/WD-RDF-Syntax#PropertyType"/>
      <RDFS:necessity RDF:href="http://www.w3.org/TR/WD-RDF-Schema#ZeroOrMore"/>
      <RDFS:Comment xml:lang="fr">Le nom donne a la ressource par le createur ou l'auteur.</RDFS:Comment>
    </RDF:Description>
- Dublin Core elements have a human-readable descriptive name, a machine-readable label (token), and human-readable definitions. However, as of May 1998, the Resource Description Framework Schema Core supports only the latter two -- the token and the definition. [RDF] For describing the Dublin Core in English, this does not pose a big problem, as the token is almost identical to the name (eg, "Creator" versus "Author or Creator"). To express schemas in other languages, however, we need support for human-readable descriptive names as well. If the RDF specification does not come to support this in the Schema Core, perhaps the Dublin Core community should define its own extension to RDF (a new "property type," in RDF jargon), which could be maintained within the official Dublin Core namespace.
- A description of a document using Simple Dublin Core and RDF syntax would look like the following, independently of whether the creator of that description had referred to DC-English, DC-French, or DC-Thai, and regardless of the language of the document itself (in this case German):
    <?xml:namespace ns="http://purl.org/metadata/dublin_core_elements" prefix="DC"?>
    <RDF:RDF>
      <RDF:Description RDF:HREF="http://www.biblio.de/buecher/kleist.html">
        <DC:Title XML:lang="de">Das Erdbeben in Chili</DC:Title>
        <DC:Creator>Heinrich von Kleist</DC:Creator>
      </RDF:Description>
    </RDF:RDF>
- It is not clear whether machine-readable RDF schemas should be embedded in human-readable HTML documentation files or kept separate. For now we will keep them separate so that we can change them more easily once we know what is required.
- It is not clear whether the namespace name (see the item on a single shared namespace above) should point to an RDF schema file or to a Web page in HTML. Perhaps element definitions could be dynamically extracted from an RDF file to an HTML file, or vice versa. Moreover, the draft XML Namespaces specification [XML] makes a distinction between a namespace name and an optional URI that points to a schema. This is clearly a question for further research. Either way, what the namespace name points to can change over time, so we can go ahead and write metadata with confidence.
- Versions of the Dublin Core in other languages should cite the specific English version on which they are based (eg, 1.0, 1.1, 2.0...), as that canonical version will evolve over time. For example, the Dublin Core could grow to have more than fifteen elements. Perhaps one could give each new version its own URI. Exactly how versioning should be implemented is a question for the Dublin Core community as a whole.
- Links from the central namespace server to versions in other languages should be machine-parsable. It should be possible for users to transmit a preference for Finnish, get the URI for DC-Finnish in return, and henceforth load Dublin Core element names and definitions from Helsinki.
- We are embarking on a period of experimentation, during which we should invite metadata-using institutions in many countries to create versions of Dublin Core in various languages. Regional language or institutional differences may occasionally result in the creation of multiple versions of Dublin Core in any given language, but we see no reason to place restrictions on this at present. Eventually we may need a process for evaluating such versions, with peer review for quality and some verification of an institution's commitment to maintaining a version in the long term. Versions that pass could be "certified" by the Dublin Core community as a whole.
- DC-Multilingual should be a forum for sharing and negotiating metadata semantics across languages. An extension originally defined in Thailand could be incorporated into the universal version and be used worldwide.
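The separation agreed above between universal tokens and language-specific names and definitions can be illustrated with a small sketch in Python. It is not part of the Working Group's output and makes several assumptions: the registry is a plain in-memory dictionary, the names `REGISTRY`, `LocalizedElement`, `element_uri`, `is_universal`, and `lookup` are hypothetical, the French name "Titre" and the English definitions are illustrative placeholders, and falling back to English for a missing language is only one possible policy suggested by the special status of the English version. The namespace name is the candidate mentioned above, which has not yet been ratified.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Candidate namespace name from the report (not yet ratified).
DC_NS = "http://purl.org/metadata/dublin_core_elements"
# Example namespace for elements specific to DC-French, as in the report.
LOCAL_FR_NS = "http://www.inria.fr/metadata/dublin_core_elements"


@dataclass(frozen=True)
class LocalizedElement:
    """Human-readable parts of an element in one language."""
    name: str        # descriptive name, eg "Author or Creator"
    definition: str  # element definition


# Hypothetical registry: token -> language tag -> name and definition.
# The tokens are shared by all languages; only the values differ.
REGISTRY: Dict[str, Dict[str, LocalizedElement]] = {
    "Title": {
        "en": LocalizedElement("Title", "(illustrative English definition)"),
        "fr": LocalizedElement(
            "Titre",  # assumed French name, for illustration only
            "Le nom donne a la ressource par le createur ou l'auteur.",
        ),
    },
    "Creator": {
        "en": LocalizedElement("Author or Creator",
                               "(illustrative English definition)"),
    },
}


def element_uri(token: str, namespace: str = DC_NS) -> str:
    """Build the URI of an element from a namespace name and a token."""
    return f"{namespace}#{token}"


def is_universal(uri: str) -> bool:
    """True if the URI lies in the shared Dublin Core namespace."""
    return uri.startswith(DC_NS + "#")


def lookup(token: str, preferred_lang: str,
           fallback: str = "en") -> Tuple[str, LocalizedElement]:
    """Return (language used, localized element) for a token.

    Falls back to the English reference version when the preferred
    language has no entry (an assumed policy, not a decision of the group).
    """
    versions = REGISTRY[token]
    lang = preferred_lang if preferred_lang in versions else fallback
    return lang, versions[lang]


if __name__ == "__main__":
    print(element_uri("Title"))
    print(lookup("Title", "fr"))
    print(lookup("Creator", "fi"))
```

In this sketch a metadata record only ever stores the token (eg, `DC:Title`), so the same record can be displayed with French, English, or any other names and definitions simply by asking the registry for a different language.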
Process and deliverables
- The distributed registry should start with versions of Simple Dublin Core in multiple languages. Sub-elements should not be implemented before the Dublin Core community ratifies versions of Complex Dublin Core.
- DC-Multilingual should comply with the RDF specifications as they evolve. Indeed, it could provide RDF with a model application that is high-profile, international, and vendor-neutral. The DC-Multilingual community, in turn, should communicate its evolving requirements back to the RDF Working Group for consideration in future versions of RDF.
- The distributed registry of DC-Multilingual will be implemented by the Working Group on Dublin Core in Multiple Languages in close consultation with Dublin Core advisory committees, the RDF Working Group, and the Dublin Core community as a whole (as represented by the periodic workshops and by the meta2 mailing list). The Working Group will maintain a Web page at http://dublincore.org/ and conduct discussions on the mailing list available at http://www.jiscmail.ac.uk/lists/dc-international.html.
- Local implementors are strongly encouraged to develop projects that enhance or extend the capabilities of the registry. In addition to schemas, such sites could eventually hold downloadable templates, metadata editors, Java utilities, user guides, enumerated lists, crosswalks to other element sets, and controlled vocabularies.