Topic: Comment for dc:language
Identifier: http://dublincore.org/usage/meetings/2005/05/washdc/topic-language-comment/
See also: http://dublincore.org/usage/meetings/2005/05/washdc/
Created: 2005-05-11
Modified: 2005-05-16 17:28, Monday
Maintainer: Tom Baker

In Washington, we should vote on the change proposed below
by Rebecca.

The problem

Martin Dürst <duerst@w3.org> has pointed out that the comment
for the element "Language" currently says:

    Recommended best practice is to use RFC 3066 [RFC3066]
    which, in conjunction with ISO639 [ISO639]), defines two-
    and three-letter primary language tags with optional
    subtags. Examples include "en" or "eng" for English,
    "akk" for Akkadian", and "en-GB" for English used in the
    United Kingdom.

He recommends that this be fixed because "eng" is not a valid
RFC 3066 tag: RFC 3066 states that when a language has both a
two-letter and a three-letter code, the two-letter code MUST
be used.

The relevant passage in RFC 3066 (http://www.ietf.org/rfc/rfc3066.txt)
is point 2 under section 2.3:

     2.3 Choice of language tag
     
        One may occasionally be faced with several possible tags for the same
        body of text.
     
        Interoperability is best served if all users send the same tag, and
        use the same tag for the same language for all documents. If an
        application has requirements that make the rules here inapplicable,
        the application protocol specification MUST specify how the procedure
        varies from the one given here.
     
        The text below is based on the set of tags known to the tagging
        entity.
     
        1. Use the most precise tagging known to the sender that can be
           ascertained and is useful within the application context.
     
        2. When a language has both an ISO 639-1 2-character code and an ISO
           639-2 3-character code, you MUST use the tag derived from the ISO
           639-1 2-character code.
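
As a rough illustration of point 2, the following Python sketch picks
the primary tag for a language code. It is not part of RFC 3066 or of
any DCMI document: the function name preferred_primary_tag and the
three-entry lookup table are hypothetical, and a real application
would need the full ISO 639 code lists rather than this hand-picked
subset.

```python
# Illustrative subset only: maps an ISO 639-2 three-letter code to the
# ISO 639-1 two-letter code for the same language, or to None when the
# language has no two-letter code. Not a complete ISO 639 table.
ISO_639_2_TO_639_1 = {
    "eng": "en",   # English has both codes
    "akk": None,   # Akkadian has only a three-letter code
    "ban": None,   # Balinese has only a three-letter code
}


def preferred_primary_tag(code: str) -> str:
    """Return the primary tag selected by RFC 3066, section 2.3, point 2.

    Hypothetical helper: accepts a two- or three-letter code from the
    table above and returns the two-letter code when one exists.
    """
    code = code.lower()
    if code in ISO_639_2_TO_639_1:
        two_letter = ISO_639_2_TO_639_1[code]
        # Prefer the ISO 639-1 code; fall back to the ISO 639-2 code.
        return two_letter if two_letter is not None else code
    if code in ISO_639_2_TO_639_1.values():
        # Already a known two-letter code, so it is used unchanged.
        return code
    raise ValueError(f"code not in the illustrative table: {code!r}")


if __name__ == "__main__":
    for candidate in ("eng", "en", "akk", "ban"):
        print(candidate, "->", preferred_primary_tag(candidate))
```

Under these assumptions, "eng" is mapped to "en", while "akk" and
"ban" are returned unchanged because no two-letter code exists for
them.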

The proposal

Rebecca proposes that we just give an example of a language
that has a 3-character code and no 2-character code. So we
could change the comment to read:

    Recommended best practice is to use RFC 3066 .... Examples
    include "en" for English or "ban" for Balinese.