
Dublin Core Tags Applied to XML-Data Schemas for Purpose of Description and Categorization

Creators: Andrew Layman
Date Issued: 1999-05-20
This Version: http://dublincore.org/specifications/dublin-core/1999/05/20/dc-xml-data-schemas/
Latest Version: https://www.dublincore.org/specifications/dublin-core/dc-xml-data-schemas/
Replaces:
  • https://www.dublincore.org/specifications/dublin-core/dc-xml-data-schemas/1999-05-20/
Status: note
Description: The Dublin Core Metadata Element Set is a collection of fifteen elements designed by librarians to categorize and catalog documents. The elements are sufficiently general that they are suitable for categorizing and describing XML-Data schemas. This paper proposes a schema, based on Dublin Core elements, and then gives guidelines for its application in XML-Data schemas.

    But first, a sample. A trivial schema categorized according to the elements described here might look like this:

    <Schema xmlns='urn:schemas-microsoft-com:xml-data'
              xmlns:dt = 'urn:schemas-microsoft-com:datatypes'
              >
    
      <catalogInformation xmlns='urn:electrocommerce-org:schemas/electrocommerce' >
          <title>My trivial schema</title>
          <creator>
              <FreeText>Andrew Layman</FreeText>
              <personReference>mailto:andrewl@microsoft.com</personReference>
          </creator>
          <subject>
              <subjectReference>urn:electrocommerce-org/taxonomy/teapot</subjectReference>
              <keyword>teapot</keyword>
          </subject>
          <subject>
              <subjectReference>urn:electrocommerce-org/taxonomy/coffee</subjectReference>
              <keyword>coffee</keyword>
              <keyword xml:lang="it">caffe</keyword>
          </subject>
      </catalogInformation>
    </Schema>
    

    The Schema
    The following schema defines a small set of tags, each based on the corresponding generic Dublin Core element shown in Appendix A but specialized for the purpose of cataloging schemas.

    <!-- Schema for Schema Catalog, version 1, based on Dublin Core,
    generated 5/13/99 by AJL. -->
     
    <Schema xmlns='urn:schemas-microsoft-com:xml-data'
              xmlns:dt = 'urn:schemas-microsoft-com:datatypes'
          >
    
      <description>This defines a small set of tags,
      each based on the corresponding generic Dublin Core element,
      but here specialized for the purpose of cataloging schemas.
      See http://purl.org/dc for more information on Dublin Core.
      </description>
    
      <ElementType name="personReference" model="closed" content="textOnly" >
        <datatype dt:type="URI" />
      </ElementType>
    
      <ElementType name="subjectReference" model="closed" content="textOnly" >
        <datatype dt:type="URI" />
      </ElementType>
    
      <ElementType name="resourceReference" model="closed" content="textOnly" >
        <datatype dt:type="URI" />
      </ElementType>
    
      <ElementType name="FreeText" model="open" content="mixed" >
        <description>Mixed text and markup. Must be well-formed
        if marked-up.</description>
        <attribute type="xml:lang" />
      </ElementType>
    
      <ElementType name="keyword" model="closed" content="textOnly" >
        <description>A keyword used for categorization, with a human-language
        meaning but not drawn from a controlled vocabulary identified by a URI.
        We recommend using only lower-case text.</description>
        <attribute type="xml:lang" />
      </ElementType>
    
      <ElementType name="title" model="closed" content="textOnly" >
        <description>The descriptive title of this schema.</description>
        <attribute type="xml:lang" />
      </ElementType>
    
      <ElementType name="creator" model="open" content="eltOnly" >
        <description>The person or organization primarily responsible for
        creating the intellectual content of this schema. </description>
        <group order="one" minOccurs="1" maxOccurs="*">
            <element type="personReference" />
            <element type="FreeText" />
        </group>
      </ElementType>
    
      <ElementType name="subject" model="open" content="eltOnly" >
        <description>The topic of the schema. Typically, subject will be
        expressed as keywords or phrases that describe the subject or
        content of the schema. The use of controlled vocabularies and
        formal classification schemes is encouraged.</description>
        <group order="one" minOccurs="0" maxOccurs="*">
            <element type="subjectReference" />
            <element type="keyword" />
        </group>
      </ElementType>
    
      <ElementType name="description" model="open" content="mixed" >
        
        <description> A textual description of the content of the resource,
        including abstracts in the case of document-like objects or content
        descriptions in the case of visual resources.</description>
        <attribute type="xml:lang" />
        
      </ElementType>
    
      <ElementType name="publisher" model="open" content="eltOnly" >
        
        <description>The entity responsible for making the resource
        available in its present form, such as a publishing house, a
        university department, or a corporate entity.</description>
    
        <group order="one" minOccurs="1" maxOccurs="*">
            <element type="personReference" />
            <element type="FreeText" />
        </group>
        
      </ElementType>
    
      <ElementType name="contributor" model="open" content="eltOnly" >
        
        <description>A person or organization not specified in a Creator
        element who has made significant intellectual contributions to the
        resource but whose contribution is secondary to any person or
        organization specified in a Creator element (for example, editor,
        transcriber, and illustrator).</description>
    
        <group order="one" minOccurs="1" maxOccurs="*">
            <element type="personReference" />
            <element type="FreeText" />
        </group>
        
      </ElementType>
    
      <ElementType name="identifier" model="open" content="eltOnly" >
        
        <description>A string or number used to uniquely identify the
        resource. Examples for networked resources include URLs and URNs
        (when implemented). Other globally-unique identifiers, such as
        International Standard Book Numbers (ISBN) or other formal names are
        also candidates for this element.</description>
    
        <group order="one" minOccurs="1" maxOccurs="*">
            <element type="resourceReference" />
            <element type="FreeText" />
        </group>
    
      </ElementType>
    
      <ElementType name="source" model="open" content="eltOnly" >
        
        <description>Information about a second resource from which the
        present resource is derived. While it is generally recommended that
        elements contain information about the present resource only, this
        element may contain a date, creator, format, identifier, or other
        metadata for the second resource when it is considered important for
        discovery of the present resource; recommended best practice is to
        use the Relation element instead. For example, it is possible to
        use a Source date of 1603 in a description of a 1996 film adaptation
        of a Shakespearean play, but it is preferred instead to use Relation
        "IsBasedOn" with a reference to a separate resource whose
        description contains a Date of 1603. Source is not applicable if the
        present resource is in its original form.</description>
    
        <group order="one" minOccurs="1" maxOccurs="*">
            <element type="resourceReference" />
            <element type="FreeText" />
        </group>
    
        
      </ElementType>
    
      <ElementType name="language" model="closed" content="textOnly" >
        
        <description>The language of the intellectual content of the
        resource. When used, the content of this field must coincide
        with RFC 1766 [Tags for the Identification of Languages,
        http://ds.internic.net/rfc/rfc1766.txt ]; examples include en, de,
        es, fi, fr, ja, th, and zh.</description>
        
      </ElementType>
    
      <ElementType name="rights" model="open" content="eltOnly" >
        
        <description>A rights management statement, an identifier that
        links to a rights management statement, or an identifier that links
        to a service providing information about rights management for the
        resource.</description>
    
        <group order="one" minOccurs="1" maxOccurs="*">
            <element type="resourceReference" />
            <element type="FreeText" />
        </group>
        
      </ElementType>
    
      <ElementType name="catalogInformation" model="open" content="eltOnly" >
        
        <description>
    
        A small set of tags, each based on the corresponding generic Dublin Core element,
        but here specialized for the purpose of cataloging schemas. See
        http://purl.org/dc for more information on Dublin Core.
    Many tags may be repeated at this level and also allow multiple occurrences of
    their subelements. The intended usage is that distinct items (for example, distinct
        creators) should be expressed with separate elements, while alternative forms of
        reference to the same item (for example, several ways of referring to the same
        creator) should be expressed as alternate subelements.
    
        </description>
    
        <group order="seq">
            <element type="title" minOccurs="0" maxOccurs="*" />
            <element type="creator" minOccurs="0" maxOccurs="*" />
            <element type="subject" minOccurs="0" maxOccurs="*" />
            <element type="description" minOccurs="0" maxOccurs="*" />
            <element type="publisher" minOccurs="0" maxOccurs="*" />
            <element type="contributor" minOccurs="0" maxOccurs="*" />
            <element type="identifier" minOccurs="0" maxOccurs="*" />
            <element type="source" minOccurs="0" maxOccurs="*" />
            <element type="language" minOccurs="0" maxOccurs="*" />
            <element type="rights" minOccurs="0" maxOccurs="*" />
        </group>
        
      </ElementType>
    
    </Schema>
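
    For comparison with the opening sample, a fuller catalogInformation instance might look like the following sketch. It uses more of the ten elements, in the order that the seq group above requires, and omits contributor and source, which are optional. The title, person name, company name, mailto address and example.org URLs are hypothetical placeholders; the namespace URN is the one used in the opening sample.

    <catalogInformation xmlns='urn:electrocommerce-org:schemas/electrocommerce' >
        <title xml:lang="en">Teapot catalog schema</title>
        <creator>
            <!-- Alternative forms of reference to the same person. -->
            <FreeText>Jane Example</FreeText>
            <personReference>mailto:jane@example.org</personReference>
        </creator>
        <subject>
            <subjectReference>urn:electrocommerce-org/taxonomy/teapot</subjectReference>
            <keyword>teapot</keyword>
        </subject>
        <description xml:lang="en">Describes teapots offered for sale.</description>
        <publisher>
            <FreeText>Example Corporation</FreeText>
        </publisher>
        <identifier>
            <resourceReference>http://www.example.org/schemas/teapot-catalog.xml</resourceReference>
        </identifier>
        <!-- An RFC 1766 language tag, as the language element requires. -->
        <language>en</language>
        <rights>
            <resourceReference>http://www.example.org/legal/schema-terms.html</resourceReference>
        </rights>
    </catalogInformation>

    Moving any of these children out of sequence (for example, placing language before identifier) would not match the seq group in catalogInformation.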
    

    How to Use the Schema
    To understand how this schema is used, it is crucial first to understand the role of the URI-based references: personReference, subjectReference and resourceReference. These occur within elements whose content model is very flexible in Dublin Core. For example, in DC the creator element may contain free text, or it may contain a reference to a specific company or individual via some well-known identification system. Controlled sets of names, for example D-U-N-S numbers, make excellent identifiers. We pair these with the Uniform Resource Identifier (URI) specification and propose that companies and organizations that control identifiers name their identifier sets with URIs, thereby allowing us to use the datatype 'URI' wherever a controlled identifier is needed.

    For example, suppose that Dun and Bradstreet gave a URI beginning with 'urn:www-dnb-com/dunsno' to every number it issues. A creator element might then look like:

    <creator>
        <personReference>urn:www-dnb-com/dunsno/123456789012345</personReference>
    </creator>
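
    Because the creator content model lets personReference and FreeText repeat in any mix, a controlled identifier can sit beside a human-readable name when both refer to the same creator, while a second, distinct creator gets a creator element of its own. In this sketch the D-U-N-S URI follows the hypothetical Dun and Bradstreet convention just described; the company name and the mailto address are placeholders.

    <!-- One creator, referred to in two alternative ways. -->
    <creator>
        <personReference>urn:www-dnb-com/dunsno/123456789012345</personReference>
        <FreeText>Example Teapot Company</FreeText>
    </creator>
    <!-- A distinct creator, so a separate creator element. -->
    <creator>
        <personReference>mailto:designer@example.org</personReference>
    </creator>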
    
    

    Similarly, subject taxonomies are likely to be defined by many authorities. Each of these should have a corresponding URI namespace, used as in:

    <subject>
        <subjectReference>urn:electrocommerce-org/taxonomy/teapot</subjectReference>
    </subject>
    

    Subject categorizations also allow keywords from uncontrolled vocabularies, so the following might be seen:

    <subject>
        <subjectReference>urn:electrocommerce-org/taxonomy/teapot</subjectReference>
        <keyword>teapot</keyword>
    </subject>
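
    Since subjectReference and keyword may both repeat within one subject, the same topic can be identified in more than one taxonomy and described by keywords in more than one language. In this sketch the second taxonomy URN, urn:example-org/kitchenware/teapot, is a hypothetical placeholder for another authority's namespace.

    <!-- One topic, referred to in several alternative ways. -->
    <subject>
        <subjectReference>urn:electrocommerce-org/taxonomy/teapot</subjectReference>
        <subjectReference>urn:example-org/kitchenware/teapot</subjectReference>
        <keyword>teapot</keyword>
        <keyword xml:lang="fr">théière</keyword>
    </subject>

    Under the usage rule in the catalogInformation description, an unrelated topic (such as coffee in the opening sample) belongs in a separate subject element rather than in this one.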
    

    A trivial schema categorized according to the elements described here might look like:

    <Schema xmlns='urn:schemas-microsoft-com:xml-data'
              xmlns:dt = 'urn:schemas-microsoft-com:datatypes'
              >
    
      <catalogInformation xmlns='urn:electrocommerce-org:schemas/electrocommerce' >
          <title>My trivial schema</title>
          <creator>
              <FreeText>Andrew Layman</FreeText>
              <personReference>mailto:andrewl@microsoft.com</personReference>
          </creator>
          <subject>
              <subjectReference>urn:electrocommerce-org/taxonomy/teapot</subjectReference>
              <keyword>teapot</keyword>
          </subject>
          <subject>
              <subjectReference>urn:electrocommerce-org/taxonomy/coffee</subjectReference>
              <keyword>coffee</keyword>
              <keyword xml:lang="it">caffe</keyword>
          </subject>
      </catalogInformation>
    </Schema>
    

    Appendix A: The Generic Dublin Core Element Set
    The following schema defines each of the fifteen elements in a completely unconstrained fashion: every element has an open, mixed content model and can contain anything.

    <!-- Schema for Dublin Core, generated 5/13/99 4:03:15 PM by AJL. -->
     
    <Schema xmlns='urn:schemas-microsoft-com:xml-data'
              xmlns:dt = 'urn:schemas-microsoft-com:datatypes' >
    
      <description> The Dublin Core is a simple metadata element
            set intended to facilitate discovery of electronic
            resources. 
      </description>
    
      <ElementType name="Title" model="open" content="mixed" >
        
        <description>The name given to the resource, usually by the Creator
        or Publisher.</description>
        
      </ElementType>
    
      <ElementType name="Creator" model="open" content="mixed" >
        
        <description>The person or organization primarily responsible for
        creating the intellectual content of the resource. For example,
        authors in the case of written documents, artists, photographers, or
        illustrators in the case of visual resources.</description>
        
      </ElementType>
    
      <ElementType name="Subject" model="open" content="mixed" >
        
        <description>The topic of the resource. Typically, subject will be
        expressed as keywords or phrases that describe the subject or
        content of the resource. The use of controlled vocabularies and
        formal classification schemes is encouraged.</description>
        
      </ElementType>
    
      <ElementType name="Description" model="open" content="mixed" >
        
        <description> A textual description of the content of the resource,
        including abstracts in the case of document-like objects or content
        descriptions in the case of visual resources.</description>
        
      </ElementType>
    
      <ElementType name="Publisher" model="open" content="mixed" >
        
        <description>The entity responsible for making the resource
        available in its present form, such as a publishing house, a
        university department, or a corporate entity.</description>
        
      </ElementType>
    
      <ElementType name="Contributor" model="open" content="mixed" >
        
        <description>A person or organization not specified in a Creator
        element who has made significant intellectual contributions to the
        resource but whose contribution is secondary to any person or
        organization specified in a Creator element (for example, editor,
        transcriber, and illustrator).</description>
        
      </ElementType>
    
      <ElementType name="Date" model="open" content="mixed" >
        
        <description>A date associated with the creation or availability of
        the resource. Such a date is not to be confused with one belonging
        in the Coverage element, which would be associated with the resource
        only insofar as the intellectual content is somehow about that
        date. Recommended best practice is defined in a profile of ISO 8601
        [Date and Time Formats (based on ISO8601), W3C Technical Note,
        http://www.w3.org/TR/NOTE-datetime] that includes (among others)
        dates of the forms YYYY and YYYY-MM-DD. In this scheme, for example,
        the date 1994-11-05 corresponds to November 5, 1994.</description>
        
      </ElementType>
    
      <ElementType name="Type" model="open" content="mixed" >
        
        <description>The category of the resource, such as home page,
        novel, poem, working paper, technical report, essay, dictionary. For
        the sake of interoperability, Type should be selected from an
        enumerated list that is currently under development in the workshop
        series.</description>
        
      </ElementType>
    
      <ElementType name="Format" model="open" content="mixed" >
        
        <description>The data format of the resource, used to identify the
        software and possibly hardware that might be needed to display or
        operate the resource. For the sake of interoperability, Format
        should be selected from an enumerated list that is currently under
        development in the workshop series.</description>
        
      </ElementType>
    
      <ElementType name="Identifier" model="open" content="mixed" >
        
        <description>A string or number used to uniquely identify the
        resource. Examples for networked resources include URLs and URNs
        (when implemented). Other globally-unique identifiers, such as
        International Standard Book Numbers (ISBN) or other formal names are
        also candidates for this element.</description>
        
      </ElementType>
    
      <ElementType name="Source" model="open" content="mixed" >
        
        <description>Information about a second resource from which the
        present resource is derived. While it is generally recommended that
        elements contain information about the present resource only, this
        element may contain a date, creator, format, identifier, or other
        metadata for the second resource when it is considered important for
        discovery of the present resource; recommended best practice is to
        use the Relation element instead. For example, it is possible to
        use a Source date of 1603 in a description of a 1996 film adaptation
        of a Shakespearean play, but it is preferred instead to use Relation
        "IsBasedOn" with a reference to a separate resource whose
        description contains a Date of 1603. Source is not applicable if the
        present resource is in its original form.</description>
        
      </ElementType>
    
      <ElementType name="Language" model="open" content="mixed" >
        
        <description>The language of the intellectual content of the
        resource. Where practical, the content of this field should coincide
        with RFC 1766 [Tags for the Identification of Languages,
        http://ds.internic.net/rfc/rfc1766.txt ]; examples include en, de,
        es, fi, fr, ja, th, and zh.</description>
        
      </ElementType>
    
      <ElementType name="Relation" model="open" content="mixed" >
        
        <description>An identifier of a second resource and its
        relationship to the present resource. This element permits links
        between related resources and resource descriptions to be
        indicated. Examples include an edition of a work (IsVersionOf), a
        translation of a work (IsBasedOn), a chapter of a book (IsPartOf),
        and a mechanical transformation of a dataset into an image
        (IsFormatOf). For the sake of interoperability, relationships should
        be selected from an enumerated list that is currently under
        development in the workshop series.</description>
        
      </ElementType>
    
      <ElementType name="Coverage" model="open" content="mixed" >
        
        <description>The spatial or temporal characteristics of the
        intellectual content of the resource. Spatial coverage refers to a
        physical region (e.g., celestial sector); use coordinates (e.g.,
        longitude and latitude) or place names that are from a controlled
        list or are fully spelled out. Temporal coverage refers to what the
        resource is about rather than when it was created or made available
        (the latter belonging in the Date element); use the same date/time
        format (often a range) [Date and Time Formats (based on ISO8601),
        W3C Technical Note, http://www.w3.org/TR/NOTE-datetime] as
        recommended for the Date element or time periods that are from a
        controlled list or are fully spelled out.</description>
        
      </ElementType>
    
      <ElementType name="Rights" model="open" content="mixed" >
        
        <description>A rights management statement, an identifier that
        links to a rights management statement, or an identifier that links
        to a service providing information about rights management for the
        resource.</description>
        
      </ElementType>
    </Schema>
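
    Because every element in this generic schema is open and mixed, a record built from it is constrained only by convention. The sketch below follows the recommendations quoted in the element descriptions: Date uses the YYYY form from the W3C profile of ISO 8601, Language uses an RFC 1766 tag, and derivation from an earlier work is expressed with Relation rather than Source, so the date 1603 appears only in the separate resource's own description. The records and record wrapper elements, their namespace URN, the identifiers, and the convention of writing the relationship name before the identifier are all hypothetical; Appendix A defines no container element and no syntax for Relation content.

    <records xmlns='urn:example-org:dc-records' >
        <!-- The present resource: a 1996 film adaptation. -->
        <record>
            <Title>Film adaptation of a Shakespearean play</Title>
            <Date>1996</Date>
            <Language>en</Language>
            <!-- Assumed convention: relationship name, then the identifier
                 of the separate resource this one is based on. -->
            <Relation>IsBasedOn urn:example-org:works/play</Relation>
        </record>
        <!-- The separate resource, whose own description carries the date 1603. -->
        <record>
            <Identifier>urn:example-org:works/play</Identifier>
            <Date>1603</Date>
        </record>
    </records>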