DCMI Kernel Metadata Community

Draft Kernel Metadata Specification (August 2004)

Kernel metadata is designed to support orderly collection management, a prerequisite to higher-order services such as electronic permanence and searching. It is complemented by an open set of terms that can be freely proposed; see http://dublincore.org/groups/kernel/propose_term.html .

While it may also be useful for discovery in the Dublin Core™ metadata sense, kernel metadata aims to provide the data necessary for low-level operations such as generating simple (if rough) descriptions from a list of object identifiers and creating collection subset views for limited surveying and troubleshooting purposes.

Kernel metadata is meant to balance the needs for expressive power, very simple machine processing, and direct human manipulation. For predictability there are exactly four required kernel elements, but for flexibility any number of non-kernel elements may follow them. We use the ERC (Electronic Resource Citation) format in this document for simplicity of exposition, although we could easily express things equivalently in an XML-based format. Here's an example.

   erc:
   who: Lederberg, Joshua
   what: Studies of Human Families for Genetic Linkage
   when: 1974
   where: http://profiles.nlm.nih.gov/BB/AA/TT/tt.pdf
   note: This is an example of a
           small descriptive record.

The ERC is a sequence of elements ending in a blank line. An element consists of a label, a colon, and an optional value. A long value may be folded (continued) onto the next line by inserting a newline and indenting the next line. A value can thus be folded across multiple lines, and it is treated as if those lines were joined together on one long line. For example, the ``note'' element from the example is considered the same as

   note: This is an example of a small descriptive record.

For annotation purposes, any line beginning with a '#' (hash) character is treated as if it were not present (in programmer terms, a comment line). That's the basic ERC record syntax.
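The syntax rules above can be sketched as a small parser. This is only an illustration, not part of the specification: the function name parse_erc is hypothetical, and the sketch assumes that labels start in the first column, that any leading space or tab marks a folded line, and that folded lines are joined with a single space.

```python
def parse_erc(text):
    """Parse the first ERC record in text into a list of (label, value) pairs."""
    elements = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line: treated as if it were not present
        if not line.strip():
            if elements:
                break  # a blank line ends the record
            continue  # ignore blank lines before the record starts
        if line[0] in " \t" and elements:
            # Folded line: join it to the previous value (assumed: one space).
            label, value = elements[-1]
            elements[-1] = (label, (value + " " + line.strip()).strip())
            continue
        # Split on the first colon only, so values such as URLs stay intact.
        label, _, value = line.partition(":")
        elements.append((label.strip(), value.strip()))
    return elements


record = (
    "erc:\n"
    "who: Lederberg, Joshua\n"
    "# comment lines are skipped\n"
    "note: This is an example of a\n"
    "        small descriptive record.\n"
    "\n"
)
for label, value in parse_erc(record):
    print(label, "=", value)
```

Returning a list of pairs rather than a dictionary keeps the element order and allows a label to repeat, which matters because non-kernel elements follow the four required ones.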

The kernel element names mostly map one-to-one to Dublin Core elements; however, ``who'' maps from the Creator, Contributor, and Publisher elements, in a manner consistent with the important Dublin Core™ minority view that would have them collapsed into one ``agent'' element. The general reason for using names different from those of Dublin Core™ was to reflect more stringent kernel value rules. Having said that, kernel metadata semantics still rely heavily on the wording of the NISO/ANSI Z39.85 Dublin Core™ metadata standard.

Value Rules

An element value may contain multiple values, each separated from the next by a `|' (pipe) character. Each value may contain free text, but special assumptions apply to values that begin with any of the following conventions.

An initial [:] signals that the value is a 4-digit date (yyyy), an 8-digit date (yyyymmdd), a 14-digit time (yyyymmddHHMMSS), or a comma-separated list of dates and ranges. A range is a period of time specified by a start date, a hyphen, and an end date, where either date, but not both, may be missing. Optional whitespace may be inserted between any digits for readability.

An initial , (comma) signals a sort-friendly element, such as one containing a person's name in the form Familyname, Givenname, or containing the words in a document title with the initial stopwords removed or rotated to the end of the element.

An initial (:value) signals a controlled vocabulary term value. This is especially important for the different flavors of ``missing'', when a value cannot otherwise be supplied for a required element. Everything after the (:value) is considered to be a free text equivalent.

Because these conventions were absent from the example ERC above, no special assumptions could be inferred. A contrived example that does allow all the assumptions is

   erc:
   who: , Smith, Jill
   what: Cocktail Napkin Drawing #2
   when: [:] 1969 04 01 213000
   where: (:unav) destroyed during spill of 1969 04 01 213500
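As a rough sketch of how a program might apply the value rules to a record like this one, the code below splits an element value on the pipe character and then checks each value for the three initial conventions. The function names and the returned tuples are hypothetical, chosen for illustration; the sketch does not validate digit counts and assumes that controlled terms consist of word characters only.

```python
import re


def split_values(element_value):
    """Split an element value on '|' into individual, trimmed values."""
    return [v.strip() for v in element_value.split("|")]


def classify(value):
    """Return (kind, payload) for one value, based on its initial convention."""
    if value.startswith("[:]"):
        # Date convention: whitespace between digits is optional, so drop it.
        return ("date", re.sub(r"\s+", "", value[3:]))
    if value.startswith(","):
        # Sort-friendly convention: the rest is already in sortable order.
        return ("sortable", value[1:].strip())
    match = re.match(r"\(:(\w+)\)\s*(.*)", value, re.S)
    if match:
        # Controlled term followed by its free-text equivalent.
        return ("term", (match.group(1), match.group(2)))
    return ("text", value)


def parse_dates(compact):
    """Split a compacted date list into single dates and (start, end) ranges."""
    items = []
    for part in compact.split(","):
        start, hyphen, end = part.partition("-")
        # A missing start or end of a range is represented as None.
        items.append((start or None, end or None) if hyphen else part)
    return items


for value in ["[:] 1969 04 01 213000", ", Smith, Jill",
              "(:unav) destroyed during spill of 1969 04 01 213500"]:
    print(classify(value))
```

A value with none of the conventions, such as the ``what'' element in the record above, falls through to plain free text.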

The four main kernel element labels are special in that they are required and can be re-used with different meanings in different contexts. The primary context is the story of an expression of an object, such as the publication of a written work. This matches a typical bibliographic citation.

 who (erc) a responsible person or party 
 what (erc) a name or other human-oriented identifier 
 when (erc) a date important in the object's lifecycle 
 where (erc) a location or system-oriented identifier

Another context is the story of an object's content. This is the ``erc-about'' context.

 who (erc-about) a person or party figuring in the information content 
 what (erc-about) a subject or topic figuring in the information content 
 when (erc-about) a time period covered by the information content 
 where (erc-about) a location or region covered by the information content

Another context is the story of the origin of the metadata record itself. This is the ``erc-from'' context.

 who (erc-from) a person or party responsible for the record 
 what (erc-from) a short form of the identifier for the record 
 when (erc-from) the last modification date of the record 
 where (erc-from) a location of the fullest form of the record

Another context is the story of a support commitment made to an object. This is the ``erc-support'' context.

 who (erc-support) a person or party responsible for the object 
 what (erc-support) the short form of the commitment made to the object 
 when (erc-support) the last modification date of the commitment 
 where (erc-support) a location of the fullest form of the commitment

Kernel Glossary of Elements and Values

in
(t11) A structured element that references a serial publication by name, volume, issue, date, and issue URL in which the described object appears. DC Mapping: Relation

how
(h5, erc) An account of the content of the resource. Examples of ``how'' include, but are not limited to, an abstract, table of contents, free-text account of the content, or reference to a graphical representation of content. DC Mapping: Description

format
(h22, erc) The physical or digital manifestation of the resource. Typically, ``format'' will include the media-type or dimensions of the resource as opposed to its nature or genre. The ``format'' element may be used to identify the software, hardware, or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary, such as the list of Internet Media Types (MIME) defining computer media formats. DC Mapping: Format

note
(t11) A free text note about the record. DC Mapping: none

what
(h2, erc) A human-oriented name given to the resource, or what this expression of the resource was called. Typically, ``what'' will be a name by which the resource is formally known. Compared to the ``where'' element, which is also a kind of name, the ``what'' element is suitable for human consumption. DC Mapping: Title

(h12, erc-about) A subject or topic figuring in the information content.

when
(h3, erc) A date of an important event in the lifecycle of the resource, often when it was expressed. Typically, ``when'' will be associated with the creation or availability of the resource. DC Mapping: Date

(h13, erc-about) A time period covered by the information content.

where
(h4, erc) An access-oriented name given to the resource, or where this resource was expressed. Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Formal identification systems include but are not limited to URL, ARK, DOI, and ISBN. DC Mapping: Identifier

(h14, erc-about) A location or region covered by the information content.

who
(h1, erc) An entity responsible for creating or making available the content of the resource, in other words, who expressed the resource. Examples of ``who'' include a person, an organization, or a service. DC Mapping: Creator, but if no Creator use Publisher, and if no Publisher, use Contributor.

(h11, erc-about) A person or party figuring in the information content.
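The DC mappings given for the four required elements can be sketched as a small conversion routine. This is an illustration under stated assumptions: the function name dc_to_kernel and the dictionary input shape are hypothetical, and the choice of (:unav) for a missing value is an assumption suited to an intermediary that only relays the metadata it received.

```python
def dc_to_kernel(dc):
    """Map a Dublin Core record (a dict, hypothetical shape) to kernel elements."""

    def first(*names):
        # Return the first non-empty Dublin Core value among the given names.
        for name in names:
            if dc.get(name):
                return dc[name]
        return "(:unav)"  # assumption: value unavailable to this intermediary

    return [
        # Creator, but if no Creator use Publisher, and if none, Contributor.
        ("who", first("Creator", "Publisher", "Contributor")),
        ("what", first("Title")),
        ("when", first("Date")),
        ("where", first("Identifier")),
    ]


print(dc_to_kernel({"Publisher": "Example Press", "Title": "An Example"}))
```

All four required elements are always emitted, in order, so that a record built this way stays predictable even when the source metadata is sparse.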

:unkn
A null element term explaining that the value is unknown. Compared to :unav, this explanation carries a high degree of authority regarding the object described. Anonymous authorship is an example.

:unav
A null element term explaining that the value is unavailable indefinitely. Compared to :unkn, this explanation is intended for intermediary systems that know less about the object described and have to rely on the best metadata received.

:unac
A null element term explaining that the value is temporarily inaccessible. This might be due, for example, to a system outage.

:unap
A null element term explaining that the value is not applicable or makes no sense.

:unas
A null element term explaining that a value was never assigned. An untitled painting is an example.

:none
A null element term explaining that the element never had a value and never will.

:null
A null element term explaining that the value is explicitly empty.

:unal
A null element term explaining that the value is unallowed or suppressed intentionally.

:tba
A null element term explaining that the value is to be assigned or announced later.

:etal
A null element term explaining that the value is a stand-in for other values too numerous to list (et alia).
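For machine processing, the null element terms above might be kept in a lookup table. The sketch below is one possible arrangement, not a prescribed one: the names NULL_TERMS and null_term are hypothetical, the short descriptions paraphrase the glossary entries, and the sketch assumes a term appears at the start of a value in the (:term) form used in the earlier example.

```python
# Null element terms from the glossary, with paraphrased meanings.
NULL_TERMS = {
    "unkn": "unknown",
    "unav": "unavailable indefinitely",
    "unac": "temporarily inaccessible",
    "unap": "not applicable",
    "unas": "never assigned",
    "none": "never had a value and never will",
    "null": "explicitly empty",
    "unal": "unallowed or suppressed intentionally",
    "tba": "to be assigned or announced later",
    "etal": "stand-in for values too numerous to list",
}


def null_term(value):
    """Return the null term that a value starts with, or None if there is none."""
    if value.startswith("(:"):
        term = value[2:].split(")", 1)[0]
        if term in NULL_TERMS:
            return term
    return None


value = "(:unav) destroyed during spill of 1969 04 01 213500"
term = null_term(value)
if term is not None:
    print(term, "=", NULL_TERMS[term])
```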