DCMI Kernel Metadata Community
Draft Kernel Metadata Specification (August 2004)
Kernel metadata is designed to support orderly collection management, a prerequisite to higher-order services such as electronic permanence and searching. It is complemented by an open set of terms that can be freely proposed; see http://dublincore.org/groups/kernel/propose_term.html .
While it may also be useful for discovery in the Dublin Core™ metadata sense, kernel metadata aims to provide the data necessary for low-level operations such as generating simple (if rough) descriptions from a list of object identifiers and creating collection subset views for limited surveying and troubleshooting purposes.
Kernel metadata is meant to balance the needs for expressive power, very simple machine processing, and direct human manipulation. For predictability there are exactly four required kernel elements, but for flexibility any number of non-kernel elements may follow them. We use the ERC (Electronic Resource Citation) format in this document for simplicity of exposition, although we could easily express things equivalently in an XML-based format. Here's an example.
erc:
who:   Lederberg, Joshua
what:  Studies of Human Families for Genetic Linkage
when:  1974
where: http://profiles.nlm.nih.gov/BB/AA/TT/tt.pdf
note:  This is an example of a small
       descriptive record.
The ERC is a sequence of elements ending in a blank line. An element consists of a label, a colon, and an optional value. A long value may be folded (continued) onto the next line by inserting a newline and indenting the next line. A value can thus be folded across multiple lines, and an element value folded in this way is treated as if the lines were joined together on one long line. For example, the ``note'' element from the example is considered the same as
note: This is an example of a small descriptive record.
For annotation purposes, any line beginning with a '#' (hash) character is treated as if it were not present (in programmer terms, a comment line). That's the basic ERC record syntax.
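The record syntax just described is simple enough to parse in a few lines. The following Python sketch is not part of this specification, and the function name parse_erc is hypothetical. It assumes that folded lines are joined with a single space once their indentation is removed (the text above says only that the lines are joined), that a tab counts as indentation, and that blank lines before the first element are skipped. It returns ordered (label, value) pairs so that repeated labels are preserved.

```python
from typing import List, Tuple


def parse_erc(text: str) -> List[Tuple[str, str]]:
    """Parse one ERC record into an ordered list of (label, value) pairs.

    Sketch only: folded lines are joined with a single space (assumption),
    comment lines are skipped, and the first blank line after the record
    has started ends it.
    """
    elements: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line: treated as if it were not present
        if not line.strip():
            if elements:
                break  # a blank line ends the record
            continue  # skip blank lines before the record starts
        if line[0] in " \t":
            # Indented line: continuation of the previous element's value.
            if not elements:
                raise ValueError("continuation line before any element")
            label, value = elements[-1]
            elements[-1] = (label, f"{value} {line.strip()}".strip())
            continue
        label, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"element line has no colon: {line!r}")
        elements.append((label.strip(), value.strip()))
    return elements


if __name__ == "__main__":
    record = (
        "erc:\n"
        "who:   Lederberg, Joshua\n"
        "what:  Studies of Human Families for Genetic Linkage\n"
        "when:  1974\n"
        "where: http://profiles.nlm.nih.gov/BB/AA/TT/tt.pdf\n"
        "note:  This is an example of a small\n"
        "       descriptive record.\n"
    )
    for label, value in parse_erc(record):
        print(f"{label}: {value}")
```

Applied to the first example, the folded ``note'' value comes back as the single joined line shown above.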
The kernel element names mostly map one-to-one onto Dublin Core elements; however, ``who'' maps from the Creator, Contributor, and Publisher elements, in a manner consistent with the important Dublin Core™ minority view that would collapse them into one ``agent'' element. The general reason for using names different from those of Dublin Core™ was to reflect more stringent kernel value rules. Having said that, kernel metadata semantics still rely heavily on the wording of the NISO/ANSI Z39.85 Dublin Core™ metadata standard.
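A converter from simple Dublin Core to the four required kernel elements might look like the Python sketch below. The function name kernel_from_dc and its input shape (a dict from DC element name to a list of values) are hypothetical. The ``who'' fallback order (Creator, then Publisher, then Contributor) and the other mappings come from the glossary below. Two further choices are assumptions: multiple DC values are joined with the kernel `|' separator padded with spaces (assuming such whitespace is insignificant), and a required element with no DC source is filled with (:unav), the null term the glossary describes for intermediary systems that rely on metadata they received.

```python
from typing import Dict, List

KERNEL = ("who", "what", "when", "where")

# Kernel label -> DC element(s) it draws from, in order of preference.
# The "who" fallback order follows the glossary's DC mapping.
DC_SOURCES: Dict[str, List[str]] = {
    "who": ["Creator", "Publisher", "Contributor"],
    "what": ["Title"],
    "when": ["Date"],
    "where": ["Identifier"],
}


def kernel_from_dc(dc: Dict[str, List[str]]) -> Dict[str, str]:
    """Build the four required kernel elements from a simple DC record."""
    kernel: Dict[str, str] = {}
    for label in KERNEL:
        for source in DC_SOURCES[label]:
            values = dc.get(source)
            if values:
                # Several DC values become one '|'-separated kernel value.
                kernel[label] = " | ".join(values)
                break
        else:
            # No DC source found: use a null term (assumption: unav).
            kernel[label] = "(:unav)"
    return kernel


if __name__ == "__main__":
    dc = {
        "Creator": ["Lederberg, Joshua"],
        "Title": ["Studies of Human Families for Genetic Linkage"],
        "Date": ["1974"],
    }
    for label, value in kernel_from_dc(dc).items():
        print(f"{label}: {value}")
```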
Value Rules
An element value may contain multiple values, each separated from the next by a `|' (pipe) character. Each value may contain free text, but special assumptions apply to values that begin with any of the following conventions.
An initial [:] signals that the value is a 4-digit date (yyyy), an 8-digit date (yyyymmdd), a 14-digit time (yyyymmddHHMMSS), or a comma-separated list of dates and ranges. A range is a period of time specified by a start date, a hyphen, and an end date, where either date, but not both, may be missing. Optional whitespace may be inserted between any digits for readability.
An initial , (comma) signals a sort-friendly value, such as a person's name in the form Familyname, Givenname, or the words of a document title with the initial stopwords removed or rotated to the end of the value.
An initial (:value) signals a controlled vocabulary term value. This is especially important for the different flavors of ``missing'', when a value cannot otherwise be supplied for a required element. Everything after the (:value) is considered to be a free text equivalent.
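The three conventions can be told apart by looking only at the start of each value. This Python sketch, with hypothetical function names, first splits an element value on `|' and then reports which convention, if any, each value uses. It assumes that whitespace around the pipe is insignificant and that a controlled vocabulary term contains no whitespace or closing parenthesis; neither point is stated in this draft.

```python
import re
from typing import List, Tuple

# A leading controlled vocabulary term such as "(:unav)"; assumes the term
# itself contains no whitespace or ")".
TERM_RE = re.compile(r"\(:([^)\s]+)\)")


def split_values(element_value: str) -> List[str]:
    """Split an element value into its '|'-separated values."""
    return [v.strip() for v in element_value.split("|")]


def classify(value: str) -> Tuple[str, str, str]:
    """Return (kind, core, free_text) for a single value.

    kind is "date" for [:], "sort" for an initial comma, "term" for (:term),
    or "text" when no convention applies.
    """
    value = value.strip()
    if value.startswith("[:]"):
        return ("date", value[3:].strip(), "")
    if value.startswith(","):
        return ("sort", value[1:].strip(), "")
    match = TERM_RE.match(value)
    if match:
        # Everything after the term is its free-text equivalent.
        return ("term", match.group(1), value[match.end():].strip())
    return ("text", value, "")


if __name__ == "__main__":
    for value in split_values(", Lederberg, Joshua | (:unkn) anonymous"):
        print(classify(value))
    print(classify("[:] 1974"))
```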
None of these conventions appear in the first example ERC above, so no special assumptions apply to its values. A contrived example that exercises all three conventions is
erc:
who:   , Smith, Jill
what:  Cocktail Napkin Drawing #2
when:  [:] 1969 04 01 213000
where: (:unav) destroyed during spill of 1969 04 01 213500
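The date convention is the most involved of the three. The Python sketch below, with hypothetical names, normalizes a [:] value such as the ``when'' value in the contrived example into a list of (start, end) pairs: whitespace is removed, each date must have 4, 8, or 14 digits, a single date d becomes (d, d), and a missing range end becomes None. Returning pairs and checking only digit counts (not calendar validity) are design choices of this sketch, not requirements of the specification.

```python
from typing import List, Optional, Tuple

# (start, end); None marks an open end of a range.
DateRange = Tuple[Optional[str], Optional[str]]


def _normalize_date(text: str) -> str:
    """Drop readability whitespace and require 4, 8, or 14 ASCII digits."""
    digits = "".join(text.split())
    if not (digits.isascii() and digits.isdigit() and len(digits) in (4, 8, 14)):
        raise ValueError(f"not a 4-, 8-, or 14-digit date: {text!r}")
    return digits


def parse_date_value(value: str) -> List[DateRange]:
    """Parse a value that begins with [:] into a list of (start, end) pairs.

    A single date d is returned as (d, d); a range with a missing start or
    end has None in that position.
    """
    if not value.startswith("[:]"):
        raise ValueError("date values must begin with [:]")
    result: List[DateRange] = []
    for item in value[3:].split(","):
        start, hyphen, end = item.partition("-")
        if not hyphen:
            date = _normalize_date(item)
            result.append((date, date))
            continue
        start, end = start.strip(), end.strip()
        if not start and not end:
            raise ValueError("a range needs a start date, an end date, or both")
        result.append((
            _normalize_date(start) if start else None,
            _normalize_date(end) if end else None,
        ))
    return result


if __name__ == "__main__":
    print(parse_date_value("[:] 1969 04 01 213000"))
    print(parse_date_value("[:] 1969-1971, 20040801-"))
```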
The four main kernel element labels are special in that they are required and can be re-used with different meanings in different contexts. The primary context is the story of an expression of an object, such as the publication of a written work. This matches a typical bibliographic citation.
who   (erc) a responsible person or party
what  (erc) a name or other human-oriented identifier
when  (erc) a date important in the object's lifecycle
where (erc) a location or system-oriented identifier
Another context is the story of an object's content. This is the ``erc-about'' context.
who   (erc-about) a person or party figuring in the information content
what  (erc-about) a subject or topic figuring in the information content
when  (erc-about) a time period covered by the information content
where (erc-about) a location or region covered by the information content
Another context is the story of the origin of the metadata record itself. This is the ``erc-from'' context.
who   (erc-from) a person or party responsible for the record
what  (erc-from) a short form of the identifier for the record
when  (erc-from) the last modification date of the record
where (erc-from) a location of the fullest form of the record
Another context is the story of a support commitment made to an object. This is the ``erc-support'' context.
who   (erc-support) a person or party responsible for the object
what  (erc-support) the short form of the commitment made to the object
when  (erc-support) the last modification date of the commitment
where (erc-support) a location of the fullest form of the commitment
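The contexts above amount to a lookup table from (context, label) to meaning. This draft does not say how a record marks a change of context, so the Python sketch below assumes, by analogy with the bare erc: label that opens each example, that a bare erc-about:, erc-from:, or erc-support: label starts a new context that lasts until the next one. The function annotate and its input of (label, value) pairs are hypothetical.

```python
from typing import Dict, List, Tuple

# Meaning of each kernel label in each context, copied from the lists above.
MEANINGS: Dict[str, Dict[str, str]] = {
    "erc": {
        "who": "a responsible person or party",
        "what": "a name or other human-oriented identifier",
        "when": "a date important in the object's lifecycle",
        "where": "a location or system-oriented identifier",
    },
    "erc-about": {
        "who": "a person or party figuring in the information content",
        "what": "a subject or topic figuring in the information content",
        "when": "a time period covered by the information content",
        "where": "a location or region covered by the information content",
    },
    "erc-from": {
        "who": "a person or party responsible for the record",
        "what": "a short form of the identifier for the record",
        "when": "the last modification date of the record",
        "where": "a location of the fullest form of the record",
    },
    "erc-support": {
        "who": "a person or party responsible for the object",
        "what": "the short form of the commitment made to the object",
        "when": "the last modification date of the commitment",
        "where": "a location of the fullest form of the commitment",
    },
}


def annotate(elements: List[Tuple[str, str]]) -> List[Tuple[str, str, str, str]]:
    """Return (context, label, value, meaning) for each non-context element.

    Assumption: a label naming a context (e.g. "erc-about") switches the
    context for the elements that follow it; the default is "erc".
    """
    context = "erc"
    rows: List[Tuple[str, str, str, str]] = []
    for label, value in elements:
        if label in MEANINGS:
            context = label  # start of a new context segment
            continue
        meaning = MEANINGS[context].get(label, "(not a kernel element)")
        rows.append((context, label, value, meaning))
    return rows


if __name__ == "__main__":
    record = [
        ("erc", ""),
        ("who", "Lederberg, Joshua"),
        ("erc-about", ""),
        ("what", "genetic linkage"),
    ]
    for row in annotate(record):
        print(row)
```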
Kernel Glossary of Elements and Values
in
- (t11) A structured element that references a serial publication by name, volume, issue, date, and issue URL in which the described object appears. DC Mapping: Relation
how
- (h5, erc) An account of the content of the resource. Examples of ``how'' include, but are not limited to, an abstract, table of contents, free-text account of the content, or reference to a graphical representation of content. DC Mapping: Description
format
- (h22, erc) The physical or digital manifestation of the resource. Typically, ``format'' will include the media type or dimensions of the resource as opposed to its nature or genre. The ``format'' element may be used to identify the software, hardware, or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary, such as the list of Internet Media Types (MIME) defining computer media formats. DC Mapping: Format
note
- (t11) A free text note about the record. DC Mapping: none
what
- (h2, erc) A human-oriented name given to the resource, or what this expression of the resource was called. Typically, ``what'' will be a name by which the resource is formally known. Compared to the ``where'' element, which is also a kind of name, the ``what'' element is suitable for human consumption. DC Mapping: Title
- (h12, erc-about) A subject or topic figuring in the information content.
when
- (h3, erc) A date of an important event in the lifecycle of the resource, often when it was expressed. Typically, ``when'' will be associated with the creation or availability of the resource. DC Mapping: Date
- (h13, erc-about) A time period covered by the information content.
where
- (h4, erc) An access-oriented name given to the resource, or where this resource was expressed. Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Formal identification systems include but are not limited to URL, ARK, DOI, and ISBN. DC Mapping: Identifier
- (h14, erc-about) A location or region covered by the information content.
who
- (h1, erc) An entity responsible for creating or making available the content of the resource, in other words, who expressed the resource. Examples of ``who'' include a person, an organization, or a service. DC Mapping: Creator, or Publisher if there is no Creator, or Contributor if there is no Publisher.
- (h11, erc-about) A person or party figuring in the information content.
:unkn
- A null element term explaining that the value is unknown. Compared to :unav, this explanation carries a high degree of authority regarding the object described. Anonymous authorship is an example.
:unav
- A null element term explaining that the value is unavailable indefinitely. Compared to :unkn, this explanation is intended for intermediary systems that know less about the object described and have to rely on the best metadata received.
:unac
- A null element term explaining that the value is temporarily inaccessible. This might be due, for example, to a system outage.
:unap
- A null element term explaining that the value is not applicable or makes no sense.
:unas
- A null element term explaining that a value was never assigned. An untitled painting is an example.
:none
- A null element term explaining that the element never had a value and never will.
:null
- A null element term explaining that the value is explicitly empty.
:unal
- A null element term explaining that the value is unallowed or suppressed intentionally.
:tba
- A null element term explaining that the value is to be assigned or announced later.
:etal
- A null element term explaining that the value is a stand-in for other values too numerous to list (et alia).
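Because the four kernel elements are required, a record either supplies a value for each or says why it cannot, using one of the null terms above written as (:term). The Python sketch below checks that; the names check_kernel, NULL_TERMS, and REQUIRED are hypothetical, and the short glosses paraphrase the glossary. It deliberately flags any term not in this glossary, which may be stricter than necessary, since the value rules do not limit controlled vocabulary terms to null terms.

```python
import re
from typing import Dict, List

# Null element terms from the glossary, with short paraphrased glosses.
NULL_TERMS: Dict[str, str] = {
    "unkn": "unknown",
    "unav": "unavailable indefinitely",
    "unac": "temporarily inaccessible",
    "unap": "not applicable",
    "unas": "never assigned",
    "none": "never had a value and never will",
    "null": "explicitly empty",
    "unal": "unallowed or suppressed",
    "tba": "to be assigned or announced later",
    "etal": "too numerous to list",
}

REQUIRED = ("who", "what", "when", "where")
# A leading term such as "(:unav)"; assumes terms contain no whitespace.
TERM_RE = re.compile(r"\(:([^)\s]+)\)")


def check_kernel(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means all four required
    elements are present and any leading term is a known null term."""
    problems: List[str] = []
    for label in REQUIRED:
        value = record.get(label, "").strip()
        if not value:
            problems.append(f"{label}: missing; supply a value or a null term")
            continue
        match = TERM_RE.match(value)
        if match and match.group(1) not in NULL_TERMS:
            problems.append(f"{label}: unrecognized term (:{match.group(1)})")
    return problems


if __name__ == "__main__":
    record = {
        "who": ", Smith, Jill",
        "what": "Cocktail Napkin Drawing #2",
        "when": "[:] 1969 04 01 213000",
        "where": "(:unav) destroyed during spill of 1969 04 01 213500",
    }
    print(check_kernel(record) or "all four kernel elements present")
```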