Format Element Working Draft
Format Position Paper (Revised)
The Format (DC.Format) of a resource refers to the medium, data format, and materials of the instantiation of that resource, including its size if required. This is important in resource discovery as it allows a user to discriminate on the basis of the software, hardware, and other infrastructure that might be needed to display, operate or otherwise use the resource. For the sake of interoperability, the value of the data format or media should be selected from an enumerated list, as discussed below. Values for DC.Format selected from these lists will be used without annotation in simple Dublin Core™.
For electronic resources, it is recommended that a value of DC.Format be selected from the list of Internet Media Types (MIME values) whenever possible. The reference list is maintained at http://www.isi.edu/in-notes/iana/assignments/media-types/media-types. Note that this list is a registry and there is a well-defined procedure for adding new types if appropriate.
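As an illustration of the recommendation above, the following minimal sketch checks whether a candidate DC.Format value has the general type/subtype shape of an Internet Media Type. The function name `looks_like_media_type` and the set `TOP_LEVEL_TYPES` are assumptions made for this example; the set is an illustrative subset, and the registry itself remains the authoritative list. A syntactic check of this kind does not establish that a subtype is actually registered.

```python
import re

# Illustrative subset of top-level media types (an assumption for this sketch,
# not a copy of the registry).
TOP_LEVEL_TYPES = {
    "application", "audio", "image", "message",
    "model", "multipart", "text", "video",
}

# Name syntax modelled on the restricted characters allowed in media type names.
_TOKEN = r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*"
_MEDIA_TYPE = re.compile(rf"^({_TOKEN})/({_TOKEN})$")


def looks_like_media_type(value: str) -> bool:
    """Return True if value has the form type/subtype with a known top-level type."""
    match = _MEDIA_TYPE.match(value.strip())
    return bool(match) and match.group(1).lower() in TOP_LEVEL_TYPES


if __name__ == "__main__":
    for candidate in ("image/g3fax", "text/html", "image", "paper/a4"):
        print(candidate, looks_like_media_type(candidate))
```

Values that pass this check would still need to be compared against the registry before being relied upon for interoperability.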
The size of a digital resource will normally be given in bytes, but additional measurements may be appropriate for particular resources, for example, duration (in seconds) for a sound, linear or areal dimension (in pixels) for an image, word-count for a text.
For other resources, it is recommended that a value of DC.Format be selected from lists of physical media types, such as provided in the Art and Architecture Thesaurus (AAT), maintained by the Getty Information Institute (in particular, see the Materials facet).
There are many ways of measuring size for physical objects, including linear dimensions, area, volume and mass. The user should be careful to specify these to allow the best performance in resource discovery.
The IMT scheme recommended for the element DC.Format uses some terms that also occur in the list of values for DC.Type (eg image and text). However, there is an important distinction between DC.Type, which defines the genre of the item and relates primarily to the meaning or content of the resource, and DC.Format, which indicates details of the particular instantiation, including medium and size. This difference is exemplified by text resources stored in image formats; thus, there is no inconsistency in a resource having DC.Type=text and DC.Format=image/g3fax.
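The distinction can be made concrete with a small sketch. The dictionary layout and the helper `top_level_type` are assumptions made for illustration; only the two element values come from the text above.

```python
# A faxed letter: its content is text, its instantiation is a fax image.
record = {
    "DC.Type": "text",             # genre: what the resource means
    "DC.Format": "image/g3fax",    # instantiation: how it is stored
}


def top_level_type(media_type: str) -> str:
    """Return the part of a media type before the slash."""
    return media_type.split("/", 1)[0]


if __name__ == "__main__":
    # The two elements answer different questions, so differing terms are expected.
    print("genre:", record["DC.Type"])
    print("storage class:", top_level_type(record["DC.Format"]))
```

A consumer that treated the top-level media type as the genre would misclassify this record as an image, which is why the two elements are kept separate.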
Multiple values of DC elements can be managed in several ways. One method is to use a list as the value of an element. Another method is to repeat the element, with a single value in each occurrence. Multiple values of DC.Format will often occur in the case of media plus size, and for multiple measures of size (eg duration plus bytes for audio, mass plus linear dimensions for physical objects, linear plus pixels plus bytes for images). For these cases there would appear to be no ambiguity using either method. However, there may also be resources for which it is necessary to have multiple media types, for example compound resources, and in these cases it must be indicated which size information is associated with which media type. Some coding methods (eg HTML) cannot group elements, so the repeated element method would result in ambiguity. It may be preferable to use a list of terms in a single value as a grouping mechanism, in order to minimise ambiguity.
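The two methods can be compared with a short sketch for a compound resource. The component values, the function names, and the "; " separator are all assumptions chosen for illustration; the draft does not prescribe a list syntax.

```python
# A hypothetical compound resource: a web page with an embedded sound clip.
components = [
    ("text/html", "2048 bytes"),
    ("audio/basic", "90 seconds"),
]


def repeated_elements(components):
    """One DC.Format element per term; nothing links a size to its media type."""
    return [("DC.Format", term) for pair in components for term in pair]


def grouped_elements(components, separator="; "):
    """One DC.Format element per component; the list in the value groups the terms."""
    return [("DC.Format", separator.join(pair)) for pair in components]


if __name__ == "__main__":
    for name, value in repeated_elements(components):
        print(f"{name} = {value}")
    print()
    for name, value in grouped_elements(components):
        print(f"{name} = {value}")
```

In the repeated form, a consumer that does not rely on element order cannot tell whether "90 seconds" belongs to the page or to the sound clip. In the grouped form, each value carries its own size.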
It will clearly be desirable to add structure to the specification of Format in order to allow automated processing of the values. This will require that the property Format be qualified to indicate which type or aspect of the format (eg media vs. size) is indicated in a value, and which vocabulary or encoding scheme is in use (for example, values chosen from the list of Internet Media Types will be qualified with scheme=IMT or something similar). A more structured form of DC.Format will be particularly important to accommodate the various aspects of size, including multiple measurements of size. Recommended practice may also be considered for combinations of size information for a single resource (eg mass, volume, length-breadth-depth), and for the method used to indicate the units of a size.
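One possible shape for such a qualified value is sketched below. Because the syntax of Qualified DC is unresolved, everything here is hypothetical: the class `FormatValue`, the `aspect` field, the dotted element name, and the rendering as an HTML meta element are assumptions for illustration, not a recommended encoding.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass(frozen=True)
class FormatValue:
    value: str
    aspect: Optional[str] = None   # hypothetical qualifier, eg "medium" or "extent"
    scheme: Optional[str] = None   # vocabulary or encoding scheme, eg "IMT"


def to_meta(fv: FormatValue) -> str:
    """Render a value as an HTML meta element (one hypothetical encoding)."""
    name = "DC.Format" if fv.aspect is None else f"DC.Format.{fv.aspect}"
    scheme = "" if fv.scheme is None else f' scheme="{escape(fv.scheme)}"'
    return f'<meta name="{escape(name)}"{scheme} content="{escape(fv.value)}">'


if __name__ == "__main__":
    print(to_meta(FormatValue("text/html", scheme="IMT")))
    print(to_meta(FormatValue("2048 bytes", aspect="extent")))
```

Keeping the aspect and the scheme as separate fields lets a processor decide how to interpret a value without parsing the value itself.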
There is a need, in the case of resources with multiple values for DC.Type, to associate each DC.Format element with the appropriate DC.Type element. The question of methods for grouping DC elements has not been resolved for some coding methods at this time.
Structured DC.Format will also be used to indicate formats contained in nested-format files (for example, a zip-compressed file will have an outermost format "application/zip" but might contain LaTeX source of format "text/vnd.latex-z").
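A nested description of this kind might be derived as in the following sketch, which uses the Python standard library to inspect a zip archive. The function names, the dictionary layout of the result, and the extension-to-format table are assumptions for illustration; the table follows the draft's own example for LaTeX source and is not a general mapping.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical lookup table for this sketch only.
EXTENSION_TO_FORMAT = {
    ".tex": "text/vnd.latex-z",
    ".gif": "image/gif",
}


def nested_formats(zip_bytes: bytes) -> dict:
    """Describe a zip archive as an outer format plus the formats it contains."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        inner = sorted({
            EXTENSION_TO_FORMAT.get(
                PurePosixPath(name).suffix.lower(), "application/octet-stream"
            )
            for name in archive.namelist()
        })
    return {"format": "application/zip", "contains": inner}


def build_sample() -> bytes:
    """Create a small in-memory archive to exercise nested_formats."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("paper.tex", "\\documentclass{article}")
        archive.writestr("figure.gif", b"GIF89a")
    return buffer.getvalue()


if __name__ == "__main__":
    print(nested_formats(build_sample()))
```

The result records only one level of nesting; an archive inside an archive would need the same function applied recursively.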
The structure and syntax of Qualified DC have not been resolved at this time. A refined structure for Format will be implemented according to the general recommendations for Qualified DC.