Dublin Core™ and Z39.50
This document is substantially changed from the previous draft. It incorporates changes reflecting the results of the January meeting of the Z39.50 Implementors Group (ZIG).
This paper proposes a mechanism for using the Dublin Core™ for search and retrieval in Z39.50. Agreement has been reached on how to do this for the 15 basic elements of the Dublin Core™ within the constraints of Version 2. Consensus on how to use Dublin Core™ qualifiers and schemes with the new attribute architecture for Z39.50 awaits completion of development in both communities. This paper is not a tutorial on Z39.50, but some understanding of Z39.50 is needed to follow the details of this discussion.
There are three broad philosophies for searching in Z39.50: Minimalism, Maximalism and Structuralism.
Minimalism wants a small set of semantically fuzzy access points that support broad interoperability among a diverse set of implementors. Minimalists assume that it is easy for a database provider to identify an access point as corresponding to "name" (a semantically fuzzy concept). It is not always easy for a database provider to distinguish between a "personal name" and a "corporate name" (more semantically rigorous concepts). In addition, client implementors and their users are often unaware of or uninterested in the more specific access points. Minimalists are willing to give up some accuracy in their searches in exchange for improved chances of finding records in diverse databases. They are willing to trade precision for recall.
Maximalism wants a large set of semantically unambiguous access points. For Maximalists, the specificity of the search is of primary importance. Maximalists are willing to give up interoperability with diverse databases that do not support the very specific access points that they require in exchange for more accuracy in the records retrieved. They are willing to trade recall for precision.
Both Minimalists and Maximalists are trying for semantic interoperability. They make no assertions about the structure of the records.
Structuralism wants to take advantage of the deep understanding Structuralists have of the records being searched. Because relatively few databases have identically structured records, Structuralists give up even more interoperability than the Maximalists in exchange for even greater search precision.
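The trade-off between recall and precision described above can be made concrete with a small worked example. The sketch below is illustrative only: the record identifiers and result sets are invented, and `precision_recall` is a hypothetical helper, not part of Z39.50.

```python
from __future__ import annotations


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a single search result."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented data: the user wants works by a person named Ford.
relevant = {"r1", "r2", "r3", "r4"}

# A fuzzy "name" search finds all four records across every database,
# plus two records for a company that is also named Ford.
minimalist_result = {"r1", "r2", "r3", "r4", "c1", "c2"}

# A specific "personal name" search returns no false hits, but misses
# the records held by databases that only offer a generic name index.
maximalist_result = {"r1", "r2"}

min_precision, min_recall = precision_recall(minimalist_result, relevant)
max_precision, max_recall = precision_recall(maximalist_result, relevant)
```

In this invented case the Minimalist search has full recall with lower precision, and the Maximalist search has full precision with lower recall.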
This paper will propose mechanisms that will allow all three groups to search the way they want. In addition, this paper will make recommendations for database providers and client developers that will maximize interoperability while still providing the specificity that the Maximalists and Structuralists require.
Dublin Core™ and Searching
Making the Dublin Core™ usable for searching in Z39.50 appears initially to be trivial. All that is needed is to add the Dublin Core™ elements as Use attributes in a Z39.50 attribute set. The issues surrounding that decision turn out to be complex. Which attribute set should be used? Should a new attribute set be created or should an existing attribute set be extended? Which implementor community is the solution focused on: Version 2 or Version 3?
The brief answer is that the 15 elements of the Dublin Core™ will be added as Use attributes in the Bib-1 attribute set for Z39.50 Version 2 clients and servers, and that a new attribute set will be created for use by Version 3 clients and servers. The new attribute set will depend on the capabilities described in the Z39.50 Attribute Architecture and will provide access to the schemes and qualifiers being actively developed in the Dublin Core™ community.
Searching in Version 2
The fifteen Dublin Core™ elements have been added as Use attributes to the Bib-1 attribute set.
|Dublin Core™ Element|Z39.50 Use Attribute|
The semantics for these new Use attributes are taken from Dublin Core™ Metadata Element Set: Reference Description.
Searching in Version 3
The proposed extensions to the Bib-1 Use attributes provide minimal access to the Dublin Core™ elements for Version 2 clients. But, to gain access to the full complexity of the Dublin Core™ (i.e., qualifiers and schemes), a new attribute set will be proposed. This attribute set will consist of Use, Fieldname, ContentAuthority and Structure attributes. The Use attributes will be an enumerated list consisting initially of the 15 Dublin Core™ elements. This will support Minimalist searching. The Fieldname attributes will be a list of element names and element qualifiers drawn from Dublin Core™ Qualifiers/Substructure. This will support Structuralist searching. Reasonable permutations of Fieldname combinations will be added to the enumerated list of Use attributes to support Maximalist searching. Examples of such permutations are Creator-PersonalName, Title-Alternative, and Date-Creation. New Structure attributes are proposed. They can be added to either the Bib-1 Structure attributes or specified as new Structure attributes in the Dublin Core™ attribute set. Other attribute types (e.g. Relation) can be used from the Bib-1 attribute set or any of its derivatives.
The primary list of Use attributes is: Title, Author, Subject, Description, Publisher, OtherContributor, Date, ResourceType, Format, ResourceIdentifier, SourceIdentifier, Language, Relation, Coverage and RightsManagement. The semantics for these attributes are specified in Dublin Core™ Metadata Element Set: Reference Description. The Dublin Core™ Use attributes are an enumerated list and will be assigned consecutive numbers starting at 1.
The secondary list of Use attributes is only tentative and needs to be further developed by the Dublin Core™ community. They are:
Creator-PersonalName, Creator-CorporateName, Creator-Address, Creator-PersonalName-Address, Creator-CorporateName-Address
Publisher-PersonalName, Publisher-CorporateName, Publisher-Address, Publisher-PersonalName-Address, Publisher-CorporateName-Address
Contributor-PersonalName, Contributor-CorporateName, Contributor-Address, Contributor-PersonalName-Address, Contributor-CorporateName-Address
Date-Creation, Date-Modification, Date-Publication, Date-Available, Date-Valid, Date-Acquisition, Date-Accepted, Date-DataGathering
Relation-Creative, Relation-Mechanical, Relation-Version, Relation-Inclusion, Relation-Reference
Coverage-PeriodName, Coverage-PlaceName, Coverage-t, Coverage-x, Coverage-y, Coverage-z, Coverage-Polygon, Coverage-Line and Coverage-3d
The semantics for these attributes are specified in Dublin Core™ Qualifiers/Substructure. Numbering for the Title permutations will start at 100, Creator at 200, Publisher at 500, Contributor at 600, Date at 700, Relation at 1300 and Coverage at 1400.
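The numbering scheme can be sketched as follows. The text states that the primary Use attributes are numbered consecutively from 1 and gives the starting number for each group of permutations; it does not state the order of assignment or any rule behind the starting numbers. The sketch assumes the primary attributes are numbered in the order listed, and then checks an observation: each stated starting number equals the parent element's primary number multiplied by 100. `DublinCoreUse`, `ALIASES` and `derived_base` are names invented for this illustration.

```python
from enum import IntEnum

# Primary Use attributes in the order the text lists them.
# Assumption: numbers are assigned in this listed order, starting at 1.
PRIMARY_NAMES = [
    "Title", "Author", "Subject", "Description", "Publisher",
    "OtherContributor", "Date", "ResourceType", "Format",
    "ResourceIdentifier", "SourceIdentifier", "Language",
    "Relation", "Coverage", "RightsManagement",
]
DublinCoreUse = IntEnum("DublinCoreUse", PRIMARY_NAMES, start=1)

# The secondary list says Creator and Contributor where the primary
# list says Author and OtherContributor.
ALIASES = {"Creator": "Author", "Contributor": "OtherContributor"}

# Starting numbers for the permutations, exactly as stated in the text.
STATED_BASES = {
    "Title": 100, "Creator": 200, "Publisher": 500, "Contributor": 600,
    "Date": 700, "Relation": 1300, "Coverage": 1400,
}


def derived_base(element: str) -> int:
    """Starting number implied by the observed 'primary number x 100' pattern."""
    primary = DublinCoreUse[ALIASES.get(element, element)]
    return int(primary) * 100


# Empty if every stated starting number matches the observed pattern.
mismatches = {e: b for e, b in STATED_BASES.items() if derived_base(e) != b}
```

If the pattern holds, the group that a secondary Use attribute belongs to can be recovered from its number by integer division by 100.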
The list of Fieldnames is only tentative and needs to be further developed by the Dublin Core™ community. They are: Title, Alternative, Creator, PersonalName, CorporateName, Address, Subject, Description, Publisher, Contributor, Date, Creation, Modification, Publication, Available, Valid, Acquisition, Accepted, DataGathering, ResourceType, Format, ResourceIdentifier, Source, Language, Relation, Coverage, PeriodName, PlaceName, T, X, Y, Z, Polygon, Line, ThreeD and RightsManagement. The semantics for these attributes are specified in Dublin Core™ Qualifiers/Substructure. These attributes will have string values.
ContentAuthority and Structure Attributes
The concepts of ContentAuthority and Structure are unfortunately combined in the current Dublin Core™ documentation. This is because they only have one HTML attribute available to them for tagging this data: Scheme. The attributes listed below as ContentAuthority and Structure are all listed as Schemes in the Dublin Core™ documentation. The Dublin Core™ community recognizes the syntactic ambiguity and expects to be able to clarify the situation.
The list of ContentAuthorities is only tentative and needs to be further developed by the Dublin Core™ community. They are: LCNAF, LCSH, MeSH, AAT, DDC, LCC, NLM, UDC, MIME, DCPMT, IETF-RFC-1766, Z39.53, ISO-639-1 and ISO-639-2B. The semantics for these attributes are specified in Dublin Core™ Qualifiers/Substructure. These attributes will have string values.
The list of Structure attributes is: ISO-8601, ANSI-X3.30, IETF-RFC-822, URL, URN, ISBN, ISSN, SICI, FPI. The first three structure attributes are date formats. The others are standard identifier formats. FPI is a Formal Public Identifier. These attributes will have enumerated values. The actual values will depend on whether they are added to the Bib-1 or Dublin Core™ attribute sets.
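To show how the four attribute types might combine on a single search term, here is a minimal sketch. It is not a Z39.50 encoding: `SearchTerm` and its field names are invented for this illustration, and the two validation sets simply copy the tentative lists given above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Tentative lists copied from the text above.
CONTENT_AUTHORITIES = {
    "LCNAF", "LCSH", "MeSH", "AAT", "DDC", "LCC", "NLM", "UDC", "MIME",
    "DCPMT", "IETF-RFC-1766", "Z39.53", "ISO-639-1", "ISO-639-2B",
}
STRUCTURES = {
    "ISO-8601", "ANSI-X3.30", "IETF-RFC-822",  # date formats
    "URL", "URN", "ISBN", "ISSN", "SICI", "FPI",  # identifier formats
}


@dataclass(frozen=True)
class SearchTerm:
    """One search term with its (hypothetical) attribute combination."""
    term: str
    use: str | None = None               # Minimalist or Maximalist access point
    fieldnames: tuple[str, ...] = ()     # Structuralist path of element names
    content_authority: str | None = None
    structure: str | None = None

    def __post_init__(self) -> None:
        if self.content_authority is not None and self.content_authority not in CONTENT_AUTHORITIES:
            raise ValueError(f"unknown ContentAuthority: {self.content_authority}")
        if self.structure is not None and self.structure not in STRUCTURES:
            raise ValueError(f"unknown Structure: {self.structure}")


# The same information need expressed in each of the three styles.
minimalist = SearchTerm("Weibel", use="Author")
maximalist = SearchTerm("Weibel", use="Creator-PersonalName")
structuralist = SearchTerm(
    "Weibel", fieldnames=("Creator", "PersonalName"), content_authority="LCNAF"
)

# A date search that also declares the format of the term.
dated = SearchTerm("1998-03-01", use="Date", structure="ISO-8601")
```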
Guidance to Client Developers
Developers of Maximalist clients should be aware that the semantic precision they require in their searches reduces the number of databases they can search. They should be prepared to use less semantically precise Use attributes if they receive an error code of 114, which means that the database does not support a particular Use attribute. The code will be accompanied by the value of the unsupported Use attribute. For example, a search using Author-PersonalName should be changed to simply Author. Client developers should also be aware that some databases will automatically convert overly precise Use attributes to less precise attributes. If this is not desired, Version 3 clients can specify a SemanticAction of 1 along with the Use attribute, which means that no substitution should be performed.
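The fallback described above might look like the following sketch. Only the diagnostic code 114 comes from the text; `Z3950Diagnostic`, `search_with_fallback` and `fake_server_search` are hypothetical names, and the sketch assumes that a less precise Use attribute can be obtained by dropping the last hyphen-separated qualifier.

```python
from __future__ import annotations

from typing import Callable

UNSUPPORTED_USE_ATTRIBUTE = 114  # diagnostic code named in the text


class Z3950Diagnostic(Exception):
    """Hypothetical exception carrying a diagnostic code and its addinfo."""

    def __init__(self, code: int, addinfo: str) -> None:
        super().__init__(f"diagnostic {code}: {addinfo}")
        self.code = code
        self.addinfo = addinfo


def search_with_fallback(
    search: Callable[[str, str], list[str]], use: str, term: str
) -> tuple[str, list[str]]:
    """Retry with progressively less precise Use attributes on diagnostic 114.

    Returns the Use attribute that finally worked together with the results.
    """
    while True:
        try:
            return use, search(use, term)
        except Z3950Diagnostic as diag:
            # Give up on any other diagnostic, or when nothing is left to drop.
            if diag.code != UNSUPPORTED_USE_ATTRIBUTE or "-" not in use:
                raise
            use = use.rsplit("-", 1)[0]  # Author-PersonalName -> Author


def fake_server_search(use: str, term: str) -> list[str]:
    """Stand-in for a server that only supports the fuzzy Author access point."""
    if use != "Author":
        raise Z3950Diagnostic(UNSUPPORTED_USE_ATTRIBUTE, use)
    return ["record-1", "record-2"]


used, records = search_with_fallback(fake_server_search, "Author-PersonalName", "Weibel")
```

Returning the Use attribute that was actually used lets the client tell the user that the results may be less precise than requested.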
Similar advice applies to Structuralist client developers, except that the code for specifying an unsupported Fieldname attribute has not yet been specified.
Minimalist client developers do not have a similar fall-back option. If they send in a search for Names and the database does not support such an access point, they might be able to change their fuzzy search into an OR'ing of more specific access points, such as PersonalName and CorporateName, but this is not always straightforward. The next section describes how database providers can make this problem easier.
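A minimal sketch of the OR'ing approach follows. The query is modelled as nested tuples rather than any real Z39.50 query encoding; `EXPANSIONS` and `expand_fuzzy` are hypothetical, and the sketch assumes the client already knows which specific access points make up each fuzzy one.

```python
from functools import reduce

# Hypothetical client-side table: fuzzy access point -> specific access points.
EXPANSIONS = {"Name": ["PersonalName", "CorporateName"]}


def expand_fuzzy(use, term, expansions=EXPANSIONS):
    """Rewrite a fuzzy search as an OR of more specific access points.

    Returns a nested tuple: ("term", use, term) for a leaf, or
    ("or", left, right) for a Boolean OR of two sub-queries.
    """
    specific = expansions.get(use)
    if not specific:
        # No known expansion: leave the search unchanged.
        return ("term", use, term)
    leaves = [("term", access_point, term) for access_point in specific]
    # Fold the leaves into a left-nested chain of binary ORs.
    return reduce(lambda left, right: ("or", left, right), leaves)


query = expand_fuzzy("Name", "Ford")
```

The difficulty mentioned above shows up in the table itself: the client must know, for each database, which specific access points exist and which of them together cover the fuzzy concept.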
Guidance to Database Providers
Providers of Maximalist and Structuralist databases should be aware that maximum interoperability is available with Minimalist searching. It is strongly recommended that additional access points be made available by grouping semantically precise access points together as semantically fuzzy access points. For example, Publisher-PersonalName and Publisher-CorporateName should also be available as Publisher.
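On the database side, the recommended grouping could be sketched as below. The index is modelled as a plain mapping from access point name to a set of record identifiers, which is an assumption made for illustration; `with_fuzzy_access_points` is a hypothetical helper, and it assumes that the fuzzy access point is the part of the name before the first hyphen.

```python
from collections import defaultdict


def with_fuzzy_access_points(precise):
    """Add a fuzzy access point for every group of precise access points.

    `precise` maps names such as "Publisher-PersonalName" to sets of
    record ids. The result keeps those entries and adds "Publisher" as
    the union of all "Publisher-*" entries.
    """
    combined = defaultdict(set)
    for name, record_ids in precise.items():
        combined[name] |= record_ids
        parent = name.split("-", 1)[0]
        # For an unqualified name the parent is the name itself,
        # so this union simply changes nothing.
        combined[parent] |= record_ids
    return dict(combined)


# Invented index contents.
index = with_fuzzy_access_points({
    "Publisher-PersonalName": {"r1", "r2"},
    "Publisher-CorporateName": {"r2", "r3"},
    "Subject": {"r4"},
})
```

With this grouping in place, a Minimalist client searching on Publisher reaches every record that a Maximalist client could reach through either precise access point.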
Dublin Core™ and Record Retrieval
Solutions have been developed for retrieving Dublin Core™ elements in three Z39.50 record syntaxes: USMARC, HTML and GRS-1.
A description for encoding of Dublin Core™ elements in USMARC records was developed at the Library of Congress and is described in Dublin Core/MARC/GILS Crosswalk. These records can be retrieved using the USMARC record syntax (Object Identifier: 1.2.840.10003.5.10).
HTML encoding of Dublin Core™ elements is described in A Proposed Convention for Embedding Metadata in HTML, which describes how to encode Dublin Core™ elements in HTML 2.0. (Rules for encoding Dublin Core™ elements in HTML 4.0 are under active development but are not yet complete.) Z39.50 supports an HTML record syntax (Object Identifier: 1.2.840.10003.5.109.3). HTML records can also be sent encapsulated in a GRS-1 record. When this is done, the HTML is contained in a single field in the record and the field is tagged with a variant type of HTML (variant class=2 (BodyPartType), type=1 (IANA), value=text/html).
Dublin Core™ elements can also be retrieved directly in GRS-1 records. In the GRS-1 record syntax, elements have numeric tags consisting of two components: a tag type and value. There are two universally recognized tag types: M and G. The Dublin Core™ elements have been added to tagset-G. To embed a Dublin Core™ element in a GRS-1 record, the element is put into a field in the GRS-1 record and the field is given a tagType of 2 (tagset-G) and a tagValue from the list described in TagSet -G and -M Elements.
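The two GRS-1 conventions above (a Dublin Core™ element tagged in tagset-G, and an HTML record carried in a single field with an HTML variant) can be modelled with a small data structure. This is a sketch of the shape of the data, not the ASN.1 definition of GRS-1: the class and function names are invented, and `PLACEHOLDER_TAG` is a made-up number standing in for values that must be taken from TagSet -G and -M Elements.

```python
from __future__ import annotations

from dataclasses import dataclass

TAGSET_G = 2  # tagType for tagset-G, as stated in the text


@dataclass(frozen=True)
class Variant:
    variant_class: int
    variant_type: int
    value: str


@dataclass(frozen=True)
class TaggedElement:
    tag_type: int
    tag_value: int
    content: str
    variant: Variant | None = None


# Variant described in the text: class=2 (BodyPartType), type=1 (IANA).
HTML_VARIANT = Variant(variant_class=2, variant_type=1, value="text/html")


def dublin_core_element(tag_value: int, content: str) -> TaggedElement:
    """A Dublin Core element carried directly in a GRS-1 field."""
    return TaggedElement(TAGSET_G, tag_value, content)


def encapsulated_html(tag_type: int, tag_value: int, html: str) -> TaggedElement:
    """A whole HTML record carried in one GRS-1 field."""
    return TaggedElement(tag_type, tag_value, html, HTML_VARIANT)


# Placeholder only: real tag values come from the tagset definition.
PLACEHOLDER_TAG = 9999

title = dublin_core_element(PLACEHOLDER_TAG, "An Example Title")
page = encapsulated_html(TAGSET_G, PLACEHOLDER_TAG, "<html>...</html>")
```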
Z39.50 Draft Attribute Architecture: http://lcweb.loc.gov/z3950/agency/attrarch/attrarch.html
Dublin Core™ Metadata Element Set: Reference Description: http://dublincore.org/specifications/dublin-core/dces/
Dublin Core™ Qualifiers/Substructure: http://www.loc.gov/marc/dcqualif.html
Formal Public Identifier: http://www.oasis-open.org/cover/tauber-fpi.html
Dublin Core/MARC/GILS Crosswalk: http://www.loc.gov/marc/dccross.html
A Proposed Convention for Embedding Metadata in HTML: http://dublincore.org/workshops/dc2/resources-weibel-19960602.shtml
TagSet -G and -M Elements: http://lcweb.loc.gov/z3950/agency/defns/tag-gm.html