The Problems with Cross-Domain Searching
Searching is almost always about compromises. The kinds of access points that the searcher is expecting may or may not be accurately reflected in the indexing provided by the database server. The searching is most accurate, and the searcher is least surprised, when the searcher and client software have a complete understanding of the schema of the database being searched. The next least surprising environment is when the searcher has no expectations about the database and the server provides the complete document in a single index (e.g. WAIS). This may not be very accurate, but no one is surprised by the results. Anything in between complete understanding and complete ignorance is a compromise and subject to the user being surprised by the results of a search.
On what basis should we make the compromise that will constitute the Cross-Domain attribute set? Perhaps our experience with Bib-1 as a cross-domain attribute set can guide our decisions.
Strengths of Bib-1 as a Cross-Domain attribute set
Weaknesses of Bib-1 as a Cross-Domain attribute set
The Proposed Solution
What we need is an attribute set with neither too many nor too few Use attributes. Their semantics should be defined well enough to be clear, but not so tightly that they apply to only a few subject domains. The Dublin Core elements seem to satisfy these requirements and have the additional benefit of already being accepted as applicable to many domains. That last point makes an attribute set based on Dublin Core superior to any arbitrary list of attributes.
One of the points of discussion and development in the Dublin Core community is "qualification". Qualification allows the document developer to say more about a Dublin Core element than just the type of the element. An example of a qualifier is Scheme, which can be used to qualify the source of a subject heading. Such qualification is intrinsic to Z39.50 attribute sets and will be defined in the Dublin Core attribute set.
Because qualification is native to Z39.50 attribute sets and not a topic of debate in the Z39.50 community, we are going to unilaterally resolve one of the Dublin Core issues. We will aggregate Creator, Contributor and Publisher into a single Abstract attribute of Name and provide Semantic Qualifiers to specify the original semantic intent of those elements.
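The aggregation described above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the `AccessPoint` tuple and the `dc_element_to_access_point` function are hypothetical names, and the numeric value 3 for Name is taken from the Abstract attribute table in this document.

```python
from typing import NamedTuple, Optional

NAME = 3  # value of the Name Abstract attribute in the proposed set


class AccessPoint(NamedTuple):
    """Hypothetical pairing of an Abstract attribute with a Semantic Qualifier."""
    abstract: int
    semantic_qualifier: Optional[str]


# The three Dublin Core elements that the proposal folds into Name.
_NAME_ROLES = {
    "creator": "Creator",
    "contributor": "Contributor",
    "publisher": "Publisher",
}


def dc_element_to_access_point(element: str) -> AccessPoint:
    """Map a Dublin Core element name to Name plus its Semantic Qualifier."""
    role = _NAME_ROLES.get(element.strip().casefold())
    if role is None:
        raise ValueError(f"{element!r} is not aggregated into Name")
    return AccessPoint(NAME, role)
```

A client that wants the original Dublin Core semantics of Publisher would therefore send the Name Abstract attribute together with the Publisher Semantic Qualifier, rather than a separate Publisher access point.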
One of the clear strengths of the Bib-1 attribute set was the number of Use attributes available. The semantics of many of those Use attributes can be made available in the Dublin Core attribute set through judicious use of Semantic Qualifier and Content Authority attributes. Examples of these are given below, but no attempt has been made to produce a comprehensive listing. This effort awaits ratification of the concept by the Z39.50 Implementors Group.
The Dublin Core elements are stable and well described. They will be referenced from an enumerated set of numeric values. The qualifiers are not as stable and will be referenced as case-insensitive string values. An initial set of values will be proposed, but there is no reason that this list could not be extended locally.
The DC Attribute Set
Abstract Attribute Type
There are thirteen abstract access points. Their meaning is taken from Description of Dublin Core Elements (with the exception of Name, which is defined solely in the table). Information in italics has been added for clarification. The values and semantics of the Abstract attributes are:
|Abstract attribute||Value||Meaning|
|Title||1||The name given to the resource, usually by the Creator or Publisher. (The type of the title can be clarified with a Semantic Qualifier attribute. Examples are Former Title and Abbreviated Title)|
|Subject||2||The topic of the resource. Typically, subject will be expressed as keywords or phrases that describe the subject or content of the resource. (The source of a subject heading can be clarified with a Content Authority attribute. Examples include LCSH and MeSH.)|
|Name||3||A person or organization associated with the resource. (The nature of the role of the person or organization can be clarified with a Semantic Qualifier. Examples are Creator, Contributor and Publisher)|
|Description||4||A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources. (The type of description can be clarified with a Semantic Qualifier attribute. Examples are Abstract and Note.)|
|Date||5||A date associated with the creation or availability of the resource. Such a date is not to be confused with one belonging in the Coverage element, which would be associated with the resource only insofar as the intellectual content is somehow about that date. (The type of the date can be clarified with a Semantic Qualifier attribute. Examples include Publication Date and Acquisition Date.)|
|Resource Type||6||The category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary.|
|Format||7||The data format of the resource, used to identify the software and possibly hardware that might be needed to display or operate the resource.|
|Resource Identifier||8||A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally-unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names are also candidates for this element.|
|Source||9||Information about a second resource from which the present resource is derived.|
|Language||10||The language of the intellectual content of the resource.|
|Relation||11||An identifier of a second resource and its relationship to the present resource. (The type of the relation can be clarified with a Semantic Qualifier.)|
|Coverage||12||The spatial or temporal characteristics of the intellectual content of the resource.|
|Rights Management||13||A rights management statement, an identifier that links to a rights management statement, or an identifier that links to a service providing information about rights management for the resource.|
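Because the Abstract attributes are referenced from an enumerated set of numeric values, an implementation might carry them as an enumeration. The following Python sketch mirrors the table above; the class name `AbstractAttribute` and the helper `parse_abstract` are illustrative assumptions, not identifiers defined by the proposal.

```python
from enum import IntEnum


class AbstractAttribute(IntEnum):
    """Numeric values of the Abstract attributes, copied from the table."""
    TITLE = 1
    SUBJECT = 2
    NAME = 3
    DESCRIPTION = 4
    DATE = 5
    RESOURCE_TYPE = 6
    FORMAT = 7
    RESOURCE_IDENTIFIER = 8
    SOURCE = 9
    LANGUAGE = 10
    RELATION = 11
    COVERAGE = 12
    RIGHTS_MANAGEMENT = 13


def parse_abstract(value: int) -> AbstractAttribute:
    """Return the Abstract attribute for a numeric value, or raise ValueError."""
    try:
        return AbstractAttribute(value)
    except ValueError:
        raise ValueError(f"{value} is not an Abstract attribute in this set") from None
```

A closed enumeration fits here because the Dublin Core elements are described as stable; the qualifiers, which are expected to grow, are better served by open string values.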
Field Name Attribute Type
No Field Name attributes will be provided in the Dublin Core attribute set.
Query Management Attribute Types
The Query Management attribute types (Normalized Weight, Hit Count and Stopwording) will not be defined in the Dublin Core attribute set. They are defined in the Z39.50 Utility attribute set.
Language Attribute Type
The Language attribute type will not be defined in the Dublin Core attribute set. It is defined in the Z39.50 Utility attribute set.
Content Authority Attribute Type
Content Authority attributes will consist of case-insensitive strings. More than one Content Authority attribute may be specified with a term in a query. The server will combine the Content Authority attributes pairwise with the single Abstract attribute for the term and determine the combination that best matches an actual access point in the database. If a Content Authority attribute is provided with a term in a query, then one of the pairwise combinations must be chosen or the server must fail the query. If the client is willing to let the server revert to the semantics for the base Abstract attribute, the client can provide the NULL Content Authority attribute value defined in the Z39.50 Utility attribute set. Currently, the Content Authority attribute is only used in combination with the Subject Abstract attribute, but its use with other Abstract attributes is not ruled out.
The list of Content Authority values provided here is not intended to be comprehensive. It is expected that it will be extended both formally and locally.
|Value||Meaning||Source|
|LCSH||Library of Congress Subject Headings||Library of Congress|
|LC Children's||Library of Congress Subject Headings for Children||Library of Congress|
|MeSH||Medical Subject Headings||US National Library of Medicine|
|AAT||Art and Architecture Thesaurus||Getty Information Institute|
|BDI||Bibliotek Dokumentasjon Informasjon||A controlled subject vocabulary used and maintained by the five Nordic countries (Denmark, Finland, Iceland, Norway, and Sweden)|
|INSPEC||Information Services for the Physics and Engineering Communities||The Information Services Division of the Institution of Electrical Engineers|
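The server-side matching rule for Content Authority attributes can be sketched as follows. This is a minimal Python illustration under stated assumptions: the proposal does not define how the "best" match is ranked, so the sketch simply takes the first supplied authority that has an index; the NULL value from the Utility attribute set is represented here by Python's `None`; and the index names in `INDEXES` are invented for the example.

```python
from typing import Dict, Optional, Sequence, Tuple

NULL = None   # stand-in for the Utility set's NULL value (representation assumed)
SUBJECT = 2   # Subject Abstract attribute value from the table above

# Hypothetical server configuration: (abstract, casefolded authority) -> index.
# A key with authority None is the index for the base Abstract attribute.
INDEXES: Dict[Tuple[int, Optional[str]], str] = {
    (SUBJECT, None): "subject_all",
    (SUBJECT, "lcsh"): "subject_lcsh",
    (SUBJECT, "mesh"): "subject_mesh",
}


class UnsupportedCombination(Exception):
    """No pairwise combination matched, so the query must fail."""


def resolve_index(
    abstract: int,
    authorities: Sequence[Optional[str]],
    indexes: Dict[Tuple[int, Optional[str]], str],
) -> str:
    """Pick an index for one query term from its Content Authority attributes."""
    allow_fallback = False
    for authority in authorities:
        if authority is NULL:
            # The client permits reverting to the base Abstract semantics.
            allow_fallback = True
            continue
        key = (abstract, authority.casefold())  # values are case-insensitive
        if key in indexes:
            return indexes[key]  # assumption: first match is the "best" match
    # With no authority supplied, the base Abstract attribute applies as usual.
    if (allow_fallback or not authorities) and (abstract, None) in indexes:
        return indexes[(abstract, None)]
    raise UnsupportedCombination(f"no access point for {abstract} with {list(authorities)}")
```

With this configuration, a term sent with AAT alone fails, while the same term sent with AAT and NULL falls back to the general subject index.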
Expansion/Interpretation Attribute Type
The Expansion/Interpretation attribute type will not be defined in the Dublin Core attribute set. It is defined in the Z39.50 Utility attribute set.
Semantic Qualifier Attribute Type
Semantic Qualifier attributes will consist of case-insensitive strings. More than one Semantic Qualifier attribute may be specified with a term in a query. The server will combine the Semantic Qualifier attributes pairwise with the single Abstract attribute for the term and determine the combination that best matches an actual access point in the database. If a Semantic Qualifier attribute is provided with a term in a query, then one of the pairwise combinations must be chosen or the server must fail the query. If the client is willing to let the server revert to the semantics for the base Abstract attribute, the client can provide the NULL Semantic Qualifier attribute value defined in the Z39.50 Utility attribute set. Currently, the Semantic Qualifier attribute is only used in combination with the Name, Description, Date and Relation Abstract attributes, but its use with other Abstract attributes is not ruled out.
The list of Semantic Qualifier values provided here is not intended to be comprehensive. It is expected that it will be extended both formally and locally.
|Value||Combining Abstract attribute||Meaning|
|Creator||Name||The person or organization primarily responsible for creating the intellectual content of the resource.|
|Publisher||Name||The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity.|
|Contributor||Name||A person or organization not specified in a Creator element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a Creator element (for example, editor, transcriber, and illustrator).|
|Editor||Name||A person who prepared for publication an item that is not his or her own.|
|Abstract||Description||An abbreviated, accurate representation of a resource, usually without added interpretation or criticism.|
|Note||Description||A concise statement in which such information as extended physical description, relationship to other resources, or contents may be recorded.|
|Publication Date||Date||The date on which a resource was published.|
|Acquisition Date||Date||The date when a resource was acquired.|
|Date Added||Date||The date and time that a resource was added to a database.|
|Date Last Modified||Date||The date and time a resource was last modified.|
|Contained In||Relation||The identifier of a resource of which this resource is a part.|
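Since Semantic Qualifier values are case-insensitive strings drawn from a list that can be extended locally, one way to hold them is an open registry keyed on the case-folded value. The Python sketch below is illustrative only: `QualifierRegistry` and `initial_registry` are hypothetical names, the initial entries are copied from the table above, and the local extension shown in the usage note (Former Title with Title) is taken from the example given in the Abstract attribute table.

```python
from typing import Dict, Set

TITLE, NAME, DESCRIPTION, DATE, RELATION = 1, 3, 4, 5, 11


class QualifierRegistry:
    """Open, case-insensitive list of Semantic Qualifier values."""

    def __init__(self) -> None:
        # casefolded qualifier -> Abstract attribute values it may combine with
        self._combining: Dict[str, Set[int]] = {}

    def register(self, qualifier: str, abstract: int) -> None:
        """Add a qualifier, or allow an existing one with another attribute."""
        self._combining.setdefault(qualifier.casefold(), set()).add(abstract)

    def allows(self, qualifier: str, abstract: int) -> bool:
        """True if the qualifier is known to combine with the attribute."""
        return abstract in self._combining.get(qualifier.casefold(), set())


def initial_registry() -> QualifierRegistry:
    """Build a registry holding the initial values listed in the table."""
    registry = QualifierRegistry()
    for qualifier, abstract in [
        ("Creator", NAME), ("Publisher", NAME),
        ("Contributor", NAME), ("Editor", NAME),
        ("Abstract", DESCRIPTION), ("Note", DESCRIPTION),
        ("Publication Date", DATE), ("Acquisition Date", DATE),
        ("Date Added", DATE), ("Date Last Modified", DATE),
        ("Contained In", RELATION),
    ]:
        registry.register(qualifier, abstract)
    return registry
```

A site that indexes former titles could extend the list with `registry.register("Former Title", TITLE)`, after which `registry.allows("former title", TITLE)` holds regardless of the case the client uses.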
Comparison Attribute Type
The Comparison attribute type will not be defined in the Dublin Core attribute set. It is defined in the Z39.50 Utility attribute set.
Format/Structure Attribute Type
Development of this attribute type awaits developments in the Z39.50 Implementors Group and the Dublin Core community. A clear example of a value is the ISO 8601 date format. Beyond that single example, there is little agreement.
Occurrence Attribute Type
The Occurrence attribute type will not be defined in the Dublin Core attribute set. It is defined in the Z39.50 Utility attribute set.
Indirection Attribute Type
The Indirection attribute type will not be defined in the Dublin Core attribute set. It is defined in the Z39.50 Utility attribute set.
Problems with this solution
A database provider's data may not fall neatly into a single domain. With the many Use attributes in the Bib-1 attribute set, the provider could map nearly all of its access points to something in the Bib-1 attribute set and expect that the client would have some idea of how to use each access point.
Here is an example of the problem. OCLC provides access to its FirstSearch databases via Z39.50. Many of those databases are not strictly bibliographic. Some of them contain data about the organization of businesses; others contain the full text of medical journals. Most of the access points in those databases could be made to fit a Bib-1 Use attribute, even if only badly. Significantly fewer access points are available in the proposed Cross-Domain attribute set. The database providers will need to find semantics for these access points in other, domain-specific attribute sets. Alternatively, the database providers will develop their own peculiar, database-specific attribute sets, which will significantly reduce interoperability.
The new attribute architecture will require that the database developers logically subdivide their data into the appropriate domains and map their data to access points in those domains. What are the chances that the client developers will make the same decisions based on their understanding of those access points? How will the client learn what attribute sets are needed to search a particular database? Who is going to develop those domain-specific attribute sets? How will the client and database developers become aware of their existence? These are problems that are going to haunt us as we continue to develop the new attribute architecture. Clearly, Explain must play an important role in our continued development.