Representing People's Names in Dublin Core
This note provides some guidance on representing people's names in metadata.
1. The Problem
While most people only have one name, that name may be written down in many different ways. The name may be written in full (e.g. 'John Stuart Mills'). Components of the name can be abbreviated (e.g. 'John S. Mills', or 'J.S. Mills'), or omitted (e.g. 'John Mills'). Names may be extended by titles or honorifics (e.g. 'Mr. John Mills'). The components of the name may be reordered (e.g. 'Mills, John Stuart'). Complexity is added by the fact that people frequently do not use their 'official' name. People often prefer to use shortened or alternative forms of their name (e.g. 'Kathy' for 'Kathryn', and using 'Jack' for 'John' was once common in Australia). Some people prefer to use their second name instead of their given name (e.g. 'Margaret Read' instead of 'Frances M. Read').
The final dimension of complexity occurs when names from many cultures must be handled. Appendix A summarises the name forms of some of the cultures commonly found in Australia. The range of different components which can be found in names is astounding, as is the number of ways these components can be ordered. To make handling names even more difficult, it is common for migrants to alter their name when integrating into another culture. In the booklet from which Appendix A was drawn, it was noted, for example, that people with names which start with the family name often move the family name to the end to fit with the dominant name form in Australia.
2. The Uses of Names
Given that name forms are not internationally consistent, and that individuals often vary their name to suit themselves, what is the best way of representing names in metadata? To answer this question, it is worth considering how names are used in a metadata system:
* As a piece of information. Often, the user is interested in using the name as a piece of information in its own right. A user might ask, for example, "Who wrote 'The Lion, the Witch, and the Wardrobe'?". The user has some reason for wanting this information, such as to check that the returned entry is the correct one, or to carry out further work. When using the name as a piece of information, it does not matter how the name is expressed, *provided the user understands the convention*. For example, a library catalog would return the name 'Lewis, Clive Staples, 1898-' and the user is expected to understand (by convention) that the first part is the family name, and that the string '1898-' is not part of the name at all.
* As a search key. The user is interested in searching for entries associated with the name.
Most current search engines search the full text of the entry; it is consequently irrelevant *to the search engine* how the components of the name are ordered. In some situations, however, it is relevant to the user. Full text searching will match any occurrence of the search string in the names. A search for the surname 'Andrew', for example, will match all names with 'Andrew' in any part of the name (the family name, or any of the given names). This will return significantly more matches than if it had been possible to limit the search to just the family names. In English-derived names it is not common to use family names as given names, but this may not be true for names from other cultures.
* As a sorting key. Names are often used to sort a list of results, and there is usually a convention on how names are to be sorted within a culture. With Australian names, for example, the convention is to sort by family name. Unfortunately, it is difficult to construct a general algorithm to extract the primary and secondary sort keys, particularly when the system must handle names from many cultures. A common approach (used, for example, in library catalogs) is to re-order the components so that the primary key comes at the start of the name.
The different approaches to representing names trade off the various uses. For example, representing a name in the 'natural' order (i.e. the order in which it is spoken) is probably best if the name is being used as a piece of information, particularly if names from many cultures are going to be mixed in together. But such a representation would make it difficult to sort Anglo-Saxon names, which should be sorted by family name.
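The search and sort uses described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes names have already been split into hypothetical `family` and `given` fields, and the sample names and function names are invented for the example.

```python
from typing import NamedTuple


class Name(NamedTuple):
    """Hypothetical record: a family name plus a tuple of given names."""
    family: str
    given: tuple


# Invented sample data, for illustration only.
NAMES = [
    Name("Andrew", ("Mary",)),
    Name("Smith", ("Andrew", "John")),
    Name("Mills", ("John", "Stuart")),
]


def full_text_match(names, term):
    """Match the term against any component of the name."""
    term = term.casefold()
    return [n for n in names
            if term in (part.casefold() for part in (n.family, *n.given))]


def family_match(names, term):
    """Match the term against the family name only."""
    term = term.casefold()
    return [n for n in names if n.family.casefold() == term]


def sort_by_family(names):
    """Sort by family name first, then by given names."""
    return sorted(names, key=lambda n: (n.family.casefold(),
                                        [g.casefold() for g in n.given]))
```

With this sample data, a full-text search for 'Andrew' also returns the person whose given name is Andrew, while the family-name search returns only the person whose family name is Andrew.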
3. Approaches to Expressing Names
It is unlikely that there will be agreement on a single common way of representing names. The following are the preferred methods, in order of preference.
a. Use whatever you already have
In many cases, the metadata will be a view on an existing database (e.g. a library catalog or HR database). Simply adopting the name representation policy used in that database has the following advantages:
- The names are compatible with other databases that share the same format.
- You do not have to expend resources in entering or maintaining the names. (This is a very significant cost.)
The disadvantages are:
- The existing database might not be designed with international scope in mind. Does it, for example, assume that every name has a given name, an initial, and a family name?
- The names may not be compatible with other databases that you wish to work with (e.g. library catalogs).
b. Adopt an existing naming authority
If you don't have existing data, or it is not appropriate to use the existing format, it is possible to adopt an existing naming authority. These are simply long lists of names in a standardised representation. An example is the (US) Library of Congress Name Authority File, but most national libraries would maintain a similar name authority file.
The advantages of using an existing name authority file are:
- Compatibility with other databases. Standard name authority files are very widely used.
- Consistency in application.
The disadvantages of using an existing name authority file are:
- The names you need may not be in the authority. This is particularly true if the authority tends to specialise. For example, the Library of Congress Name Authority File would have a very good coverage of US authors, but might not cover Japanese authors or US union leaders as well. To extend the authority, you *must* fully understand the rules that were used to produce the authority (otherwise names will be inconsistent).
- Name authorities must be purchased.
- To be effective, name authorities must be used. That is, when it is necessary to add an entry, the name authority must be consulted to obtain the official representation. This obviously takes time, and will not be economic for some applications.
c. Adopt existing naming guidelines
If there is no suitable naming authority for you to use, you may be able to adopt existing guidelines on how to represent names. Appendices B and C contain summaries of two such guidelines.
The advantages of adopting existing guidelines are:
- Compatibility with other databases (both existing and future) that share these guidelines
- Completeness of rules. There are many complex issues in representing names, and a widely adopted set of guidelines is more likely to address these issues than one developed in house.
- Lower cost of development of the guidelines.
The disadvantages are:
- The guidelines may be more complex than is actually required in your application. For example, the AACR include sections on titles of nobility and terms of honour.
- The guidelines may require considerable training and resource material to apply consistently. The rules for non Anglo/US names in the AACR, for example, assume access to reference books in the native language of the person being named. Most organisations are unlikely to have access to such resources, nor would staff be skilled in using those resources.
d. Develop your own naming guidelines
If there are no suitable naming guidelines that you can use, you will have to develop your own.
If you can, simplify an existing naming scheme. At least you will know what flexibility and power you are removing from your scheme.
If you have to develop your own naming scheme, think carefully about:
- Which of the three uses for names (section 2) are important to you.
- What will be the cost of collecting the names in the format you choose.
The conventional method in the US, UK, and Australia is to store the family name separately from the given names. This can cause problems with non Anglo-Saxon names, as different data entry staff may enter the name in different ways, thus fragmenting your records. An alternative method is to enter the preferred name in one field and the sort key in a second. The preferred name is often easily obtainable from the person (indeed, they may be much happier to give it than their full official name). The sort key is the part of the name used as the primary sort key (usually the family name), again normally easily obtainable from the person. The preferred name is presented when the entry is used, and the sort key is used when the entry is sorted with other entries. If necessary, this approach can be extended to include a full official name.
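The two-field approach described above can be sketched as a small record type. This is one possible shape, not a prescribed schema: the class name `PersonName` and its field names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonName:
    """Hypothetical record for the preferred-name / sort-key approach."""
    preferred: str                  # shown to users, as the person gives it
    sort_key: str                   # part of the name used as primary sort key
    official: Optional[str] = None  # optional full official name


def display(name):
    """The preferred name is what is presented when the entry is used."""
    return name.preferred


def sorted_names(names):
    """Order entries by sort key, falling back to the preferred name."""
    return sorted(names, key=lambda n: (n.sort_key.casefold(),
                                        n.preferred.casefold()))


people = [
    PersonName("Margaret Read", "Read", official="Frances M. Read"),
    PersonName("John Mills", "Mills"),
]
for person in sorted_names(people):
    print(display(person))
```

Because the sort key is stored explicitly, no algorithm has to guess which component of the name is the family name.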
Elizabeth Cherhal started the ball rolling with two very sensible questions. Stu Weibel, Simon Cox, Ann Apps, Daniel Brickley, Jon Knight, Michael Jost, Karen M. Hsu, and John A. Kunze chimed in with helpful suggestions.
Appendix A: National Name Forms
The purpose of this appendix is not to give the definitive list of name forms (in particular, all cultures will have names that don't conform) but to give readers an idea of the wide range of name forms in use in the world. Hopefully, this will encourage metadata designers to move away from designs which assume that names can be crammed into
This summary of national name forms is based on the Australian booklet 'Naming systems of ethnic groups: A guide for Social Security staff and community workers', Department of Social Security, 1990, ISBN 0 644 12167 X
Many cultures use the common name form of one or more given names followed by a family name. These include: Armenian, Cypriot, Estonian, Finnish, Greek, some Indian (Hindi, Gujerati, and Bengali), Latvian, Lithuanian, Macedonian, Maltese, Maori, Russian, Slovenian, Tongan, and Ukrainian. Arabic is similar, but the name may include a prefix (e.g. 'El') which is not part of the family name. (The guide did not discuss British, American, French, German, or Dutch names.)
The second most common name form is where the family name precedes the given names. Such names are found in the following cultures: Chinese, Croatian, Czech, Hungarian, Italian, Khmer, Korean, Laotian, Polish, and Serbian. Some of these cultures only have one given name. Where two given names are present, some cultures use the first given name as the primary name, others the last.
Many cultures use neither of these two name forms. Examples of the more complex name forms include:
In Iran it is customary to put the village name before the family name (Grandfather's name) on all official documents. This is *not* part of their name.
The name order must always be checked; sometimes names have been reversed to suit English custom. Most women attach their husband's name before their own upon marriage. Family names may be composed of two components.
Baptismal name is not often used. Names are often abbreviated (both dropping components, and shortening components). Married women usually drop their maternal family name and add their husband's paternal family name after their own. Widows usually add 'Vedova' (abbreviated 'Vda') before their husband's family name.
When widowed, women may add 'ozvegy' (abbreviation 'ozv') before family name.
'Singh' and 'Kaur' are religious names. Some Sikhs may include this as part of their family name (perhaps hyphenated). 'Singh' and 'Kaur' may be abbreviated to an initial.
<Father's Given Name>
Father's given name may be written as an initial. The father's given name may be replaced by (or supplemented by) birthplace, mother's house name, or patronymic name depending on region (and may be abbreviated as initials).
There may be no clan name (i.e. the name is just a single given name). The clan name may be their father's name, or it may be shared by the whole community. Name components may be abbreviated as initials.
Married women may add their husband's family name to their name:
Every given name has two parts (syllables) written as two words, which may be hyphenated. Both parts must be used (it is not correct to abbreviate or drop the second). Some Koreans use an English given name for everyday use.
Some given names may be used for either sex, so the name may be preceded by the titles 'Thao' or 'Chao' (Male) or 'Nang' (Female) to indicate sex.
'Bin' and 'Binte' mean 'Son/Daughter of' and will not be present for a non Muslim Malay. Married women traditionally add 'Puan' before their given names.
Nearly every woman has 'Maria' as her first name, and the second is used in everyday use. In Australia, many Portuguese have dropped one family name and added 'Da', 'Das', 'Dos', or 'De' to the other family name. Married women add their husband's paternal family name to the end of their name.
Children are given their father's first two names at birth. The father's names are usually abbreviated as initials.
Married women traditionally drop their maternal family name and add the husband's paternal family name prefixed by 'De'.
The middle name is not used on a day to day basis.
The sex indicator is normally 'Thi' for a female, and 'Van' for a male.
Appendix B: EULER Project Name Conventions
Euler (European Libraries and Electronic Resources in Mathematical Sciences) is a European project to provide network access to mathematical publications (see http://www.emis.de/projects/EULER/). The following text describing naming practices was provided by ##.
Author(s), Editors, Author References
Author names have been implemented in a form common to all STN databases, i.e. last name, first name, middle name. First and middle names may or may not be abbreviated. When searching for an author's name, it is recommended to use only the first initial. You will get all forms of the first names, because the system automatically adds a truncation symbol. Thus, the following forms of implementation are possible. Examples:
Friedrich Wilhelm Mahle as Mahle, F.
Mahle, F. W.
Mahle, Friedrich W.
Mahle, Friedrich Wilhelm
and as an Editor (e.g.) Mahle, Friedrich W. (ed.)
The recommended search form is:
The system will automatically search for
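As an illustration only (not the system's actual behaviour or output), the effect of searching with just the first initial can be sketched as a prefix match. The function name `truncated_search`, the query form 'Mahle, F', and the last sample record are assumptions made for this example; the matching rules of the STN system itself are not given in the text above.

```python
def truncated_search(query, records):
    """Return the records that begin with the query, as if a truncation
    symbol had been appended to it (assumed behaviour)."""
    prefix = query.casefold().rstrip(".")
    return [r for r in records if r.casefold().startswith(prefix)]


RECORDS = [
    "Mahle, F.",
    "Mahle, F. W.",
    "Mahle, Friedrich W.",
    "Mahle, Friedrich Wilhelm",
    "Mahler, Gustav",  # invented record that should not match
]

for record in truncated_search("Mahle, F", RECORDS):
    print(record)
```

Under these assumptions, a single query on the family name and first initial retrieves all four stored forms of the author's name.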
Names containing a preposition (von, van), article (le), combination of article and preposition (du, vander), relationships or attributes (Fitz, Mac, Jun., III) have the format common to their country of origin. It is therefore recommended to search for names with the supplements placed in front of the name and after the name.
Document Database Record
Peter von der Muehl as Muehl, Peter von der
Fritz von Heyden (DE) as Heyden, Fritz von
Fritz Von Heyden (US) as Von Heyden, Fritz
A. De L'Aigle as L'Aigle, A de
C. M. Di Bari as Di Bari, C.M.
Michel Del Pedro as Del Pedro, Michel
L. C. MacLean as MacLean, L.C.
John Fitz Gerald as Fitz Gerald, John
A. Miller jun. as Miller, A.jun.
Names in Cyrillic letters have been transcribed according to the ISO standard. In some cases this transliteration will differ from the one on the translated document or in the western journal. In that case we include the different form of spelling as an author reference displayed in braces in the author field.
Appendix C: Summary of Naming Rules in AACR
The Anglo-American Cataloging Rules (AACR) is the standard which describes how objects in Canadian, US and UK libraries are cataloged. It is also used in other countries (e.g. Australia). Part of these rules describes how people's names are represented in catalog entries. This appendix summarises those naming rules.
The general principles are that:
- The name used should be the one the person is commonly known by. For example, 'Mark Twain' not 'Samuel Clemens'. (In library catalogs, other names are added as cross references.)
Accents and diacritical marks should be included, as should hyphens between given names if they are used by the person.
Normally, the preferred name is obtained from the works the person authored, but it may be obtained from references issued in the person's language or country of residence or activity.
If the name is from a non-roman script, and there exists a well known English language version, use the English language version. For example, 'Confucius' not 'K`ung-tzu'. Other versions are added as cross references as necessary. (This rule would almost certainly not be adopted in libraries whose language is other than English!)
- The names are arranged so that the components used to sort the name (the 'entry element') appear first. In Anglo Saxon names, for example, names are sorted by family name. The 'entry element' of 'Clive Staples Lewis' is 'Lewis' and the name would be represented as 'Lewis, Clive Staples'.
An authoritative alphabetic list in the language of the person's country of residency or activity is used to determine the entry elements of a name. An authoritative alphabetic list is a 'Who's who' (or similar), not a telephone directory. (The difference seems to be that one is sorted by humans, or at least checked by humans, while the other is sorted mechanically).
If the entry element is a family name (surname) it is followed by a comma *even if the family name normally comes first*, as in Chinese names.
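The arrangement described above, with the entry element first and followed by a comma, can be sketched as follows. This is a simplified illustration, not an implementation of the AACR: the function name `heading` is hypothetical, and the caller is assumed to already know which component is the entry element.

```python
def heading(components, entry_index):
    """Move the entry element to the front and follow it with a comma.

    components  -- name parts in their natural (spoken) order
    entry_index -- position of the entry element (assumed to be known)
    """
    entry = components[entry_index]
    rest = components[:entry_index] + components[entry_index + 1:]
    if not rest:
        return entry  # single-component name: nothing to rearrange
    return f"{entry}, {' '.join(rest)}"


# The example from the text: family name last in natural order.
print(heading(["Clive", "Staples", "Lewis"], 2))

# Illustrative family-name-first example: the comma is still added.
print(heading(["Mao", "Zedong"], 0))
```

Deciding which component is the entry element is the hard part, and is what the authoritative alphabetic lists and the special rules address.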
Some special rules:
* For compound family names (names which contain two or more name components), the following rules apply (in order):
- The entry element is the name the person prefers to be listed even if this is longer than the family name (e.g. 'Lloyd George, David' even though his father's family name was 'George').
- If the compound names are normally hyphenated, the name is entered under the full compound name.
- Unless the person is Portuguese OR a woman whose family name consists of her maiden name and husband's family name, enter under the first element of the compound surname.
- If the person is Portuguese, enter under the second element of the compound surname.
- If the person is a woman whose family name consists of her maiden name and husband's family name, enter under the first element of the name if the woman is Czech, French, Hungarian, Italian, or Spanish. Otherwise enter under the husband's surname.
If the name appears to be a compound name, but cannot be checked, it should be treated as a compound name unless the language is English or one of the Scandinavian languages. For a Scandinavian name, add a cross-reference under the compound name.
* A place name connected to the surname by a hyphen is considered to be part of the surname.
* Relationship terms (Jnr, fils) are not considered part of the surname unless it is a Portuguese name. If it is necessary to distinguish between two identical names, the term is appended to the name after a comma (e.g. 'Smith, John, Jnr').
* If the surname includes an article or preposition as a prefix (e.g. van, du, le) enter it under the article or preposition if this is the way it is sorted in the person's language or country of residence or activity.
If the surname includes a prefix which is not an article or preposition (e.g. 'Ap' in Welsh names or 'Mac'), enter it under the prefix.
* If the name does not include a surname, list under the given name.
If the name does not contain a surname, but does include a patronymic (a name derived from their father's name), do not consider the patronymic as a surname; list the name under the given name. If the patronymic comes first (e.g. in Mongolian names), rearrange the name so that the given name comes first.
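The ordered rules for compound family names summarised in this appendix can be sketched as a chain of checks. This is a rough reading of the summary above, not of the AACR itself: the function name, parameters, and nationality labels are hypothetical, the husband's surname is assumed to be the last element, and the Portuguese rule is assumed to take precedence when both exceptions apply. The rule for compound names that cannot be checked is not covered.

```python
# Nationalities for which a married woman is entered under the first element.
MAIDEN_FIRST = {"Czech", "French", "Hungarian", "Italian", "Spanish"}


def compound_entry_element(elements, nationality, *, preferred=None,
                           hyphenated=False, maiden_plus_husband=False):
    """Pick the entry element of a compound family name by applying the
    summarised rules in order (a sketch under the stated assumptions)."""
    if preferred is not None:        # the person's own preference wins
        return preferred
    if hyphenated:                   # hyphenated compounds are kept whole
        return "-".join(elements)
    if nationality == "Portuguese":  # Portuguese: second element
        return elements[1]
    if maiden_plus_husband:          # maiden name plus husband's name
        if nationality in MAIDEN_FIRST:
            return elements[0]
        return elements[-1]          # husband's surname, assumed to be last
    return elements[0]               # otherwise: first element


# The example from the text: the person's preferred form is used.
print(compound_entry_element(["Lloyd", "George"], "British",
                             preferred="Lloyd George"))
```

Writing the rules out this way makes their order explicit, which matters because an earlier rule (such as the person's preference) overrides all later ones.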
 Anglo-American Cataloging Rules, Second Edition, 1988 Revision, Amended 1993, Michael Gorman & Paul W. Winkler (eds), published by American Library Association, Canadian Library Association, and Library Association Publishing Ltd.