Sort order 17.06.2026

Databases tailored for linguistic use allow the user to define a sort order for the records. This order may follow practical requirements of the operator, or it may simply be the sort order usual for dictionaries.

In the macrostructure of most semasiological dictionaries, the entries are ordered according to orthographic criteria. This holds true whether the writing system is alphabetic or uses some other sign inventory. In an alphabetic writing system, the sort order of the entries in the macrostructure is usually alphabetical, whether the order is forward or retrograde. Alphabetic order is, of course, specific to the language using the alphabet. It is not even necessarily the order of the letters in the alphabet. In the German writing system, e.g., <ä ö ü ß> are appended at the end of the alphabet; but in the sort order of dictionaries, they are inserted after the letters forming their base, thus <a ä ... o ö ... s ß ... u ü>.
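A language-specific alphabetical order of this kind can be sketched as a sort key that maps each letter to its rank in a user-defined alphabet. The following Python fragment is only an illustration: the names ALPHABET, sort_key and retrograde_key are hypothetical, the word list is invented, and every letter of the German example is treated as an independent unit of the sort order.

```python
import unicodedata

# Assumed sort alphabet: <ä ö ü ß> inserted after their base letters.
ALPHABET = "aäbcdefghijklmnoöpqrsßtuüvwxyz"
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def sort_key(word):
    """Forward order: rank of each letter, read from left to right."""
    # NFC makes sure that <ä> is one precomposed character, not a + diaeresis.
    normalized = unicodedata.normalize("NFC", word).lower()
    # Characters outside the alphabet are ranked after all letters.
    return [RANK.get(ch, len(ALPHABET)) for ch in normalized]


def retrograde_key(word):
    """Retrograde order: the same ranks, read from the end of the word."""
    return sort_key(unicodedata.normalize("NFC", word)[::-1])


words = ["Ofen", "Zug", "Öl", "Apfel", "Ärger", "Uhr", "Übung"]
forward = sorted(words, key=sort_key)
retro = sorted(words, key=retrograde_key)
print(forward)  # ['Apfel', 'Ärger', 'Ofen', 'Öl', 'Uhr', 'Übung', 'Zug']
print(retro)    # ['Übung', 'Zug', 'Apfel', 'Öl', 'Ofen', 'Ärger', 'Uhr']
```

In this sketch, changing the sort order amounts to editing the ALPHABET string; the comparison logic stays the same.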

Diacritics and two-level sort order

After this first criterion, a hierarchical order obtains regarding graphic variants of base letters, like <A> or <ã> or <á> as variants of <a>.

  1. Such a variant may constitute an independent unit of the alphabet at the same level as its base model. It then has an order position at the first level.
  2. Otherwise, it may be categorized as a graphic variant of its base model. It then has an order position at the second level.

This alternative may be decided on phonological grounds. For instance:

However, this phonological principle is not always observed. For instance, both in French and in Portuguese, <ç> is a second-level variant of <c> although the two represent different phonemes; and the same holds for the vowels provided with a tilde in Portuguese.

The two-level sort order is implemented according to the following rule: Given two words W1 and W2 which are homographous except that at position Li W1 has letter X while W2 has letter Y, then

To illustrate:

If <é> is a first-level letter, the sort order is: Beko - Belo - Béko.

If <é> is a graphic variant of <e>, the sort order is: Beko - Béko - Belo.
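The two outcomes of this illustration can be reproduced with a two-part sort key: words are compared by their first-level letters first, and second-level variants only break ties. This is a minimal sketch of one common way to realize a two-level order, not necessarily the exact rule intended above; the function two_level_key and its parameters are hypothetical, and case is simply ignored.

```python
import unicodedata


def two_level_key(first_level, variants=None):
    """Build a sort key for a two-level order.

    first_level: string listing the first-level letters in order.
    variants: dict mapping a variant letter to (base letter, second-level rank).
    """
    variants = variants or {}
    rank = {letter: i for i, letter in enumerate(first_level)}

    def key(word):
        primary, secondary = [], []
        for ch in unicodedata.normalize("NFC", word).lower():
            # A letter without an entry in `variants` is its own base.
            base, level2 = variants.get(ch, (ch, 0))
            primary.append(rank.get(base, len(first_level)))
            secondary.append(level2)
        # The secondary list is consulted only if the primary lists are equal.
        return (primary, secondary)

    return key


words = ["Belo", "Béko", "Beko"]

# <é> as a first-level letter, placed after <e>
as_letter = sorted(words, key=two_level_key("abcdeéfghijklmnopqrstuvwxyz"))

# <é> as a second-level variant of <e>
as_variant = sorted(
    words,
    key=two_level_key("abcdefghijklmnopqrstuvwxyz", {"é": ("e", 1)}),
)

print(as_letter)   # ['Beko', 'Belo', 'Béko']
print(as_variant)  # ['Beko', 'Béko', 'Belo']
```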

Digraphs

A digraph like <sh> or <ch> may be treated in the sort order in either of two ways:

  1. It may be treated as a single letter. It is then an individual member of the alphabet and is assigned its own position – usually following its first letter – both in the alphabet and in the sort order.
  2. It may be treated as a sequence of its two component letters, its phonological specificity being ignored. Then the sort order applies individually to the first and to the second component.

For the Spanish digraphs <ch> and <ll>, sort order obeyed principle #1 for two centuries. The sort order was accordingly carro - cuna - chacal. Since 1994, the sort order has obeyed principle #2, so it is carro - chacal - cuna (as it has always been in German).
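The difference between the two principles can be sketched by splitting each word into sort units before ranking them. Under principle #1, <ch> and <ll> are units of their own; under principle #2, only single letters are units. The helper names tokenize and digraph_key and the two alphabet lists are assumptions made for this illustration.

```python
def tokenize(word, digraphs):
    """Split a word into sort units, treating listed digraphs as one unit."""
    units, i = [], 0
    w = word.lower()
    while i < len(w):
        if w[i:i + 2] in digraphs:
            units.append(w[i:i + 2])
            i += 2
        else:
            units.append(w[i])
            i += 1
    return units


def digraph_key(alphabet):
    """Build a sort key from a list of sort units (letters or digraphs)."""
    rank = {unit: i for i, unit in enumerate(alphabet)}
    digraphs = {unit for unit in alphabet if len(unit) == 2}

    def key(word):
        # Units outside the alphabet are ranked after all listed units.
        return [rank.get(u, len(alphabet)) for u in tokenize(word, digraphs)]

    return key


# Principle #2: single letters only.
PLAIN = list("abcdefghijklmnñopqrstuvwxyz")

# Principle #1: <ch> follows <c>, <ll> follows <l>.
WITH_DIGRAPHS = []
for letter in PLAIN:
    WITH_DIGRAPHS.append(letter)
    if letter == "c":
        WITH_DIGRAPHS.append("ch")
    elif letter == "l":
        WITH_DIGRAPHS.append("ll")

words = ["cuna", "chacal", "carro"]
principle_1 = sorted(words, key=digraph_key(WITH_DIGRAPHS))
principle_2 = sorted(words, key=digraph_key(PLAIN))
print(principle_1)  # ['carro', 'cuna', 'chacal']
print(principle_2)  # ['carro', 'chacal', 'cuna']
```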