A datum is a piece of data.¹ An explication of the concept of the datum presupposes the concept of ‘phenomenon’ and a notion of scientific discourse in the context of an empirical discipline.

An item or occurrence in the observable world is a phenomenon. The subject matter of an empirical discipline (as opposed to a logical or a hermeneutic discipline) consists of a set of phenomena. This set is called the ultimate substrate of the discipline.

In the context of empirical research, a datum represents a phenomenon. While a datum is thus based on something and therefore disputable (German hintergehbar), a phenomenon is an elementary fact (it is unhintergehbar).

Data are complex objects

At first blush, it may seem that data are things. A core sample consists of material extracted from a drilled hole; it may serve as geological data. Likewise, a recording of a narrative in some language may serve as data in linguistics. Taken as such, however, these are mere things and not yet data. They become data only if they are accompanied by a statement of which segment or aspect of the subject area and, in the end, of the ultimate substrate they represent. Thus, the core sample is provided with information on which part of the soil it represents; and likewise the narrative is provided with information on the language that it represents. Without such information, the core sample and the narrative are not data at all. Therefore, a datum is a complex object composed of an object that represents a segment or aspect of the ultimate substrate and a proposition stating this representative relation.
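
The composition just described can be pictured as a small data structure. The following Python sketch is only an illustration and not part of the original argument: the names `Datum`, `carrier`, `represents` and `as_datum`, the file name, and the placeholder "language X" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Datum:
    """A representing object paired with a statement of what it represents."""
    carrier: Any      # the thing itself, e.g. a core sample or a recording
    represents: str   # proposition naming the segment of the ultimate substrate


def as_datum(carrier: Any, represents: Optional[str]) -> Optional[Datum]:
    """Return a Datum only if the representative relation is stated."""
    if not represents or not represents.strip():
        return None  # a mere thing: no statement, hence no datum
    return Datum(carrier=carrier, represents=represents.strip())


recording = "narrative_recording.wav"  # hypothetical file name
mere_thing = as_datum(recording, None)  # stays a mere thing
datum = as_datum(recording, "a narrative in language X")  # placeholder language
```

In this sketch the same recording yields no datum as long as the statement is missing, and a datum once it is supplied, which mirrors the point that the thing alone does not suffice.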

Data are constituted by their methodological function

For a proper understanding of the notion of ‘datum’, a linguistic analysis of this word is expedient. Latin datum is the perfect participle of the verb dare ‘give’ and thus means ‘given’. The form evokes the trivalent argument frame of this verb. Projected onto the interactive situation of a scientific discourse, the giver and the recipient are participants in the discourse, and the given is the complex object transmitted.

As an example, consider the statement ‘At the observation station of Erfurt, Germany, on January 1, 2022 at 8:00 a.m., the air temperature in the shade was 7° Celsius.’


[Picture: the datum in its speech situation]

Scientific research and discourse is a purposeful activity. A scientist uses a datum not because it is available, but with some goal in mind. A proposition representing a phenomenon is not in and of itself a datum; it is its role in scientific method that constitutes it as a datum. In inductive reasoning, the researcher selects a set of data on which he bases some statement which contributes to a theory. In deductive reasoning, the researcher proposes an empirical claim which he then tests against appropriate data.

Relation between the datum and the phenomenon

The above configuration evokes the semiotic trilateral model of the sign as mediating between a sender and an addressee and denoting the entity meant. This analogy is valid: the datum is a sign of its phenomenon.

A datum is not necessarily concrete; it may involve an abstraction. For instance, from all the January temperature measurements at 8 o'clock, an average January morning temperature at Erfurt may be calculated. It may be represented by the proposition ‘The average morning temperature at Erfurt in January is 5.6° Celsius.’ This, again, may be used as a datum in some more general meteorological statistics or in a description of the amenities of Erfurt.
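
The step from concrete measurements to an abstract datum can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the names `Reading` and `average_datum` are invented for the example, and the readings for January 2 and 3 are hypothetical values, not actual measurements; only the 7° reading echoes the example statement given earlier.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Reading:
    station: str
    day: int        # day of January
    celsius: float  # air temperature at 8:00 a.m.


def average_datum(readings):
    """Condense concrete readings into one abstract datum (value and proposition)."""
    station = readings[0].station
    avg = round(mean(r.celsius for r in readings), 1)
    proposition = (
        f"The average morning temperature at {station} in January is {avg}° Celsius."
    )
    return avg, proposition


# Hypothetical values, apart from the 7° reading taken from the example statement.
readings = [
    Reading("Erfurt", 1, 7.0),
    Reading("Erfurt", 2, 5.0),
    Reading("Erfurt", 3, 3.0),
]
avg, proposition = average_datum(readings)
```

The resulting proposition no longer represents any single observed phenomenon; it abstracts over several of them and can itself be passed on as a datum.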

The datum between source and user

By the same token, the connection between the source of the datum and the recipient may be mediate. At one pole of the variation, the source and the recipient of some datum may be identical. This is the case if a scientist observes some phenomenon, converts his observation into a statement and bases further research on it. At the other pole of the variation, there is a chain of transmission of a datum. Obviously, the recipient who takes the datum for granted trusts the source, or even each of the sources in the transmission chain.

All empirical statements are open to doubt and falsification. Data, however, have a different status in scientific communication from theses, arguments and proofs. Though it is true that data are hintergehbar, they are more rarely disputed. On the one hand, they purport to represent phenomena of the subject area of the discipline and therefore have a flavor of being “real”. On the other hand, their recipient commonly has no possibility of checking them and simply has to trust the source. The source may be responsible for errors in the data or may even have invented them.² More on this in the section on sources of linguistic data.

¹ The noun data is used as a plural noun whose singular is datum.

² The work by Hans Jürgen Eysenck is a famous case.