The acoustic signal leaving the speaker's mouth and entering the hearer's ear is a sound wave whose properties vary continuously along the dimensions of time, pitch, intensity and timbre. As such, it is an object of phonetics. In the diagram, the waveform visualizes the vibration of the vocal cords.
The sound system of a language, i.e. its phonology, consists of several categories of discrete units such as phonemes and features whose selection and combination obey phonological rules.
What a listener perceives is a complex acoustic object of the kind shown in the example. His task is to find the sense that the speaker has encoded in it. Different occurrences of the same word differ in their physical composition along the acoustic dimensions mentioned above. Therefore this acoustic shape cannot be the form in which the significans of a linguistic sign is stored in the mental lexicon. Instead, the significans is stored in a form closer to its alphabetic representation in writing, i.e. as composed of discrete units.
Viewed systematically, speech perception is a sequence of steps running through the levels of speech production essentially in reverse order:
Level | Operations/processes |
Acoustics | acoustic percept is analyzed to produce a phonetic representation |
Phonology | phonological rules analyze the phonetic representation to produce a phonological representation |
Morphology | the sign is analyzed morphologically to produce a morphological form |
Symbolization | morpho-phonological representation is matched with a schematic semantic representation |
Semantics | meaningful elements are associated with lexical items, and their structural relations build a complex meaning |
Pragmatics | understood expression is integrated with other knowledge to reconstruct speaker's idea |
This, however, is not how speech perception actually works. First of all, the listener has a certain measure of empathy with the speaker, if only because both are human beings. The listener understands the speaker by putting himself in the speaker's place. He thus undertakes a forward construction of the growing idea, just as the speaker does. On the basis of the speech situation and the context, he can anticipate to a considerable extent what the speaker is going to say. Whenever his expectations are met, they help him in decoding the message. The hearer thus does not primarily run, in a bottom-up direction, through the same series of steps that the speaker ran through in a top-down direction. Instead, he accompanies the speaker in the generation of sense. To a large extent, he uses what he hears only as a corrective for his own construction of sense. The entire process is accompanied by self-monitoring, in which the hearer can check at every step whether the result reached so far is compatible with everything else in the speech situation and with his world knowledge.
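The interplay of anticipation and acoustic correction described above can be illustrated with a small sketch. It is not a model proposed in this text: the multiplicative weighting of expectation and acoustic fit, the function name `interpret`, the `floor` parameter and all numbers are assumptions chosen purely for illustration.

```python
def interpret(expectations, acoustic_fit, floor=0.01):
    """Rank candidate words by contextual expectation times acoustic fit.

    expectations: how strongly the context leads the hearer to expect each word.
    acoustic_fit: how well each word matches the incoming signal.
    floor: small default so that unexpected or poorly fitting words stay possible.
    All three are hypothetical quantities, not measurements.
    """
    candidates = set(expectations) | set(acoustic_fit)
    scores = {
        word: expectations.get(word, floor) * acoustic_fit.get(word, floor)
        for word in candidates
    }
    total = sum(scores.values())
    if total == 0:
        return None, {}
    # Normalize so that the scores of all candidates sum to 1.
    weights = {word: score / total for word, score in scores.items()}
    best = max(weights, key=weights.get)
    return best, weights


# Context: "I'd like a cup of ..." (invented values).
expectations = {"tea": 0.6, "coffee": 0.3, "key": 0.1}
# The signal is slightly closer to "key" than to "tea" (invented values).
acoustic_fit = {"tea": 0.45, "key": 0.5, "coffee": 0.05}

best, weights = interpret(expectations, acoustic_fit)
best_without_context, _ = interpret({}, acoustic_fit)
```

With these invented values, the hearer who has built up an expectation settles on "tea" although the signal alone slightly favors "key"; without any expectation, the acoustic evidence decides. The signal thus acts as a corrective on a construction of sense that is already under way.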
A complex phonological unit, e.g. the significans of a word, is constructed from a percept and serves as the input for motor commands given to the speech apparatus. Consequently, speech perception activates the same phonological unit in memory that is used in speech production.
The mirror neuron system becomes involved when we perceive somebody executing his own motor commands, and it may activate the corresponding motor channels in the perceiver. The motor theory of speech perception maintains that when we perceive speech, the phonological units that we reconstruct activate the motor neurons which drive the speech apparatus. There is, however, no direct connection between acoustic input and motor output; i.e., acoustic features are not directly mapped onto gestures of the speech organs. Instead, the connection is mediated by the mental representation of abstract phonological units. To the extent that motor neurons are actually activated during perception, this is a consequence of spreading activation which reaches beyond the phonological representation.
Spreading activation (Dell 1986) implies that the process of perception and understanding does not necessarily stop when the sense of the heard utterance has been constructed. Activation may spread further and stimulate even those cells that are needed only to produce an utterance with that sense. Thus, spreading activation may be involved in the activity of the mirror neuron system.
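The general mechanism can be sketched as follows. This is a generic toy illustration of activation spreading through a weighted network, not an implementation of Dell's (1986) model; the node names, link weights, the `decay` factor and the function name `spread` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy network; names and weights are illustrative only.
# Note that the percept is linked to the motor node only via the
# phonological node, never directly.
LINKS = {
    "percept:/kat/": {"phonological:/kat/": 1.0},
    "phonological:/kat/": {"lexical:cat": 0.8, "motor:/kat/": 0.5},
    "lexical:cat": {"concept:CAT": 0.8},
}


def spread(links, source, steps=3, decay=0.5):
    """Pass activation from `source` along weighted links for `steps` rounds.

    In each round, every node activated in the previous round sends
    (its incoming amount * link weight * decay) to each of its neighbours.
    """
    activation = defaultdict(float)
    activation[source] = 1.0
    frontier = {source: 1.0}
    for _ in range(steps):
        incoming = defaultdict(float)
        for node, amount in frontier.items():
            for neighbour, weight in links.get(node, {}).items():
                incoming[neighbour] += amount * weight * decay
        for node, amount in incoming.items():
            activation[node] += amount
        frontier = incoming
    return dict(activation)


result = spread(LINKS, "percept:/kat/")
```

In this toy network the motor node ends up with some activation although understanding requires only the path to the concept node, and although no link leads from the percept to the motor node directly: the activation reaches it through the phonological unit.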
Arjmandi, Meisam K. & Behroozmand, Roozbeh 2024, “On the interplay between speech perception and production: insights from research and theories”. Frontiers in Neuroscience 18:1347614 (https://pmc.ncbi.nlm.nih.gov/articles/PMC10850291/#ref68).
Dell, G. S. 1986, “A spreading-activation theory of retrieval in speech production”. Psychological Review 93: 283-321.