If the language being analysed has an established orthography, there are often good reasons to use it, instead of or in addition to more formal linguistic representations. The following deals with conventions valid for level #3 of the table of linguistic representations. Most of the recommendations below are specific to this level and do not apply to the other levels. The focus is on examples whose grammatical properties are at issue.
Source of example text
For the orthographic conventions to be observed, the origin of the piece of text to be represented makes a difference. It may be:
- quoted from a published source
- transcribed/transliterated from a written source
- transcribed from an oral source.
Quoted published source
The source of a quoted piece of text (case #1) may either be a primary source, i.e. some (typically non-linguistic) text composed in the object language; or it may be a secondary source, i.e. a meta-linguistic text (typically, a linguistic publication) which adduces that piece of the object language. This may make a difference for the format of the bibliographical reference; but it does not matter for the conventions of quotation.
The general rules of quotation in academic work apply. This means that the source text is reproduced literally. The general rule that forbids quotations “out of context” is to be heeded, too. Usually, only a particular linguistic feature of the piece of text is of interest. If this feature depends on the context, then this context will either be included in the quotation or it will be described. The latter option will generally be preferred if the conditioning factor is a property of a longer stretch of text. Likewise, it is permissible to omit internal parts of the quotation on condition that the omission be marked by ‘...’.
Things are different for an illustrative example, even if it is a simplified version of an original piece of text. It is not common practice to mark modifications of the original in such an example or to identify the source on which it is based. It would nevertheless be useful to add a hint such as “based on [reference to original work]”.
Literal reproduction of the original includes capitalization and punctuation. Standard orthographies for Latin alphabets have an equivalence rule for initial capitalization and final punctuation (by period, question mark and exclamation mark): A sentence has both; a syntagma below the level of the sentence has neither. The same rule applies to linguistic examples.
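The equivalence rule can be checked mechanically. The following is a minimal sketch, not an established tool: the function name `follows_equivalence_rule` is hypothetical, and the check only looks at the first and last character. It will therefore flag legitimate cases such as a sub-sentential syntagma that begins with a proper name, so its result should be read as a hint for manual review.

```python
# Sentence-final punctuation marks named in the rule above.
SENTENCE_FINAL = ".?!"


def follows_equivalence_rule(example: str) -> bool:
    """Return True if the example has both an initial capital and final
    punctuation (sentence) or neither (syntagma below sentence level)."""
    text = example.strip()
    if not text:
        return True  # nothing to check
    has_initial_capital = text[0].isupper()
    has_final_punctuation = text[-1] in SENTENCE_FINAL
    return has_initial_capital == has_final_punctuation


checks = {
    "The dog barks.": follows_equivalence_rule("The dog barks."),
    "the big dog": follows_equivalence_rule("the big dog"),
    "The big dog": follows_equivalence_rule("The big dog"),
    "the dog barks.": follows_equivalence_rule("the dog barks."),
}
```

In this sketch the first two strings pass and the last two are flagged, because they have only one of the two properties.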
In all of the above, quotation from a published source differs partly from representations based on other sources, and a fortiori from non-orthographic representations. On the other hand, if such an orthographic representation is matched by an interlinear morphological gloss, the latter will ignore punctuation marks contained in the original. This is feasible for all punctuation marks except those that must be identical in the original text line and in the gloss. These include ‘-’, ‘=’ and ‘+’. While the latter two do not normally generate problems, the hyphen does pose one. If the multilineal example is to be processed by a parser, a possible solution is to use different Unicode characters for the orthographic hyphen and for the hyphen representing a morph boundary; e.g. ‘‐’ (U+2010, HYPHEN) for the punctuation hyphen and ‘‑’ (U+2011, NON-BREAKING HYPHEN) for the morphological boundary. For the human reader, no satisfactory solution is known.
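For machine processing, the distinction between the two hyphens might be sketched as follows. The code points U+2010 and U+2011 are the ones suggested above; the function names and the input format (a word given as a list of orthographic components, each a list of morphs) are assumptions made for this illustration only.

```python
ORTHOGRAPHIC_HYPHEN = "\u2010"  # HYPHEN: punctuation hyphen of the orthography
MORPH_BOUNDARY = "\u2011"       # NON-BREAKING HYPHEN: morph boundary marker


def encode_word(parts: list[list[str]]) -> str:
    """Join morphs with the boundary marker and orthographic components
    with the punctuation hyphen, so that the two remain distinguishable."""
    return ORTHOGRAPHIC_HYPHEN.join(
        MORPH_BOUNDARY.join(morphs) for morphs in parts
    )


def split_morphs(word: str) -> list[str]:
    """Split only at morph boundaries; orthographic hyphens stay in place."""
    return word.split(MORPH_BOUNDARY)


# Hypothetical segmentation of English "well-known": the orthographic hyphen
# separates "well" and "known"; a morph boundary separates "know" and "n".
word = encode_word([["well"], ["know", "n"]])
morphs = split_morphs(word)
```

Here `morphs` has two elements, the first of which still contains the orthographic hyphen. Whether an orthographic hyphen should additionally count as a morph boundary for alignment with the gloss is a decision left to the analyst; this sketch treats it as plain punctuation.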
Transcribed written source
The original text may be in a non-alphabetic script, e.g. cuneiform or pre-Columbian Maya, or in a non-Latin alphabetic script, e.g. Arabic or Cyrillic. If the original form is reproduced, the same rules as for quoted material (§2) apply.
The original form may be irrelevant for the quoting context, or the example may be intended for a readership ignorant of the original script, or some linguistic analysis, e.g. an interlinear morphological gloss, is to be applied to it which presupposes a representation in the Latin script. In all such cases, the original text will be transcribed or transliterated (§1, #2) if an orthographic representation is wanted at all. The chief recommendation here is not to invent one's own transcription but to follow a conventional published transcription system whenever possible. Such transcription systems are publicly available for all the major non-Latin scripts.
Non-Latin scripts have different rules for capitalization and punctuation or lack these altogether. If the official transcription guidelines do not mention them, a minimum convention is to capitalize proper names, but not add any punctuation.
Transcribed oral source
The original text may have the physical form of an audio- or video-recording (§1, #3). Its use as a linguistic example presupposes its transcription. If the quoting author is not the author of the transcription, reproducing the transcription is a case of quotation of a written source (§2).
Otherwise, all the types of representing the significans of a text are, in principle, available in such a case. Some require special comment:
A faithful transcription of a recording might use the IPA. As noted initially, rules of standard orthography do not apply then. However, a phonetic representation is not an appropriate basis for a grammatical analysis. It should be coupled with a morphophonemic or, as a last resort, an orthographic representation if grammatical analysis is to be applied.
If the language has an established standard orthography, the least problematic solution from the point of view of the readership is normally to use this for transcription. This solution is usually sufficient whenever the phonetic and phonological levels of linguistic description are not at stake.
The language may lack a standard orthography, or the author may deliberately forgo it. In that case, transcription systems designed for conversation analysis or discourse transcription (see note 1) suggest themselves. These have their own rules, which typically involve deviations from standard orthography. For instance, upper case may mark emphasis, and ‘...’ may mark a hesitation pause. However, since such transcripts are usually multilineal in themselves, they are not easily combined with representations of other levels in a multilineal representation.
Again, if no standard orthography is used, the fact that the source is an oral rather than a written one may appear in some writing conventions. Two deviations from standard orthography are often found in this case:
- Punctuation is either omitted or limited to symbols designed to indicate prosodic or syntactic properties of the text in question, much as in discourse transcription.
- Capitalization is limited to proper names, but omitted at the beginning of sentences.
These devices are not distinctive if the stretch being represented is a syntagma below sentence level, thus not requiring initial capitalization and final punctuation in the first place. It may therefore be better to specify the oral character of the source in the metadata of the example.
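Recording the origin of an example in its metadata could look roughly like the following sketch. The class and field names (`SourceType`, `LinguisticExample`, `source_type`, `reference`) are hypothetical and not part of any established format; the three enumeration values simply mirror the three kinds of source distinguished at the beginning of this section.

```python
from dataclasses import dataclass
from enum import Enum


class SourceType(Enum):
    """The three origins of example text distinguished above."""
    QUOTED_PUBLISHED = "quoted from a published source"
    TRANSCRIBED_WRITTEN = "transcribed/transliterated from a written source"
    TRANSCRIBED_ORAL = "transcribed from an oral source"


@dataclass(frozen=True)
class LinguisticExample:
    text: str                 # the orthographic representation
    source_type: SourceType   # origin stated explicitly, not inferred from spelling
    reference: str = ""       # bibliographical or archival reference, if any

    def is_oral(self) -> bool:
        return self.source_type is SourceType.TRANSCRIBED_ORAL


# A syntagma below sentence level: its spelling alone cannot show oral origin.
example = LinguisticExample(
    text="the big dog",
    source_type=SourceType.TRANSCRIBED_ORAL,
)
```

With the origin stored explicitly, the oral character of the source remains recoverable even where capitalization and punctuation carry no such information.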
1 Relevant examples include Discourse Functional Transcription.