A computer file is a unified set of data in some technical (computer-readable) format that has a single name by which the operating system accesses it. Its smallest elements are bytes (which, depending on various factors, may be grouped into double or quadruple bytes).
Data files and program files
From the user's point of view, the most important distinction is between
- data files: files that contain data that the user can handle (write and read),
- program files: files that are run by the operating system (‘executable files’) or used by application software and that are not meant to be handled by the user. (From the point of view of the programmer, files read and written by the application software are also data files, although the end user may never see them.)
Naturally, a programmer can handle the second file type, too. In what follows, we concentrate on data files.
A data file typically consists of a header and a data section.
- The header identifies the file type for the operating system and for other software that manipulates it. Thus, the software knows what the bytes in the data section represent: characters, numbers, pixels etc. The header is not meant to be displayed to the user.
- The data section contains the data (e.g., running text or numbers of a table) in some format.
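How software might use a header to recognize a file type can be sketched as follows. This is a minimal illustration, not the method of any particular operating system: the table `SIGNATURES` and the function `guess_type` are hypothetical names, and only a handful of widely documented leading byte sequences are listed.

```python
# Hypothetical lookup table: a few well-known byte sequences that open a file.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"%PDF-": "PDF document",
    b"MZ": "DOS/Windows executable",
}


def guess_type(data: bytes) -> str:
    """Return the label of the first signature that the data starts with."""
    for signature, label in SIGNATURES.items():
        if data.startswith(signature):
            return label
    # No known signature: the header (if any) is not in our small table.
    return "unknown"


print(guess_type(b"\xff\xd8\xff\xe0" + b"\x00" * 16))
print(guess_type(b"just some running text"))
```

A real program would read only the first few bytes of the file (for example with `open(path, "rb").read(16)`) and pass them to such a function.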
File types differ in
- the size and kinds of bytes they allow: single vs. double vs. quadruple bytes, ASCII vs. non-ASCII
- the structure of their header
- the structure of their data section.
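The difference between single, double and quadruple bytes can be made concrete with a small sketch. It assumes, for illustration only, that the three sizes correspond to the ASCII, UTF-16 and UTF-32 encodings; the function name `encoded_length` is hypothetical.

```python
def encoded_length(character: str, encoding: str) -> int:
    """Return how many bytes one character occupies in the given encoding."""
    return len(character.encode(encoding))


# The same letter stored as a single, a double and a quadruple byte.
for encoding in ("ascii", "utf-16-le", "utf-32-le"):
    print(encoding, encoded_length("A", encoding))
```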
The data section itself may be subdivided, for instance into one part that contains data to be displayed for the user and one part that formats those data (e.g. in an MS Word 2000 document). Or structural and user data may be interspersed (e.g. in an HTML file). At least the most elementary features of the structure of the data section are coded in the header.
Whenever any file whatsoever is displayed directly on the screen (i.e. bypassing the pertinent application software), its bytes are displayed according to the ANSI code. However, many of the files stored on a computer are never meant to be displayed on the screen. For instance, a user who displays an executable file (file extension: .exe) on the screen will quickly see that it is not meant to be displayed.
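What such a direct display amounts to can be sketched in a few lines. The sketch assumes that "ANSI code" refers to the Windows-1252 code page (Python codec `cp1252`); the function name `show_raw` is hypothetical.

```python
def show_raw(data: bytes) -> str:
    """Render each byte as one Windows-1252 character, as a raw viewer might."""
    # Bytes that the code page leaves undefined become U+FFFD instead of raising.
    text = data.decode("cp1252", errors="replace")
    # Control characters are shown as '.' so that the result stays printable.
    return "".join(ch if ch.isprintable() else "." for ch in text)


# Text bytes come out readable; the start of an executable does not.
print(show_raw("Grüße".encode("cp1252")))
print(show_raw(b"MZ\x90\x00\x03\x00\x00\x00"))
```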
There is one file type that is universally readable, for computers and human beings alike: the ASCII (text) file. Any application software that is at all meant to handle text – in particular, text-processors and database management systems – can import, display, modify and export an ASCII file.
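Whether a given file qualifies as an ASCII file can be checked mechanically, since ASCII uses only byte values below 128. The following is a minimal sketch; the function name `is_ascii_content` is hypothetical.

```python
def is_ascii_content(data: bytes) -> bool:
    """Return True if every byte lies in the 7-bit ASCII range (0 to 127)."""
    return all(byte < 0x80 for byte in data)


print(is_ascii_content(b"[settings]\r\nlanguage=en\r\n"))
print(is_ascii_content("café".encode("utf-8")))
```

In practice the bytes would come from the file itself, for example `open(path, "rb").read()`.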
The uses of ASCII files in the computer world are manifold, and some of them are distinguished by their file extension. For instance, in a Windows system (and largely in a Linux system, too), files with the following extensions are regular ASCII files: txt, rtf, html, bat, log, ini.[1]
Because of their unproblematic nature, ASCII files enjoy the preference of many programmers when universal exchangeability and longevity of files matter. Because of their limitations, they are not as economical as files of a format designed for a specific purpose; but they can be accessed by the user even when the software that produced them is no longer available.
The distinction between data files and program files cuts across the distinction between ASCII files and non-ASCII files. The following table contains some examples of each category:
| |data files|program files|
|---|---|---|
|ASCII|TXT, RTF, HTML|BAT, INI|
|non-ASCII|MS Word DOC files, JPEG|executable/binary files (EXE, COM)|
Although ASCII files may be handled by text processors like MS Word that work with non-ASCII files, care must be taken (esp. with program files) because the text processor may write them back in a non-ASCII format.
The files of a relational database – for instance the mdb files produced by MS Access – are not ASCII files. They contain user data interspersed with codes that determine the field structure, the linking of tables and the like. Such a file can therefore be handled only by specialized software.
The files of free field-structure databases may be ASCII files (as Shoebox/Toolbox files were before the advent of Unicode). The data file then contains the records of a database, one after another, and no special codes determining any particular database structure. Such a file can be opened and edited with any text processor. If such files are displayed in a user-friendly way, it is because the pertinent application software (e.g. Shoebox/Toolbox) has been programmed to interpret and display the file in that way.
If a database file has ASCII format, the record and field structure itself must be coded in ASCII, too. Free field-structure programs do this by stipulating a character sequence that separates successive records in the file (e.g. two consecutive pairs of carriage return plus linefeed) and by writing the field names, each marked as such by a designated character such as ‘\’, into the records, for instance in front of the field data.
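The convention just described can be sketched as a small parser. This is an illustration under stated assumptions, not the actual Shoebox/Toolbox implementation: records are assumed to be separated by two carriage return plus linefeed pairs, each field is assumed to occupy one line of the form `\name value`, and the field names `lx`, `ps` and `ge` in the sample are hypothetical.

```python
RECORD_SEPARATOR = "\r\n\r\n"  # two times carriage return plus linefeed
FIELD_MARKER = "\\"            # designated character that introduces a field name


def parse_records(text: str) -> list:
    """Split the text into records and each record into a name-to-value dict."""
    records = []
    for block in text.split(RECORD_SEPARATOR):
        record = {}
        for line in block.split("\r\n"):
            if not line.startswith(FIELD_MARKER):
                continue  # skip empty lines and lines without a field marker
            # The field name runs up to the first space; the rest is the data.
            name, _, value = line[len(FIELD_MARKER):].partition(" ")
            record[name] = value
        if record:
            records.append(record)
    return records


sample = (
    "\\lx casa\r\n\\ps n\r\n\\ge house\r\n"
    "\r\n"
    "\\lx comer\r\n\\ps v\r\n\\ge eat\r\n"
)
for record in parse_records(sample):
    print(record)
```

Because each record becomes a plain dictionary, a field name that occurs twice in one record keeps only its last value here; a fuller sketch would collect repeated fields in a list.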
[1] The file extension only determines which application software is meant to handle the file. On the one hand, a text processor treats all ASCII files in the same way, no matter what their file extension is. On the other hand, application software may be programmed to display an ASCII file in a particular way.