Assume a set of n numerical values xᵢ such that their mean is x̄. Each value xᵢ differs from x̄ by an amount dᵢ = xᵢ − x̄, which is that value's deviation from the mean. For example, let M = { 3.5, 4.0, 4.5 } and N = { 2.0, 4.0, 6.0 }. As is easily seen, the mean of both M and N is 4.0. However, the two distributions are very different: in M all the values are close to the mean, i.e. they are not widely dispersed, whereas in N the values lie at a greater distance from the mean. Intuitively, the values other than the mean itself lie 0.5 away from it in M, but 2.0 away in N. Thus, the spread of the distribution in N is wider.
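The comparison of M and N can be sketched in a few lines of Python. This is only an illustration: the helper name `deviations` is made up here, and each deviation is taken as value minus mean (the opposite sign convention works equally well, since only the magnitudes matter for the spread).

```python
from statistics import mean

M = [3.5, 4.0, 4.5]
N = [2.0, 4.0, 6.0]

def deviations(values):
    """Return the deviation of each value from the arithmetic mean."""
    centre = mean(values)
    return [x - centre for x in values]

# Both sets share the same mean but differ in how far the values lie from it.
for name, values in (("M", M), ("N", N)):
    print(name, "mean:", mean(values), "deviations:", deviations(values))
```

Both sets report a mean of 4.0, while the deviations are ±0.5 in M and ±2.0 in N.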
The standard deviation (German Standardabweichung) is a measure of this dispersion of the values around the mean. The standard deviation of a set is a kind of mean of all the dᵢ. In characterizing the set statistically, it indicates how homogeneous or heterogeneous the set is. Here we will continue to assume that the set is a sample, not the entire population.
Intuitively, the standard deviation is the mean of the deviations. If it were calculated in the same way as the arithmetic mean, it would be the sum of the absolute values of the deviations, divided by n. The actual calculation of the standard deviation departs from this straightforward intuition in two respects:
1. The deviations are not taken as absolute values but are squared, and the square root is taken at the end.
2. The sum is divided not by n, but by n − 1. [2]
The formula for the (sample) standard deviation s [1] is thus the following:
\[
s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}
\]
Before the last step of taking the square root, one has s², which is called the (sample) variance or mean squared deviation (German mittlere quadratische Abweichung) from the mean.
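As a minimal sketch, the definition can be written out directly in Python. The function names `sample_variance` and `sample_std` are chosen here for illustration and do not come from any library.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed for a sample variance")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))
```

Applied to the sets M = { 3.5, 4.0, 4.5 } and N = { 2.0, 4.0, 6.0 } from the beginning, `sample_std` returns 0.5 and 2.0 respectively.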
Remember that for our example set { 2.0, 4.0, 4.0, 5.0 }, n is 4 and the arithmetic mean x̄ is 3.75. For this set, the standard deviation is:
\[
\begin{aligned}
s &= \sqrt{\frac{(2.0-3.75)^2 + (4.0-3.75)^2 + (4.0-3.75)^2 + (5.0-3.75)^2}{3}} \\
  &= \sqrt{\frac{(-1.75)^2 + 0.25^2 + 0.25^2 + 1.25^2}{3}} \\
  &= \sqrt{\frac{3.0625 + 0.0625 + 0.0625 + 1.5625}{3}} \\
  &= \sqrt{\frac{4.75}{3}} \\
  &= \sqrt{1.58\overline{3}} \approx 1.2583
\end{aligned}
\]
The minimum statistical characterization of the sample { 2.0, 4.0, 4.0, 5.0 } is therefore: x̄ = 3.75, s = 1.2583.
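The hand calculation can be cross-checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n − 1 divisor for samples. The variable names below are illustrative only.

```python
from statistics import mean, stdev, variance

sample = [2.0, 4.0, 4.0, 5.0]

x_bar = mean(sample)      # arithmetic mean
s2 = variance(sample)     # sample variance (divisor n - 1)
s = stdev(sample)         # sample standard deviation

print(f"mean = {x_bar}, s^2 = {s2:.4f}, s = {s:.4f}")
# prints: mean = 3.75, s^2 = 1.5833, s = 1.2583
```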
There are simpler ways of calculating the standard deviation, based on the equivalences:
\[
\sum (x_i - \bar{x})^2 = \sum x_i^2 - n\,\bar{x}^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}
\]
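For readers who want to see why these equivalences hold, the following short derivation expands the square and uses only the definition of the mean, x̄ = (∑xᵢ) / n:

```latex
\begin{aligned}
\sum (x_i - \bar{x})^2
  &= \sum \left( x_i^2 - 2\,\bar{x}\,x_i + \bar{x}^2 \right) \\
  &= \sum x_i^2 - 2\,\bar{x} \sum x_i + n\,\bar{x}^2 \\
  &= \sum x_i^2 - 2\,n\,\bar{x}^2 + n\,\bar{x}^2
     && \text{since } \textstyle\sum x_i = n\,\bar{x} \\
  &= \sum x_i^2 - n\,\bar{x}^2 \\
  &= \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}
     && \text{since } n\,\bar{x}^2 = n \left( \frac{\sum x_i}{n} \right)^2
\end{aligned}
```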
Consequently, the first alternative version of the formula for standard deviation is:
\[
s = \sqrt{\frac{\sum x_i^2 - n\,\bar{x}^2}{n-1}}
\]
Using this formula, the calculation of the standard deviation of our sample takes the following form:
\[
\begin{aligned}
s &= \sqrt{\frac{(2.0^2 + 4.0^2 + 4.0^2 + 5.0^2) - 4 \cdot 3.75^2}{3}} \\
  &= \sqrt{\frac{(4.0 + 16.0 + 16.0 + 25.0) - 4 \cdot 14.0625}{3}} \\
  &= \sqrt{\frac{61 - 56.25}{3}} \\
  &= \sqrt{\frac{4.75}{3}}
\end{aligned}
\]
which gives the same result as before.
The second alternative version of the formula for standard deviation is based on the second equivalence above, thus:
\[
s = \sqrt{\frac{\sum x_i^2 - \left(\sum x_i\right)^2 / n}{n-1}}
\]
Using this formula, the calculation of the standard deviation of our sample takes the following form:
\[
\begin{aligned}
s &= \sqrt{\frac{(2.0^2 + 4.0^2 + 4.0^2 + 5.0^2) - (2.0 + 4.0 + 4.0 + 5.0)^2 / 4}{3}} \\
  &= \sqrt{\frac{(4.0 + 16.0 + 16.0 + 25.0) - 225 / 4}{3}} \\
  &= \sqrt{\frac{61 - 56.25}{3}}
\end{aligned}
\]
which again gives the same result as before.
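One practical attraction of this second version is that it needs only the running totals ∑xᵢ and ∑xᵢ², so the values can be processed in a single pass without computing the mean first. A minimal sketch, with a function name chosen here purely for illustration:

```python
import math

def sample_std_shortcut(values):
    """Sample standard deviation from the running sums of x and x squared."""
    n = 0
    total = 0.0      # running sum of the values
    total_sq = 0.0   # running sum of the squared values
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    if n < 2:
        raise ValueError("at least two values are needed")
    return math.sqrt((total_sq - total * total / n) / (n - 1))

print(round(sample_std_shortcut([2.0, 4.0, 4.0, 5.0]), 4))  # prints 1.2583
```

In floating-point arithmetic the subtraction of two large, nearly equal totals can lose precision when the mean is large relative to the spread, so for such data the definition-based calculation is usually the safer choice.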
This calculation of the standard deviation canonically accompanies the arithmetic mean. It is used whenever a distribution is approximately normal. For distributions with extreme values and for skewed distributions, the mean is already calculated differently (as the geometric or harmonic mean), and the deviation is then also calculated in different ways.
[1] Alternatively, the symbol σ is used; but this should probably be reserved for the standard deviation of the population, which is estimated from the sample.
[2] n − 1 is the number of degrees of freedom.