Journal of software incremental updating algorithm
Algorithms for calculating variance play a major role in computational statistics.
A key difficulty in the design of good algorithms for this problem is that formulas for the variance may involve sums of squares, which can lead to numerical instability as well as to arithmetic overflow when dealing with large values.
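In the naive approach, which computes the variance from the running sum of the values and the running sum of their squares, the sum of squares and the squared sum divided by n can be very similar numbers, so cancellation can leave the result with much less precision than the inherent precision of the floating-point arithmetic used to perform the computation.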
Thus this algorithm should not be used in practice.
This is particularly bad if the standard deviation is small relative to the mean.
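The cancellation problem can be illustrated with a short Python sketch. The function names `naive_variance` and `two_pass_variance` and the sample data are chosen here for illustration only. The first function uses the sum and sum-of-squares formula, and the second subtracts the mean before squaring. Adding a large offset to every value leaves the true variance unchanged, yet it makes the standard deviation small relative to the mean.

```python
def naive_variance(data):
    """Sample variance from the running sum and sum of squares (unstable)."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in data:
        n += 1
        total += x
        total_sq += x * x
    # Two nearly equal large numbers are subtracted here.
    return (total_sq - total * total / n) / (n - 1)


def two_pass_variance(data):
    """Sample variance computed by first finding the mean, then summing
    squared deviations from it."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


small = [4.0, 7.0, 13.0, 16.0]          # true sample variance is 30
shifted = [x + 1e9 for x in small]      # same variance, much larger mean

print(naive_variance(small))       # 30.0
print(two_pass_variance(shifted))  # 30.0
print(naive_variance(shifted))     # far from 30 in double precision
```

In double precision the squares of the shifted values are near 1e18, where adjacent representable numbers are more than 100 apart, so the naive result for `shifted` cannot come out close to 30.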
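Techniques such as compensated summation can be used to combat this cancellation error to a degree.

It is often useful to be able to compute the variance in a single pass, inspecting each value only once; for example, when the data are being collected without enough storage to keep all the values, or when costs of memory access dominate those of computation. For such an online algorithm, a recurrence relation is required between quantities from which the required statistics can be calculated in a numerically stable fashion.

The following formulas can be used to update the mean and (estimated) variance of the sequence, for an additional element $x_n$. Here $\bar{x}_n$ denotes the mean of the first $n$ values, $s_n^2$ their sample variance, and $\sigma_n^2$ their population variance:

$$\bar{x}_n = \frac{(n-1)\,\bar{x}_{n-1} + x_n}{n} = \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n}$$

$$s_n^2 = \frac{n-2}{n-1}\,s_{n-1}^2 + \frac{(x_n - \bar{x}_{n-1})^2}{n}, \quad n > 1$$

$$\sigma_n^2 = \frac{(n-1)\,\sigma_{n-1}^2 + (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n)}{n}$$

These formulas suffer from numerical instability, as we are repeatedly subtracting a small number from a big number which scales with $n$.

A better quantity for updating is the sum of squares of differences from the current mean, $M_{2,n} = \sum_{i=1}^{n} (x_i - \bar{x}_n)^2$:

$$M_{2,n} = M_{2,n-1} + (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n)$$

$$\sigma_n^2 = \frac{M_{2,n}}{n}, \qquad s_n^2 = \frac{M_{2,n}}{n-1}$$

This algorithm is much less prone to loss of precision due to catastrophic cancellation, but might not be as efficient because of the division operation inside the loop.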
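A minimal Python sketch of the single-pass update described above, which tracks the count, the running mean, and the sum of squared differences from the current mean. This update is commonly attributed to Welford. The class name `RunningStats` and its method names are illustrative assumptions, not part of any library.

```python
class RunningStats:
    """Online mean and variance; each value is inspected exactly once."""

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # sum of squared differences from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean             # deviation from the old mean
        self.mean += delta / self.n       # the division inside the loop
        self.m2 += delta * (x - self.mean)  # old deviation times new deviation

    def population_variance(self):
        return self.m2 / self.n if self.n > 0 else float("nan")

    def sample_variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


stats = RunningStats()
for value in [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]:
    stats.update(value)

print(stats.sample_variance())      # 30.0
print(stats.population_variance())  # 22.5
```

Because only three numbers are stored, the values can be discarded as soon as they have been processed, which suits data that arrive as a stream.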
For a particularly robust two-pass algorithm for computing the variance, one can first compute and subtract an estimate of the mean, and then use this algorithm on the residuals.
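One way to read this two-pass variant, sketched in Python under the assumption that the mean estimate is simply the arithmetic mean from a first pass: the second pass applies the online sum-of-squared-differences update to the residuals. The function name `robust_two_pass_variance` is hypothetical.

```python
def robust_two_pass_variance(data):
    """Sample variance: estimate and subtract the mean, then run the
    online update on the residuals."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two values are required")

    # First pass: an estimate of the mean (assumed here: the plain average).
    mean_estimate = sum(data) / n

    # Second pass: online update applied to the residuals, which are
    # centred near zero, so the running mean stays small.
    residual_mean = 0.0
    m2 = 0.0
    for k, x in enumerate(data, start=1):
        r = x - mean_estimate
        delta = r - residual_mean
        residual_mean += delta / k
        m2 += delta * (r - residual_mean)

    return m2 / (n - 1)


print(robust_two_pass_variance([1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]))  # 30.0
```

The mean estimate does not need to be exact, because the online update on the residuals corrects for whatever offset remains.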