Summary: When network delay is reported as a
single number, a common approach is to report the arithmetic average
(mean) of observed delay singleton measurements. This practice yields
a poor metric. Median is superior to mean in every way save
simplicity of computation.
The delay (one-way or round-trip) of a single packet is a singleton
measurement of delay of a network. When network delay is reported,
singleton is usually not the value to report; instead, some number of
packets is sent, and some function of their delays is reported. Such
a set of delays is called a sample. The function yields some
empirical characteristic of the sample. If the sample is thought to
come from an underlying random distribution, the function can be a
parameter estimator for the distribution.
A common function used in this context is arithmetic average. It
has several problems:
Treatment of lost packets. Some packets that were sent might not
be received within the period of time the receiver is willing to wait
for them. These packets are called lost. It is important to realize
that the only thing observed about such lost packets is that they
didn't make it within the timeout period. For all we know, they might
arrive later. What is the delay of the lost packet? We can only say
that it is greater than the timeout. (It might be infinite or
finite.) The treatment the mean estimator must give to these packets
is to simply omit (!) them from the sample. Why does it omit the
packets? Because it's the right thing to do? Of course not---it biases
the result in a timeout-dependent way, which can't be desirable. It
omits the packets because, within the arithmetic average framework,
there's no better answer.
Wrong answer. Reporting ``mean delay'' implies an assumption of a
distribution from which delay values are drawn and reporting the mean
of such distribution. Needless to say, arithmetic average of values
in the sample below a certain value is not a consistent, or even
asymptotically consistent, estimator of the mean. This can be
insignificant for certain classes of distributions if the cutoff value
is chosen so that it's large enough. However, for other classes of
distributions, e.g., heavy-tailed distributions, any cutoff value is
bad and extremely distorting; with a heavy-tailed distribution, for
any given cutoff, the bulk of the contribution to the estimate of the
mean will come from values above the cutoff. It is unknown whether
network delay has a heavy tail, but using an estimator that is known
to give the wrong answer for some distribution, when a much better
choice is available is folly. Note that this problem stems from the
first one.
Lack of robustness. An even more general property is robustness
of the estimator. Robustness, in statistics, is a general property of
analysis: robust analysis does not change significantly when the data
does not change significantly. One property of robust estimators is
that when only a small fraction of values in a sample is changed, the
estimator does not change significantly. Arithmetic average is not a
robust estimator. For any given sample and any given value of the
estimate, changing just a single given value in the sample is enough
to make the arithmetic average equal to the desired estimate. The
problem becomes especially acute when the tail of the distribution is
relatively heavy (as it appears to often be with network delay).
A better statistic than mean is the median. It lacks any of the
problems above, without contributing any new problems, other than
relative difficulty of computation (arithmetic average computation is
trivial). It must be noted here that it is possible to compute
the median approximately in O(n) time (with a single pass over the
data) and O(1) space. The amount of space is traded against
precision of the computation and is not large at all in practical
terms. The code in thrulay (currently in
src/thrulayd.c, search for ``Quantiles'') provides a free
implementation of this algorithm with a BSD-like license (so it can be
used in closed-source commercial products, too).