"there are three kinds of lies: lies, damned lies,
and statistics" - Disraeli
Many people working in CSCW have had some formal statistics training
or have used statistics in practice. They may perform experiments or
gather data themselves, read or review others' work or teach others
about CSCW. This tutorial aims to complement their experience, concentrating
instead on the understanding of randomness and common statistical concepts.
It's easy to forget just how random the world is. We know that
things won't come out exactly at their average value, but think they won't
be far out. Even with a long-standing knowledge of random phenomena, I
still get surprised sometimes at how far from uniform things are. In this
first part of the tutorial, we'll try some experiments to see random phenomena
at work - raindrops falling on the plains of Gheisra
and coin tossing races.
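A rough sketch of the raindrop idea, for readers who want to try it at home. The grid size, the number of drops and the function name `raindrop_counts` are illustrative assumptions, not the tutorial's own exercise: we let 100 drops fall at random on 100 equal cells and count how many cells stay dry.

```python
import random

def raindrop_counts(drops=100, cells=100, seed=1):
    """Drop `drops` raindrops uniformly at random onto `cells` equal cells."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    counts = [0] * cells
    for _ in range(drops):
        counts[rng.randrange(cells)] += 1
    return counts

counts = raindrop_counts()
print("dry cells:   ", counts.count(0))
print("wettest cell:", max(counts))
```

With one drop per cell on average, intuition suggests most cells get wet. In fact the chance that a given cell stays dry is (1 - 1/100)^100, a little over a third, so a perfectly even spread would be the real surprise.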
We are very good at finding patterns in data ... so good, we even
see patterns when none are there. Often experimental results are misinterpreted
because randomly occurring patterns are regarded as indications of real
effects.
Over-uniform results have usually not occurred by chance, but instead
because of some systematic effect or human intervention. Statisticians
have re-analysed Mendel's results which established genetic inheritance
and also the experiment which established the fixed electron charge.
In both cases the results were too good to be true. A systematic process
had been at work - the experimenters had discarded those results which
disagreed with their hypothesis. In fact, the results they discarded
would have been simply the results of randomness making some experiments
run counter to the general trend. This is quite normal and to be expected.
So, don't try to fiddle your results - you will be found out!
pdf slides (128K)
finding things out
Randomness causes us two problems. First, as we discussed in
the last section, we may see patterns that aren't there. But also, if
there are patterns in the world, we may fail to see them. The job of statistics
is to help us see through this randomness to the patterns that are really
in the world.
The primary way this is done in statistics is to use large numbers
of things (or experimental trials) and different forms of averaging.
As one deals with more and more items the randomness of each one tends
to cancel out with the randomness of others, thus reducing the variability
of the average.
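The cancelling-out can be seen in a small simulation. This is only a sketch under assumed settings (a fair six-sided die, 2000 repeated trials, and a helper name `spread_of_means` chosen for illustration): it measures how much the average of n rolls varies from one trial to the next.

```python
import random
import statistics

def spread_of_means(n, trials=2000, seed=2):
    """Standard deviation, across many trials, of the average of n die rolls."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.randint(1, 6) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)

for n in (1, 10, 100):
    print(f"n = {n:3d}: spread of the average = {spread_of_means(n):.3f}")
```

The spread falls steadily as n grows, although a single roll is just as random as it ever was.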
The advantages of averaging are based on the cancelling out of randomness,
but this only works if each thing is independent of the others. This
condition of independence is central to much of statistics and we'll
see what happens when it is violated in different ways.
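One way independence can fail is when every item in a sample shares a common disturbance (the same room, the same day, the same experimenter). The sketch below assumes a simple made-up model, a shared Gaussian offset plus individual Gaussian noise, and the name `spread_with_shared_noise` is hypothetical.

```python
import random
import statistics

def spread_with_shared_noise(n, shared_sd, trials=2000, seed=3):
    """Spread of the average of n values that all share one common disturbance."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        shared = rng.gauss(0, shared_sd)      # hits every item in this sample alike
        sample = [shared + rng.gauss(0, 1) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.pstdev(means)

independent = spread_with_shared_noise(100, shared_sd=0.0)
dependent = spread_with_shared_noise(100, shared_sd=1.0)
print(f"independent items: {independent:.3f}")
print(f"shared disturbance: {dependent:.3f}")
```

The individual noise averages away in both cases, but the shared part does not: however many items are averaged, the spread never drops below that of the common disturbance.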
Averaging on its own is not sufficient. We have to know what is the
right kind of averaging for a particular problem. Then, having found
patterns in the data, we need to be sure whether these are patterns
in the real world, or simply the results of random occurrence. We'll
look at these issues in more detail in the rest of the tutorial.
pdf slides (129K)
measures of average and variation
Even for straightforward data there are several different common
forms of averaging used. Indeed, when you read the word 'average' in a
newspaper (e.g. average income) this is as likely to refer to the median
of the data as the mean. Actually, the reason the arithmetic mean is so
heavily used is due as much to its theoretical and practical tractability
as its felicity!
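A tiny example shows how far apart the two can be. The incomes below are made up for illustration (in thousands), with one very high earner.

```python
import statistics

# Invented incomes, in thousands: nine ordinary earners and one very rich one.
incomes = [18, 21, 23, 25, 27, 30, 34, 40, 52, 230]

mean_income = statistics.mean(incomes)      # pulled upwards by the one large value
median_income = statistics.median(incomes)  # the middle of the sorted list

print("mean:  ", mean_income)
print("median:", median_income)
```

Here the mean is 50 but the median is 28.5, and eight of the ten people earn less than the 'average' if that word is taken to be the mean.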
Statistics does a strange sort of backwards reasoning: we look at
data derived from the real world, then try to extract patterns from
the data in order to work out what the real world is like. In the case
of means, we hope that the mean of our collected data is sufficiently
close to the 'real' mean to be a useful estimate. We know that bigger
samples tend to give better estimates, but in what sense 'better'?
In statistics, better usually means 'less variation'. Again there is
no single best measure of 'variation', but the most common solutions are
the inter-quartile range, the variance and the standard deviation (σ).
Of these the first is useful, but not very tractable, the second is
very tractable (you can basically add up variances), but is hard to
interpret, and the last is both reasonably tractable and reasonably comprehensible
- that's why it is so widely used.
We'll have a look at the square root rule for how averages get 'better'
as estimates and also at the problem of how to estimate variation -
a different sort of averaging.
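For reference, the standard textbook results behind these remarks can be written as follows, assuming n independent observations X_1, ..., X_n that each have the same variance σ² (the notation is ours, not taken from the slides):

```latex
% Variances of independent observations add up:
\operatorname{Var}(X_1 + \dots + X_n) = n\,\sigma^2

% so the average has variance and standard deviation
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n},
\qquad
\operatorname{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}}

% The usual estimate of the variance from data divides by n - 1, not n:
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2
```

The square root is what makes precision expensive: to halve the standard deviation of an average you need four times as many observations.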
pdf slides (115K)
Many reported experiments in HCI and CSCW journals (and other
disciplines) end up with a statistical significance test at 5%: if it
passes the result is 'proved', if it fails - well ... we'll come back
to that.
Proof in statistics is about induction: reasoning from effects back
to causes. In logic this is the source of many fallacies, but is essential
in real life. The best one can say in a statistical proof is that what
we see is unlikely to have happened by chance. Although you can never
be entirely certain of anything, you can at least know how likely you
are to be wrong. A 5% significance level means that, when there is no real
effect, you will still see a 'significant' result one time in twenty - good enough?
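The 'one in twenty' can be checked by simulation. The sketch below makes several simplifying assumptions of our own: both groups are drawn from the same normal distribution with a known standard deviation of 1, so a simple two-sided z-test applies, and the name `false_alarm_rate` is hypothetical.

```python
import random
import statistics

def false_alarm_rate(experiments=4000, n=30, seed=6):
    """Fraction of no-effect experiments that still pass a two-sided 5% z-test."""
    rng = random.Random(seed)
    critical = statistics.NormalDist().inv_cdf(0.975)   # about 1.96
    alarms = 0
    for _ in range(experiments):
        a = [rng.gauss(0, 1) for _ in range(n)]   # both groups come from the
        b = [rng.gauss(0, 1) for _ in range(n)]   # same population: no real effect
        # Standard error of the difference of two means, each with variance 1/n.
        z = (statistics.fmean(a) - statistics.fmean(b)) / (2 / n) ** 0.5
        if abs(z) > critical:
            alarms += 1
    return alarms / experiments

rate = false_alarm_rate()
print(f"'significant' results with no real effect: {rate:.1%}")
```

The rate comes out close to 5%, which is worth remembering when reading a journal issue containing twenty or more significance tests.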
Significance tests only tell you whether things are different. They
don't tell you whether the difference is important or not. Some experiments
may reveal a very slight difference; others may have such high variability
that even a huge difference would not be statistically significant.
Understanding the relationship between variability, real underlying
differences and statistical significance is crucial to both understanding
and designing experiments.
Disturbingly often, academic papers in HCI and CSCW use a lack of significance
to imply that there is no underlying effect. In fact, you can never
(statistically) prove that things are identical. Statistical insignificance
does NOT prove equality!! The proper way to deal with equality is a
confidence interval which puts bounds on how different things are.
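As a minimal sketch of the idea, the code below computes an approximate 95% confidence interval for a difference in means. The task times are invented, the function name `diff_confidence_interval` is hypothetical, and the interval uses a normal approximation; for samples this small a t-based interval would be somewhat wider.

```python
import statistics

def diff_confidence_interval(a, b, level=0.95):
    """Approximate confidence interval for mean(a) - mean(b), normal approximation."""
    diff = statistics.fmean(a) - statistics.fmean(b)
    # Standard error of the difference, from the two sample variances.
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se

# Invented task completion times in seconds for two designs.
old_design = [41, 38, 45, 50, 36, 44, 47, 39, 42, 48]
new_design = [40, 43, 37, 46, 49, 35, 44, 41, 45, 38]

low, high = diff_confidence_interval(old_design, new_design)
print(f"difference in means lies between {low:.1f} and {high:.1f} seconds")
```

The interval runs from roughly -2.7 to 5.1 seconds. It includes zero, so the difference is not significant, but it also includes a gap of five seconds: the data do not show that the designs are the same, only that any difference is probably no bigger than that.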
pdf slides (143K)
design and test
There are two enemies to statistical proof. First is variability
- the results may be lost in the randomness. The second is aliasing -
the results you measure (and check to be statistically significant) are
actually due to some other cause.
The first problem, variability, is to some extent intrinsic and in
the final analysis can only be dealt with by increasing the number and
size of experiments as we have discussed in previous parts. However,
some of the variability may be due to factors not intrinsic to the thing
being measured: in HCI and CSCW experiments principally differences
between people. If such factors are randomly allocated, they may not
affect the overall result, but will certainly increase the effective
variability.
The second problem, aliasing, is even worse. These additional factors
may give rise to spurious results if, for example, all the most expert
users try out one design of software.
Careful experimental design can fix, randomise or cancel out these
additional factors, reducing the likelihood of aliasing and making
it more likely that real differences will show up as statistically significant.
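One simple design of this kind balances expertise across the conditions and randomises everything else. The sketch below is an illustration only: the participant list, the two design labels and the function name `allocate` are all assumptions.

```python
import random

def allocate(participants, seed=8):
    """Split (name, expertise) pairs between designs A and B.

    Expertise is balanced: each expertise level is dealt out alternately.
    Who ends up where within a level is decided by a random shuffle.
    """
    rng = random.Random(seed)
    by_level = {}
    for name, level in participants:
        by_level.setdefault(level, []).append(name)

    groups = {"A": [], "B": []}
    for level, names in by_level.items():
        rng.shuffle(names)                      # random order within the level
        for i, name in enumerate(names):
            groups["AB"[i % 2]].append((name, level))
    return groups

# Twelve made-up participants: six experts and six novices.
participants = [(f"P{i}", "expert" if i < 6 else "novice") for i in range(12)]
groups = allocate(participants)
for design, members in groups.items():
    print(design, members)
```

Each design is tried by three experts and three novices, so expertise cannot masquerade as a difference between the designs, and the shuffle guards against factors nobody thought to record.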
Finally, having run an experiment, if you then use the wrong statistical
test, then at best real differences may be missed or at worst apparently
significant results may in fact be spurious.
pdf slides (57K)
experiments in HCI and CSCW
Experiments in HCI and CSCW involve that most variable of all
phenomena, people. The great danger in any experiment in HCI and CSCW
is that the results are analysed at the end and no statistical conclusion
can be drawn. We'll discuss how to avoid this disaster situation. This
influences the choice of what to measure as well as the way experiments
are constructed and analysed. Furthermore, it is often impossible to use
sufficient subjects to obtain statistical results. We'll discuss how to
combine different kinds of experimental data - quantitative, qualitative
and anecdotal - in order to make sure that even small experiments have
a useful (and publishable!) output.
pdf slides (27K)