How to Lie with Statistics
- May 12, 2018

How I came across this book: Remotely related to my Master's course.
Favorite Line from the book:
Proper treatment will cure a cold in seven days, but left to itself a cold will hang on for a week.
The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify.
Our company made one per cent return on sales; our company made a fifteen per cent return on investment; our company made ten-million-dollar profit; our company had an increase in profits of forty per cent (compared with 1935-39 average); our company had a decrease in profits of sixty per cent from last year. These can all be the same fact, expressed differently to suit a different purpose.
We do not need to know every detail of statistics, but knowing the tricks of the trade will prevent us from getting duped. The book is more than 60 years old, but it is even more relevant today, when data is being collected about anything and everything.
Here are some statements which may or may not be correct (disclaimer – all the examples are mine, but they are inspired by the book).
1. Participation in higher education in England is almost 100% as confirmed by a sample of young students in Chelsea, London.
2. In 2015, the average salary of a doctor in the NHS was £37,000.
3. In 2015, the basic salary of an NHS doctor ranged from £76,761 to £103,490.
4. Drownings among adolescents peaked at the same time as ice-cream consumption rose, so ice-cream consumption leads to more drowning.
5. As per Marimer Lopez, my looks are enhanced by an average of 100% (± 1%) when I grow my hair, and it is statistically significant (p<0.00001).
Hopefully, we can pick the correct statements by the end of my blog.
A sample has to be truly random to represent the population accurately and avoid bias. For polling purposes, if we only sample people who are in the office during the daytime, we may miss shift workers and people working from home.
The sample also has to be large enough for the result to mean anything. The mortality rate for a newly appointed heart surgeon was stated as 50%. It sounds awful, but the poor surgeon had a death during his second case, which inflated his mortality rate dramatically. If his mortality rate is still 50% after 10 cases, then it is something to be concerned about.
Two of the most important concepts in statistics are the normal bell curve and averages. The mean is the sum of the values divided by how many values there are. The median splits the sorted values in half, so 50% are above it and 50% are below it.
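One way to put a number on the surgeon example is a confidence interval around the observed mortality rate. The sketch below uses the Wilson score interval, which is an assumption made here for illustration and not something the book prescribes. The function name `wilson_interval` and the 50-in-100 row are hypothetical.

```python
import math

def wilson_interval(deaths, cases, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if cases <= 0:
        raise ValueError("need at least one case")
    p = deaths / cases
    denom = 1 + z**2 / cases
    centre = (p + z**2 / (2 * cases)) / denom
    half = z * math.sqrt(p * (1 - p) / cases + z**2 / (4 * cases**2)) / denom
    return centre - half, centre + half

# Same 50% observed rate, very different amounts of evidence.
for deaths, cases in [(1, 2), (5, 10), (50, 100)]:
    low, high = wilson_interval(deaths, cases)
    print(f"{deaths} deaths in {cases} cases: plausible rate {low:.0%} to {high:.0%}")
```

With two cases the interval spans roughly 9% to 91%, so the 50% figure tells us almost nothing. After ten cases it narrows to roughly 24% to 76%, which is why the same headline rate becomes a real concern.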

Many things in nature and life roughly follow a normal bell curve, where mean = median. In a left-skewed curve, the median is greater than the mean, whereas in a right-skewed curve, the mean is greater than the median. The mean should be presented along with the standard deviation, whereas the median should be presented along with the range (often the interquartile range).
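A few made-up numbers show how skew pulls the mean away from the median. The three lists below are hypothetical and exist only for illustration; `summarize` is a helper name invented for this sketch.

```python
import statistics

# Hypothetical data: identical except for one extreme value.
symmetric = [2, 3, 4, 5, 6, 7, 8]
right_skewed = [2, 3, 4, 5, 6, 7, 50]   # one very large value
left_skewed = [-40, 3, 4, 5, 6, 7, 8]   # one very small value

def summarize(values):
    """Return (mean, median) for a list of numbers."""
    return statistics.mean(values), statistics.median(values)

for name, values in [("symmetric", symmetric),
                     ("right-skewed", right_skewed),
                     ("left-skewed", left_skewed)]:
    mean, median = summarize(values)
    print(f"{name}: mean={mean}, median={median}")
```

The median stays at 5 in all three lists, while the mean moves from 5 to 11 in the right-skewed list and to -1 in the left-skewed one. Whoever picks the "average" to report can pick the more flattering of the two.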

Always be wary of the way graphs are presented. A quick look at this graph may make us think that the rhinoceros population is doing pretty well.

If you look at the next graph, it sends a shiver down your spine. So, are rhinoceroses doing well or not? Surely only one can be true! It turns out both graphs are correct. The top graph's y-axis runs up to 20,000, and it shows the variation by species. The bottom graph's y-axis runs up to one million, and it shows that the overall rhino population has decreased.
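The effect of the axis range can be made concrete by asking how much of the plot height a given change occupies. This is a minimal sketch: `visual_share` is a hypothetical helper, and the counts are invented for illustration, not real rhino numbers.

```python
def visual_share(start, end, y_min, y_max):
    """Fraction of the plot height covered by the move from start to end."""
    if y_max <= y_min:
        raise ValueError("y_max must be greater than y_min")
    return abs(end - start) / (y_max - y_min)

# Hypothetical change from 15,000 to 18,000 animals, drawn on two different axes.
small_axis = visual_share(15_000, 18_000, 0, 20_000)
large_axis = visual_share(15_000, 18_000, 0, 1_000_000)

print(f"On a 0 to 20,000 axis the change fills {small_axis:.1%} of the height")
print(f"On a 0 to 1,000,000 axis it fills {large_axis:.1%} of the height")
```

The same change fills 15% of the height on the small axis and 0.3% on the large one, so it is worth reading the axis labels before reading the line.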

Finally, correlation does not mean causation. If you repeat an experiment enough times, some of the runs will show a correlation purely by chance.
In conclusion, always ask who conducted the study, what their motive was, whether the sample was big enough, which type of average was used, and whether causation is being claimed from correlation without any explanation.
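The point about chance correlations can be checked with a small simulation. The sketch below generates pairs of completely unrelated random series and counts how often they still look correlated. The names `pearson_r` and `count_chance_hits` are invented here, and the threshold of 0.632 is assumed as the approximate conventional 5% cutoff for 10 data points.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def count_chance_hits(trials=1000, points=10, threshold=0.632, seed=0):
    """Count trials where two independent random series exceed the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(points)]
        ys = [rng.gauss(0, 1) for _ in range(points)]  # unrelated to xs
        if abs(pearson_r(xs, ys)) > threshold:
            hits += 1
    return hits

hits = count_chance_hits()
print(f"{hits} of 1000 unrelated pairs looked 'significantly' correlated")
```

Around 5% of the pairs should clear the bar even though nothing connects them, so anyone who runs enough comparisons and reports only the hits will always have something to publish.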
Coming back to our questions.
1. Incorrect.
The sample from Chelsea, London is heavily biased because it is a very affluent area. Participation in higher education among adults under 30 in England was 48% in 2015, as per www.gov.uk.
2 & 3. Correct.
Both statements 2 and 3 are correct, though they seem contradictory. If the average salary is £37,000, then how can the range be £76,761 to £103,490? Statement 2 referred to the mean salary of an NHS doctor at the registrar level in 2015. Statement 3 referred to the salary range of an NHS doctor at the consultant level in 2015. The devil is in the detail.
4. Incorrect.
The drowning rate does go up at the same time as ice-cream consumption, so the two are correlated, but one does not cause the other. The missing link is the summer season. More kids are out swimming in summer, and more people are enjoying their ice-cream at the same time.
5. Correct.
We can all agree that statement five is correct because women are always right and the laws of statistics do not apply when it comes to love.
































