Mark Twain (and possibly Benjamin Disraeli before him) famously stated, “There are three kinds of lies: lies, damned lies, and statistics.”

 

I just finished a nine-week MOOC (massive open online course) in Medical Statistics offered by Stanford Medical School. The course was free, and I highly recommend similar free online classes to all medical professionals. I must confess, though, I did not do the homework or take the final exam.

 

Finishing the course reminded me that statistics can be manipulated by the press, skilled lawyers, and others – often without malicious intent – to achieve a particular purpose. If an expert is delivering drivel against you on the witness stand, waxing philosophic about statistics, then the more you know about statistics, the better you’ll be able to defend against their misuse. While we could spend all day on the topic, let’s stick to basics.

Rule #1: Correlation is not causation. This is one of my favorites. Consider the correlation between a country’s per capita chocolate consumption and its number of Nobel Laureates per 10 million population [1]. Look below:

[Figure: per capita chocolate consumption vs. Nobel Laureates per 10 million population, by country]

Observe that Switzerland has the lion’s share of population-adjusted chocolate eaters and Nobel Laureates. Imagine getting the best of all worlds. Plus, the association is statistically significant to boot – P<0.0001 – meaning there is less than a 1 in 10,000 probability that the association is due to chance alone.

 

But not every significant correlation implies causation. If A is associated with B: A may cause B, B may cause A, some third factor may cause both, or there may be no causal relationship whatsoever. It’s also been noted that hemlines rise during a rising stock market and fall when the market turns dismal. Probably not a cause-effect relationship either, eh?

 

Rule #2: Relative risk is not the same as absolute risk. Say a scholarly paper asserts the relative risk of contracting some ailment if exposed to some agent is three times what it would be for those not exposed. Three times. That’s a lot.

 

But what if the baseline incidence of the ailment is only 1/10,000 people? According to the paper, your absolute risk after exposure to the agent is 3/10,000 – three times the 1/10,000 baseline. The agent’s not so scary anymore.
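The conversion is simple enough to sketch in a few lines of Python, using the hypothetical numbers above:

```python
# Convert a relative risk into an absolute risk, given a baseline incidence.
# Numbers are the hypothetical example above: a rare ailment, a tripled risk.
baseline_incidence = 1 / 10_000   # 1 case per 10,000 unexposed people
relative_risk = 3.0               # exposure triples the risk

absolute_risk_exposed = baseline_incidence * relative_risk
excess_risk = absolute_risk_exposed - baseline_incidence

print(f"Absolute risk if exposed: {absolute_risk_exposed:.4f}")  # 0.0003, i.e. 3 in 10,000
print(f"Excess risk from exposure: {excess_risk:.4f}")           # 0.0002, i.e. 2 extra cases per 10,000
```

A tripled relative risk on a tiny baseline yields only two extra cases per 10,000 people exposed.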

 

In 2002, the Women’s Health Initiative trial was abruptly halted. The study was analyzing whether specific hormones (oral estrogen and progestin) caused more benefit or harm for post-menopausal women. In the large double-blind, randomized study, the researchers concluded that hormones significantly increased the risk of breast cancer and heart disease compared with placebo. This finding was surprising – because many had assumed that heart disease risk would actually decrease.

 

The increased annual relative risk of breast cancer for those taking hormones was 27%.

 

But, the absolute risk tells a different tale. For example, in the hormone group, researchers followed 8,506 women for an average of ~5 years and a total of 44,061 women-years; 166 of them developed invasive breast cancer, which yielded an incidence rate of 166/44,061 or 38 cases per 10,000 women-years. In the placebo group, there were 124 cases of breast cancer in 41,320 women-years, which yielded an incidence rate of 30 per 10,000 women-years. Thus, in absolute terms, the increase in risk for the hormone group was 8 new cases per 10,000 women-years.
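The arithmetic above can be reproduced in a few lines of Python, using the case counts and women-years quoted above:

```python
# Recompute the WHI breast-cancer figures: incidence rates per 10,000
# women-years, the relative risk, and the absolute risk difference.
hormone_cases, hormone_women_years = 166, 44_061
placebo_cases, placebo_women_years = 124, 41_320

rate_hormone = hormone_cases / hormone_women_years * 10_000   # ~38 per 10,000 women-years
rate_placebo = placebo_cases / placebo_women_years * 10_000   # ~30 per 10,000 women-years

relative_risk = rate_hormone / rate_placebo                   # ~1.26, roughly the reported increase
absolute_difference = rate_hormone - rate_placebo             # ~8 extra cases per 10,000 women-years

print(f"Hormone group:  {rate_hormone:.1f} per 10,000 women-years")
print(f"Placebo group:  {rate_placebo:.1f} per 10,000 women-years")
print(f"Absolute difference: {absolute_difference:.1f} per 10,000 women-years")
```

The same data support both headlines: a roughly 26–27% relative increase, and about 8 extra cases per 10,000 women-years in absolute terms.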

 

The same types of numbers held true for heart disease. A significant increase in relative risk does not equate to a giant jump in absolute risk. Only the absolute risk reveals the full picture; relative risk alone conveys no information about how common the underlying disease is.

 

Rule #3: How prevalent a disease is matters a lot. The predictive value of a test may be counterintuitive. Our brains might tell us that a positive HIV test or a triggered burglar alarm is likely to be right, but in reality, a positive result is often wrong – and how often depends on the baseline incidence of the condition the test is measuring.

 

We use sensitivity and specificity to guide us.

 

Sensitivity tells us how good the test is at finding something IF it’s there. If a test is 99% sensitive, of 100 people who actually have the disease, the test should find 99 of them.

 

Specificity measures how good the test is at ruling out disease in people who don’t have it. If a test is 99% specific, of 100 people who are actually disease-free, the test should correctly clear 99 of them; the remaining one receives a false positive.

 

The base frequency of the tested condition determines how to interpret the results. For rare conditions, 99% sensitivity and 99% specificity do not mean that a positive test is 99% predictive. Run the same test on a low-incidence population and a high-incidence population, and the meaning of a positive result changes dramatically. That’s Bayes’ theorem at work.

 

A great example from Wikipedia:

 

Imagine running an HIV test on population A, which has a baseline of 2% infected. The hypothetical test has a false positive rate of 0.0004 (0.04%) and no false negatives. Of 1,000,000 people tested in that population:

 

Unhealthy and test indicates disease (true positive): 1,000,000 x 2% = 20,000 people receive a true positive.

Healthy and test indicates disease (false positive): 1,000,000 x 98% x 0.0004 = 392 people receive a false positive.

The remaining 979,608 tests are correctly labeled negative.

 

In population A, a person with a positive HIV test can be 98% confident (20,000/20,392) that the result correctly indicates infection.

 

Now consider the same test in a low-incidence population B, where only 1 person in 10,000 is infected (0.01%). Of 1,000,000 people tested in that population:

 

Unhealthy and test indicates disease (true positive): 1,000,000 x 0.01% = 100 people receive a true positive.

Healthy and test indicates disease (false positive): 1,000,000 x 99.99% x 0.0004 ≈ 400 people receive a false positive.

The remaining 999,500 tests are correctly labeled negative.

 

In population B, a person with a positive HIV test can be only 20% confident (100/500) that the result correctly indicates infection.
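The two calculations above generalize via Bayes’ theorem. Here is a minimal Python sketch, assuming the example’s perfect sensitivity and 0.04% false positive rate (both hypothetical):

```python
# Positive predictive value (PPV) via Bayes' theorem: the probability that
# a positive test is a true positive, given the disease prevalence.
def ppv(prevalence, sensitivity=1.0, false_positive_rate=0.0004):
    true_pos = prevalence * sensitivity                 # P(positive and infected)
    false_pos = (1 - prevalence) * false_positive_rate  # P(positive and healthy)
    return true_pos / (true_pos + false_pos)

print(f"Population A (2% infected):    PPV = {ppv(0.02):.1%}")    # ~98.1%
print(f"Population B (0.01% infected): PPV = {ppv(0.0001):.1%}")  # ~20.0%
```

Only the prevalence changes between the two calls, yet the predictive value of the identical test swings from 98% to 20%.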

 

So, the underlying frequency of the condition being tested determines how likely a positive test is to be a true positive. Back to burglar alarms: a blaring alarm in a high-crime neighborhood is more likely to signal a real break-in than one in a rural area with little reported crime.

 

Summary: Statistics is not easy. But, the more one knows, the better one can interpret the significance of journal articles, media reports, and expert witness testimony.

 

[1] Messerli FH. Chocolate Consumption, Cognitive Function, and Nobel Laureates. N Engl J Med 2012;367:1562-1564. October 18, 2012. DOI: 10.1056/NEJMon1211064