Monday, December 13, 2010

The slipperiness of truth

For much of the 20th Century, as scientific discovery and technological innovation were the dominant drivers of social change and major intellectual preoccupations, the generally accepted philosophy of science was positivism. I'm not going to go into any detail on that here (but you can read some of my past discussion here and in the links therein embedded). I just want to make a simple point: the central idea of the verifiability of scientific assertions runs into trouble when you get to probabilistic statements -- which, as it turns out, probably most scientific assertions happen to be.

Scientific inquiry relies heavily on statistics, notably so in epidemiological and clinical research. But think about it. When the weatherbot on your local teevee says "There is a 50% chance of rain tomorrow," how do you decide whether that assertion is true? Whether or not it rains tomorrow is no help, obviously.

Even if you follow the weather forecasts and the actual weather for a long time, you can't quite tell at what point you start to believe that the forecasters are making accurate predictions. If it rains 52% of the time they say there's a 50% chance, they haven't been exactly right but maybe that's just bad luck and it will even out the next few times.
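To make the "maybe that's just bad luck" point concrete, here is a rough sketch in Python. It asks: if the forecasters' 50% really is right, how often would the observed rain frequency land at least as far from 50% as 52% did? The function name and the day counts (100 and 10,000 forecast days) are my own illustrative assumptions, not real forecast data, and the sketch treats each day as an independent coin flip, which real weather is not.

```python
from math import exp, lgamma, log

def prob_at_least_as_far(n_days: int, rainy_days: int, forecast_p: float = 0.5) -> float:
    """Probability, assuming the forecast is exactly right and days are
    independent, of a rain count at least as far from the expected count
    as the one observed (an exact two-sided binomial calculation)."""
    expected = n_days * forecast_p
    observed_gap = abs(rainy_days - expected)
    total = 0.0
    for k in range(n_days + 1):
        # Small tolerance so a gap equal to the observed one is included.
        if abs(k - expected) >= observed_gap - 1e-12:
            # Binomial probability computed in log space to avoid overflow.
            log_pmf = (lgamma(n_days + 1) - lgamma(k + 1) - lgamma(n_days - k + 1)
                       + k * log(forecast_p) + (n_days - k) * log(1 - forecast_p))
            total += exp(log_pmf)
    return total

# Hypothetical track records: rain on 52% of the "50% chance" days.
print(prob_at_least_as_far(100, 52))       # short record
print(prob_at_least_as_far(10_000, 5_200)) # much longer record
```

With only 100 such days, a 52% result is entirely unremarkable; with 10,000 it would be very hard to blame on luck. But nothing in the arithmetic tells you where in between you ought to change your mind.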

The squishiness of probability creates a big fat opening for bias -- unconscious or deliberate -- in the presentation of clinical trial data. For the story I am about to summarize I rely heavily on Melanie Newman in the Dec. 11 BMJ, which sadly I cannot link to as it is behind the subscription wall, but here are the first few paragraphs.

The key question is when published research should be officially "retracted," which means in essence that the journal declares the paper no longer to exist as peer-reviewed science. The debate centers on a single paper published in 2001 in the Journal of the American Academy of Child and Adolescent Psychiatry, reporting a clinical trial of the antidepressant paroxetine sponsored by the company that is today GlaxoSmithKline. The paper claimed the drug was effective for treating depression in adolescents.

The problem is that based on the 8 outcomes specified in advance, it was not more effective than placebo. These included 2 so-called "primary outcomes," which were the change in the average score on a questionnaire called the Hamilton Rating Scale, and the proportion of subjects who met a threshold for improvement after 8 weeks. So what did the investigators do?

They cooked up 19 new secondary outcomes, and found a positive response on four of them, one of which they then proclaimed, ex post facto, to be "primary." The manuscript was actually drafted by a company contractor, after an internal memo declared that it would be "commercially unacceptable to include a statement that efficacy had not been demonstrated."

Although all this is now publicly known, the journal refuses to retract the article because they claim it does not include any false results. Whether this assertion is itself true is essentially imponderable. Yes, all the numbers shown in the tables are accurate, and so are the p values based on the calculations that were actually done. But to say that the p values are accurate is to fundamentally misunderstand the concept of statistical significance.

A p value is supposed to represent the probability of seeing a difference at least as large as the one observed purely by chance, if there were no real association in the total population. But that only works if I make a single comparison (assuming I don't have any information about prior probability -- which gets us to Bayes' theorem, which is really where we always ought to begin anyway, instead of with p values. But I digress.) If I make 21 comparisons, as in this case, chances are I'll hit on some p values that fall below the conventional threshold for "significance" (an arbitrary standard to begin with) purely by chance. The values aren't really significant after all.
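To put a rough number on "chances are," here is a back-of-the-envelope sketch in Python. It assumes the usual 0.05 cutoff (my assumption; I haven't stated which threshold the paper used) and it assumes the comparisons are independent of one another, which outcome measures taken on the same patients generally are not. So take it as an illustration of the logic, not a reanalysis of the trial. The function names are made up for the example.

```python
def chance_of_false_positive(n_comparisons: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent comparisons comes up
    'significant' at level alpha when there is no real effect anywhere."""
    return 1.0 - (1.0 - alpha) ** n_comparisons

def expected_false_positives(n_comparisons: int, alpha: float = 0.05) -> float:
    """Average number of spuriously 'significant' results under the same assumptions."""
    return n_comparisons * alpha

for n in (1, 21):
    print(n, round(chance_of_false_positive(n), 3), round(expected_false_positives(n), 2))
```

Under those assumptions, one comparison gives you the advertised 5% chance of a fluke, while 21 comparisons give you about a 66% chance of at least one, and on average about one spurious "hit."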

So the numbers in the paper may be true in some sense; but the conclusions are false. Their meaning has been fundamentally misrepresented. I don't know if that means the paper should be retracted, however -- if we start going down that road we're going to have a massive bonfire.


roger said...

how does the Journal of the American Academy of Child and Adolescent Psychiatry make money? would retracting this paper reduce their income? just askin'.

Cervantes said...

No, I don't think it would affect their finances. There is almost no market for paid reprints.

It's more a matter of bruised egos -- not wanting to admit they might have published something in error. Fraudulent research is not readily detectable by reviewers, but the reviewers in fact did catch that this was not a credible analysis, and the journal published it anyway. They just don't want to fess up, I think.