Tuesday, May 29, 2012
More on evidence
Continuing my latest adventure in wonkery, the quick review of the evidentiary basis of causal inference . . .
The plural of anecdote is "anecdotes," not "data," it is true -- however, anecdotes are data.
But the term "anecdote" connotes a story that is only casually observed, and perhaps retold. Both single case studies and so-called within-subjects designs can provide useful information, and can even support causal inference under very specific circumstances, although the evidence is unlikely to be conclusive; but we must very carefully observe and document what we do. That's why these aren't considered "anecdotes." The reason we often see highly dismissive responses to such studies is that they are often overinterpreted, their limitations insufficiently acknowledged. Indeed, they are quite commonly used by charlatans such as homeopaths to promote quackery. But let's hang on to the baby as we pour out the bathwater.
To review, if an outcome is extremely improbable, and we try a novel intervention even once and then observe the extremely improbable outcome, we can reasonably have a strong suspicion that the intervention was indeed related to the outcome. If we have an a priori plausible explanation for how the intervention produced the outcome, so much the better. So, if surviving a fall from thousands of feet is highly improbable, and a person using a parachute survives such a fall unscathed, we don't need to see it work more than once to believe that there's something to this parachute thing.
On the other hand, a single trial would not convince us that parachutes work 100% of the time, or even most of the time. We'd need a lot of experience before we were confident we knew how effective parachutes are, under what circumstances, and what can go wrong. What we would not need, however, is a randomized controlled trial, because we are already highly confident that falling 5,000 feet without a parachute is almost inevitably fatal.
And that's the principle on which a within-subjects design can be useful. If we're already confident that a particular outcome in a defined population is improbable given the existing or natural state of affairs, then a before-and-after test of an intervention can give us meaningful information about whether it is likely to be effective. If remission of metastatic cancer is extremely rare, and we give 5 people a novel treatment and 3 of them remit, we don't need a formal control group to believe we're on to something.
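The arithmetic behind that intuition can be sketched with hypothetical numbers. Suppose, purely for illustration, that spontaneous remission occurs in about 1% of patients; then the binomial probability of seeing 3 or more remissions among 5 patients by chance alone is vanishingly small:

```python
import math

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more
    'successes' in n independent trials with per-trial probability p."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Hypothetical background rate: ~1% spontaneous remission.
p_chance = prob_at_least(3, 5, 0.01)
print(f"P(>=3 remissions out of 5 by chance): {p_chance:.2e}")
```

With those assumed numbers the chance is on the order of one in a hundred thousand, which is why no formal control group is needed to suspect the treatment is doing something. The whole inference, of course, hinges on the assumed background rate being right.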
This sort of inference depends on the assumption that people are biologically pretty similar. After all, a chemistry experiment doesn't ordinarily need a control group at all, because every atom of a given isotope of carbon, in the same state of ionization, is identical. The reason we have much more difficulty making causal inferences in health research is that people are so complex and so variable; that measurement of outcomes is often not straightforward; and that our interventions, unlike mixing chemicals in a beaker, typically have multiple components and multiple effects.
And so, the example with metastatic cancer is quite unusual. If an outcome, unlike remission of metastatic cancer, is not extremely rare, is not completely straightforward to observe, or may respond to multiple components of the intervention such as placebo effect, a within-subjects trial is much more problematic. Sure, if the trial produces an outcome rate markedly different from what we would have expected, that can be at the very least suggestive, but there are many pitfalls.
Here are a few:
Selection bias: People are convinced that Alcoholics Anonymous works because alcoholics who regularly attend AA meetings have a higher rate of sustained abstinence than alcoholics who do not. But maybe people who are motivated to remain abstinent are more likely to attend meetings. In fact there is no good evidence to show that AA is effective at all, for basically this reason. Should we really expect the desired outcome to occur at the ordinary background rate in the population selected for the trial, or is it just selection, rather than the intervention, that produces the observed effect?
History: Before and after designs are often used for interventions that target social problems, perhaps at a community level. But the trouble here is that a whole lot else is going on at the same time, in addition to the intervention. While you're doing outreach education to reduce the risk of STIs in teenagers, a whole lot else may be changing: sexual mores, condom availability, the likelihood of exposure due to other factors such as enhanced availability of treatment, you name it.
Non-specific effects of the intervention: We may attribute the observed outcome to the magic potion we had you ingest, but maybe it was the effect on your expectations, the fact that it made you mildly nauseous so you skipped your usual seven vodka-and-ginger ales, or just the fact that somebody paid attention to you, that made the difference. Notice that this category includes, but is not strictly limited to, what we call placebo effects.
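The selection bias pitfall above is easy to demonstrate with a toy simulation. In this made-up model, "motivation" drives both meeting attendance and abstinence, while the meetings themselves have zero causal effect; the attendance groups nonetheless show very different abstinence rates (all of the probabilities here are invented for illustration):

```python
import random

random.seed(1)

N = 10_000
attenders = attenders_abstinent = 0
others = others_abstinent = 0

for _ in range(N):
    motivated = random.random() < 0.5
    # Motivation raises the chance of attending meetings...
    attends = random.random() < (0.8 if motivated else 0.2)
    # ...and independently raises the chance of abstaining.
    # Note abstinence depends ONLY on motivation, never on attendance.
    abstains = random.random() < (0.6 if motivated else 0.2)
    if attends:
        attenders += 1
        attenders_abstinent += abstains
    else:
        others += 1
        others_abstinent += abstains

print(f"abstinence among attenders:     {attenders_abstinent / attenders:.2f}")
print(f"abstinence among non-attenders: {others_abstinent / others:.2f}")
```

Under these assumed numbers the attenders abstain at roughly twice the rate of non-attenders, even though attendance does nothing, which is exactly the trap an uncontrolled comparison falls into.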
So, uncontrolled before-and-after trials can give us some useful information, but they can also often be misleading. So-called Phase One trials of drugs are of this nature. A small number of people are given an experimental drug just to see if there are any obvious, immediate ill effects; so we can figure out its "pharmacokinetics," in other words how much of the stuff gets into the blood stream or target tissues and how long it lasts; and to see if anything else dramatic and exciting happens. If the latter, we don't jump to any conclusions. We still need to go on to controlled trials before the drug can get approved.