Tuesday, May 21, 2013
Cross of Gold
That would be the Randomized Controlled Trial (RCT), the "gold standard" of evidence for the effectiveness of medical interventions. ("Intervention" is the general term for anything doctors do, be it pills, surgery, recommendations to exercise, shaking a rattle and chanting the name of a benevolent spirit, you name it.)
Ideally, it works like this.
You must specify several conditions ahead of time:
a) Who is eligible to be a subject of the trial. If the intervention is intended to be curative, presumably they must meet certain diagnostic criteria for actually having disease X. You might want to restrict the trial to people in a certain age range. For example, you might exclude children for such reasons as their inability to give informed consent and their biological differences from adults, or you might exclude very old people or people with significant co-morbidities because they are unlikely to respond as well and would attenuate any signal you might get. Often you exclude people who don't speak English because you only speak English. And so on.
b) Exactly what will happen to the people in each arm of the trial. This includes not only precisely what intervention, or sham intervention, they will get, but what they will be told, what kind of efforts will be made to ensure they adhere to the protocol (e.g., actually take the pills on schedule), how often they will come in to be studied, whether any effort will be made to restrict anything that might happen to them that could mess up the results (e.g., they get some other intervention outside of the study), you name it.
c) How people will be recruited and enrolled, how they will be tracked, what efforts will be made to retain them in the study.
d) The end points you are hypothesizing. For example, significantly more people in the active intervention arm will meet some criteria for not having the disease 6 months after initiating the treatment; or symptoms will be reduced by some amount according to a carefully specified measure. If you think there will be a difference in response between males and females, old folks and young, people with and without any other characteristic, you must specify in advance. You must also specify what possible adverse events you will test for or assess.
e) The number you will enroll in each arm of the study, how they will be assigned, and how both the subjects and the people involved in the investigation will be blinded as to what treatment each person is getting.
f) The "statistical power" of your study. This answers the question: if there is a real effect of a given size -- something hypothesized to be realistic -- what percentage of the time will a study "detect" the effect with a p value < .05? This is really important and I'm pretty sure most people don't get it.
So let me try to explain. Almost always, there is a certain amount of random variation in response. Some people just get better on their own. Some people are less responsive to a treatment than are others. Some people, in spite of meeting the diagnostic criteria, didn't actually have the thing in the first place. Whatever. The whole point of randomizing the subjects is that you hope these unmeasured factors will be evenly distributed between the two groups, but in case they aren't, you can use probabilistic reasoning to figure out how likely it is that an effect at least as large as the one you observed would turn up by chance alone, even if the treatment did nothing. You need that randomness to compute a p value.
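To make the role of randomization concrete, here is a minimal sketch of a permutation test in Python. It is an illustration, not how any particular trial is analyzed: the function name and the recovery counts are made up, and outcomes are coded as 1 (recovered) or 0 (not). The idea is to shuffle the group labels many times and count how often chance alone produces a difference at least as big as the one observed.

```python
import random

def permutation_p_value(treated, control, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p value for a difference in group means.

    Hypothetical helper for illustration: outcomes are numbers
    (e.g. 1 = recovered, 0 = not recovered).
    """
    rng = random.Random(seed)
    n_t = len(treated)
    n_c = len(control)
    observed = sum(treated) / n_t - sum(control) / n_c

    pooled = list(treated) + list(control)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Re-deal the outcomes to the two arms at random, as if the
        # treatment label had nothing to do with the result.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / n_c
        # Small tolerance so floating-point ties count as "at least as big".
        if abs(diff) >= abs(observed) - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles

# Made-up data: 14 of 20 recover on treatment, 8 of 20 on sham.
treated = [1] * 14 + [0] * 6
control = [1] * 8 + [0] * 12
print(permutation_p_value(treated, control))
```

The number printed is the fraction of random re-deals that look at least as impressive as the real split, which is exactly the sense in which the randomness lets you compute a p value.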
So, we set an arbitrary standard of 5%. If the observed effect would happen fewer than 1 time out of 20 even if there really is no difference between the groups -- the treatment is ineffective -- we call the effect significant. But an effect that is not statistically significant is not the same thing as no effect. A p value of .06 means the thing probably does too work, but you aren't allowed to make that claim. Why? No particular reason. That's just how we do it.
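Statistical power can be estimated the same brute-force way: pretend the treatment really works, run the imaginary trial thousands of times, and count how often it clears the .05 bar. The sketch below assumes a simple two-arm trial with a yes/no outcome and uses a standard two-proportion z-test (a normal approximation); the recovery rates, arm sizes, and function names are all hypothetical choices for illustration.

```python
import math
import random

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p value from a pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # everyone (or no one) recovered: no evidence either way
    z = (x1 / n1 - x2 / n2) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

def estimated_power(p_control, p_treated, n_per_arm,
                    n_trials=2000, alpha=0.05, seed=0):
    """Fraction of simulated trials that reach p < alpha."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_trials):
        # Simulate how many subjects recover in each arm.
        x_c = sum(rng.random() < p_control for _ in range(n_per_arm))
        x_t = sum(rng.random() < p_treated for _ in range(n_per_arm))
        if two_proportion_p_value(x_t, n_per_arm, x_c, n_per_arm) < alpha:
            significant += 1
    return significant / n_trials

# Assumed scenario: 40% recover on sham, 60% on the real treatment.
for n in (20, 50, 100, 200):
    print(n, estimated_power(0.40, 0.60, n))
```

With these assumed rates, the small trials miss a perfectly real effect most of the time, which is why a non-significant result from an underpowered study tells you very little.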
So what can go wrong? Plenty, as you might imagine. More anon.