Map of life expectancy at birth from Global Education Project.

Saturday, December 11, 2021

A note on how we know what we know and what we don't know

A major problem in public discourse is what I call the reification of data. The information we have about social facts, economics, public health -- the statistics that come out of CDC, the Bureau of Labor Statistics, the Census Bureau, academic research -- is not identical with reality. But the people who bandy these numbers about seldom reflect on where they come from, or even think to ask.

 

Regarding HIV specifically, in the United States we depend on the HIV surveillance system. Like many other infectious diseases, HIV is reportable. When someone tests positive, and gets a positive confirmation test, the entity that does the test has to report it to the state. That entity is required to ask the person certain questions: demographic information, zip code of residence, and whether they have engaged in specific risk behaviors for HIV transmission: male-to-male sex, injection drug use, heterosexual sex. Receipt of blood products used to be an issue but it isn't any more. The system doesn't have any more detail than that. It doesn't tell us anything about the context, for example whether any of this might have happened in prison, or whether the person is a sex worker or patronizes sex workers. All we get are those very general categories. It may also be that a person does not tell the truth about one thing or another.

 

This information is reported to the state health department, which produces regular statistical reports, and also passes the information on to CDC, which compiles national statistics. 

 

So, obviously, the first thing that has to happen is that a person gets tested. The surveillance data don't tell us how many people are HIV+; they tell us how many positive tests there have been. Some categories of people, in some places, are more likely to be tested than others. But there's no sure way to understand those patterns because we don't have the ground truth of who is infected. Our information has to come through that initial filter.
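To make the filter concrete, here is a toy calculation in Python. Every number in it is invented for illustration and none comes from actual surveillance data; the group names, the `Group` class, and the `reported_cases` function are all hypothetical. It assumes two groups with the same true prevalence but different likelihoods that an infected person gets tested.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    population: int
    true_prevalence: float  # fraction actually infected (not observable in practice)
    testing_rate: float     # assumed fraction of infected people who get tested


def reported_cases(group: Group) -> float:
    """Expected positive tests: only infected people who get tested are counted."""
    return group.population * group.true_prevalence * group.testing_rate


# Hypothetical groups: identical true prevalence, different testing rates.
groups = [
    Group("A", population=100_000, true_prevalence=0.01, testing_rate=0.80),
    Group("B", population=100_000, true_prevalence=0.01, testing_rate=0.40),
]

for g in groups:
    print(f"Group {g.name}: about {round(reported_cases(g))} reported cases")
```

Under these made-up assumptions, group A shows roughly twice as many reported cases as group B even though the same number of people are infected in each. With only the reported counts in hand, there is no way to tell that story apart from one in which group A really does have twice the prevalence.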

 

In order to understand more about how people become infected, one could interview a sample of people who are known to be HIV+. Getting a truly representative sample is very difficult, however. Because health information is confidential, the only way to recruit people is through a health care provider who already has that information about them. But people who get their HIV care through a particular organization are not representative of everyone living with HIV -- some of whom aren't in care at all. It won't work to ask a random sample of the population because the prevalence is too low and most people wouldn't tell you anyway. 
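A rough back-of-the-envelope sketch in Python shows why the random-sample approach breaks down. The prevalence, the disclosure rate, and the target number of respondents below are placeholders chosen for illustration, not estimates, and `required_sample_size` is a hypothetical helper.

```python
def required_sample_size(target_respondents: int,
                         prevalence: float,
                         disclosure_rate: float) -> float:
    """Expected number of people you would have to survey at random so that
    `target_respondents` of them are both HIV+ and willing to say so."""
    return target_respondents / (prevalence * disclosure_rate)


# Illustrative assumptions only:
prevalence = 0.005       # assume 0.5% of the surveyed population is HIV+
disclosure_rate = 0.25   # assume 1 in 4 HIV+ respondents would disclose
target = 200             # respondents wanted for analysis

n = required_sample_size(target, prevalence, disclosure_rate)
print(f"Would need to survey roughly {round(n):,} people")
```

With these placeholder values the answer comes out to about 160,000 people surveyed to find 200 usable respondents, and the ones who do disclose may well differ from the ones who don't.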

 

You can do in-depth, qualitative interviews with people who are willing to talk openly, and learn about particular communities or the existence of certain risk behaviors or contexts. This won't give you a quantitative understanding of the population as a whole, but it may help in designing targeted interventions. But the bottom line is that our understanding of any epidemic is limited in some ways. It's important to be clear about what we know, and what may be speculation or tendentious reasoning, perhaps driven in part by biases or faulty assumptions. In this case, we can't really know the relative contribution to the HIV epidemic of incarceration, or sex work, or people under the influence of alcohol or other drugs behaving recklessly, or any number of other factors that we might want to target. We may have some information that helps us understand the nature of these risks, but we can't really quantify them. The same goes for other epidemics.

1 comment:

mojrim said...

I had not previously considered the structural issues around survey data. Point taken.