Map of life expectancy at birth from Global Education Project.

## Thursday, July 21, 2005

### The significance of significance

I remain defiantly in the IDon'tKnowWhatI'mTalkingAbout zone, still not having read the NPR report discussed below (but C. Corax has what appears to be a pretty thorough summary; please see the comments). But let's presume the reporter really is saying that the observed decline in the sex ratio at birth is going to cause major social upheaval. Manufacturers of toy trucks go bankrupt! Doll manufacturers' stock soars! Sell Blue! Buy Pink!

I really don't think so. Compared with 1974, out of every 1,000 births, there are now about three more girls and three fewer boys. Nobody would even have noticed this if we didn't (supposedly) register every live birth in the United States, and if there weren't federal employees being paid to count them all.

But it's statistically significant!

Yes, it is. That means, precisely, that it is unlikely to be due to chance alone. It doesn't mean it's big enough to matter. So how do we decide that an observed difference or association is statistically significant? Duck and cover! Here comes math! But don't worry. You don't have to remember all the details; you just need to get the general idea.

Everybody knows that if you flip a coin, it will come up heads half the time. Well, no, it won't, at least not exactly half in any particular run of flips. If you flip a coin twice, it will come up heads twice 1/4 of the time, tails twice 1/4 of the time, and heads once and tails once half the time.

Here's what happens when you flip a coin four times. Of the 16 equally likely sequences, 1 gives four tails, 4 give three tails and one head, 6 give two of each, 4 give one tail and three heads, and 1 gives four heads. Drawn as bars, that makes a symmetrical hump. If you keep increasing the number of flips, those bars smooth out and approach a bell-shaped curve. (Real data often resembles this curve but doesn't follow it exactly.) This is called the "normal curve." The value in the middle, at the highest point, is the mean (average) of all the values under the curve.

For example, going back to four coin flips, let's say we call tails zero and heads one, and count the heads. Then the values of the bars are 0 (four tails, on the left); 1 (three tails, one head); 2; 3; and 4. (Each of these values is traditionally called a value of "X".) To get the mean, you weight each value by how often it occurs: (0×1 + 1×4 + 2×6 + 3×4 + 4×1)/16 = 32/16 = 2, which is the value for two heads and two tails, the middle bar. Tah dah!
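For anyone who'd rather check the coin arithmetic than take my word for it, here's a short Python sketch (the choice of Python is mine; nothing in the post depends on it). It lists every possible sequence of four flips, counts the heads in each, and computes the mean by weighting each value of X by how many sequences produce it.

```python
from itertools import product
from collections import Counter

# Every possible sequence of 4 flips; 0 = tails, 1 = heads.
sequences = list(product([0, 1], repeat=4))
total = len(sequences)  # 16 equally likely sequences

# X = number of heads in a sequence; count how many sequences give each X.
counts = Counter(sum(seq) for seq in sequences)
bars = [counts[x] for x in range(5)]
print(bars)  # [1, 4, 6, 4, 1]

# The mean weights each X by how often it occurs.
mean = sum(x * counts[x] for x in range(5)) / total
print(mean)  # 2.0
```

Weighting matters: because this particular distribution is symmetrical, a plain average of the five distinct values happens to give 2 as well, but for a lopsided distribution it would give the wrong answer.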

It turns out that if you take the difference between each value of X and the mean, square each difference, add all those squares up, divide by the number of values, and take the square root of the whole thing, you get a number called the standard deviation (sd). (The squaring matters: without it, the differences above and below the mean cancel out and always add up to zero.) In a normal distribution, about two thirds (.6827) of all values are within one sd of the mean; about 95% (.9545) are within two sd of the mean; and more than 99% (.9973) are within three sd of the mean. This is true of all normal distributions.
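Here is a minimal Python sketch of the standard deviation recipe, applied to the four-flip distribution (each value weighted by how many of the 16 sequences produce it). It also shows why the differences have to be squared, and it computes the "within k sd" shares for a normal curve using the standard-library `math.erf` function. Note that four flips only approximate a normal curve; the coverage numbers come from the normal formula, not from the coin data.

```python
import math

# X = number of heads in 4 flips, and how many of the 16 sequences give each X.
values = [0, 1, 2, 3, 4]
weights = [1, 4, 6, 4, 1]
n_total = sum(weights)  # 16

mean = sum(x * w for x, w in zip(values, weights)) / n_total  # 2.0

# Unsquared deviations cancel out, so their sum is always zero.
raw_sum = sum(w * (x - mean) for x, w in zip(values, weights))
print(raw_sum)  # 0.0

# Square the deviations, average them (the variance), then take the square root.
variance = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / n_total
sd = math.sqrt(variance)
print(sd)  # 1.0

# Share of any normal distribution lying within k sd of its mean is erf(k / sqrt(2)).
coverage = {k: math.erf(k / math.sqrt(2)) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(k, round(share, 4))  # 0.6827, 0.9545, 0.9973 for k = 1, 2, 3
```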

So, there's one more thing you need to know. If you take random samples from some population, for example by calling people at random and asking them their height, the sample means won't be exactly the same as the true mean of the population, but they'll tend to be close. Specifically, as long as the samples aren't too small, the sample means will be approximately normally distributed, centered on the true population mean. The standard deviation of the sample mean is called the standard error. It equals the population's standard deviation divided by the square root of n, the number of people in each sample.
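A small simulation makes this concrete. This Python sketch assumes a made-up population of heights (normal, mean 170 cm, sd 10 cm; invented numbers, not real data), draws 1,000 random samples at each of three sample sizes, and compares the spread of the sample means with the sd / sqrt(n) formula.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of heights in cm: normal, mean 170, sd 10 (made-up numbers).
POP_MEAN, POP_SD = 170.0, 10.0
REPEATS = 1000  # number of samples drawn for each sample size

def sample_mean(n):
    """Average height in one random sample of n people."""
    return statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n))

results = {}
for n in (10, 100, 1000):
    means = [sample_mean(n) for _ in range(REPEATS)]
    center = statistics.fmean(means)        # should sit near the true mean, 170
    observed_se = statistics.stdev(means)   # spread of the sample means
    predicted_se = POP_SD / n ** 0.5        # standard error = sd / sqrt(n)
    results[n] = (center, observed_se, predicted_se)
    print(f"n={n:>4}: mean of means {center:.2f}, "
          f"observed SE {observed_se:.3f}, predicted SE {predicted_se:.3f}")
```

The sample means cluster around 170 no matter what n is; what shrinks as n grows is how widely they scatter.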

If I take two samples and their means are more than about two standard errors apart (using the standard error of the difference between the two means), it's quite unlikely, less than a 5% chance, that a gap that big would turn up if the samples really came from populations with the same mean. When the sample size "n" is larger, the standard error is smaller. Of course, it's not that simple, because I don't actually know the population's standard deviation, which I need to calculate the standard error, so I have to estimate it from the samples. That adds some uncertainty and makes the test a little weaker, but I can still account for how accurate my estimate of the standard error is likely to be, and come up with the probability of seeing a difference at least as large as the one I observed if the underlying populations really were the same. That probability is called p. (Note that p is not the probability that the populations are the same.)
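Here's a rough Python sketch of that comparison, written as a large-sample z test. The function name `two_sample_z` and the height figures are hypothetical, made up for illustration. For small samples, where the standard deviations are themselves shaky estimates, statisticians would normally use Student's t distribution instead of the normal curve; this sketch skips that refinement.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample z test for a difference between two means.

    Returns (z, p): z is the gap between the means measured in standard errors,
    and p is the two-sided probability of a gap at least that large if both
    populations really had the same mean.
    """
    se1 = sd1 / math.sqrt(n1)                   # standard error of each sample mean
    se2 = sd2 / math.sqrt(n2)
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)    # standard error of the difference
    z = (mean1 - mean2) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))        # area in both tails of the normal curve
    return z, p

# Hypothetical heights (cm): the same 2 cm gap with big and with small samples.
z_big, p_big = two_sample_z(171.0, 10.0, 400, 169.0, 10.0, 400)
z_small, p_small = two_sample_z(171.0, 10.0, 25, 169.0, 10.0, 25)
print(round(z_big, 2), round(p_big, 4))      # 2.83 0.0047
print(round(z_small, 2), round(p_small, 4))  # 0.71 0.4795
```

Same 2 cm difference both times; with 400 people per sample it clears the 5% bar easily, and with 25 per sample it doesn't come close.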

So, we can think of each year of birth records as a kind of sample from all the years in which women have been giving birth in the U.S. How likely is it that a difference in the sex ratio as big as the tiny one between 1974 and 2002 would show up by chance alone? It happens to be very unlikely, but that's only because there are a helluva lot of births every year. Remember that standard errors go down as numbers go up (because, in the formula, we divide by the square root of n, the number of cases). Arbitrarily, we say that a difference is "statistically significant" when p is less than 5%. When n is large, a small difference can be "significant," but it might be so small that nobody could possibly care about it. When n is small, a large difference might not be "statistically significant," but that doesn't mean it isn't real; it just means our sample was too small for us to prove it to an arbitrary standard of probability. It might still matter a lot.
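To see how much work n is doing, here's a Python sketch using a pooled two-proportion z test (my choice of test; the post doesn't say which one the researchers used). The rates are hypothetical: 515 boys per 1,000 births in one year versus 512 in the other, mirroring "three fewer boys per 1,000," not the actual published figures. Three million births a year is a rough assumed order of magnitude for the U.S., and like the post, the sketch treats each year as an independent sample.

```python
import math

def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value for the difference between two proportions (pooled z test)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical: 515 boys per 1,000 births in one year, 512 in the other.
p_values = {}
for births in (1_000, 100_000, 3_000_000):
    boys_then = births * 515 // 1000
    boys_now = births * 512 // 1000
    p_values[births] = two_proportion_p(boys_then, births, boys_now, births)
    print(f"{births:>9,} births per year: p = {p_values[births]:.2g}")
```

The difference is the same three boys per 1,000 in every row; only n changes. With a thousand births per year it's nowhere near "significant," and with millions it's overwhelmingly so, without getting one bit bigger.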

Soon, I will once again be entertaining.

Edited 7/22 for clarity