Map of life expectancy at birth from Global Education Project.

Friday, April 23, 2010

So what exactly is normal?

If you're a curve, that is. The world famous normal curve is:

• A continuous, symmetrical curve w/ both tails extending to infinity.

• The mean, median and mode -- which I will now explain -- are the same.

• Described by two parameters (i.e., numbers that can vary): arithmetic mean, and standard deviation -- which I will also explain.
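For readers who like to see the formula, the curve is usually written as below. The symbols are the standard textbook ones rather than anything defined in this post: \mu stands for the arithmetic mean and \sigma for the standard deviation.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
```

Because x only appears as (x - \mu)^2, the curve is symmetrical around \mu, and it never quite reaches zero in either direction.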

As you may recall, we got our normal curve from a binomial distribution -- heads and tails. So what does it mean, to talk about the mean? (Or to miss New Orleans?) Easy. Just assign a numerical value to heads and tails. Usually in the case of a binomial people use 1 and 0. So the value of the most common result -- half heads and half tails -- is going to be .5 = 1/2. It should be easy to see why. Let's say we flipped our coin 100 times. The most frequent result, which is called the mode or modal value, is going to be 50 heads and 50 tails. 50 * 1 = 50, and 50 * 0 = 0, and 50 + 0 = 50. So the total numerical score for that result is 50. The arithmetic mean, or average, is the total numerical value divided by the number of cases, i.e. coin flips. 50/100 = .5. You can see that all the values on each side of the curve have to average out to .5 as well because they cancel each other out. 49 heads and 51 tails cancel out 51 heads and 49 tails, and so on. Therefore .5 is both the mean and the mode of our curve.
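If you want to check the arithmetic yourself, here is a minimal Python sketch. It assumes a fair coin and uses the exact binomial probabilities for 100 flips; the variable names are made up for illustration.

```python
from math import comb

n = 100  # number of coin flips, as in the example above

# Exact probability of getting k heads in n flips of a fair coin
prob = {k: comb(n, k) / 2**n for k in range(n + 1)}

# Mode: the number of heads with the highest probability
mode_heads = max(prob, key=prob.get)

# Mean: every possible number of heads, weighted by its probability
mean_heads = sum(k * p for k, p in prob.items())

# Scoring heads as 1 and tails as 0, convert to a per-flip value
mode_value = mode_heads / n
mean_value = mean_heads / n

# Symmetry: 49 heads is exactly as likely as 51 heads
mirror_pair_equal = prob[49] == prob[51]

print("mode:", mode_heads, "heads, per-flip value", mode_value)
print("mean:", round(mean_heads, 6), "heads, per-flip value", round(mean_value, 6))
print("49 and 51 heads equally likely:", mirror_pair_equal)
```

The mirror-image check at the end is the "cancelling out" argument in code: every result on one side of 50 has an equally likely partner on the other side.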

The median is the number in the middle: half the values are lower, and half the values are higher. Basically, you can see that the median is also .5 for the same reason the mean is .5: the curve is symmetrical. Half the values are to the left, and lower than .5; and half the values are to the right, and higher than .5. So there you go. To review:

• The mean is the sum of all values, divided by N (N is the size of the population).

• The median is that value which divides the population in half – 50% are greater, 50% are less.

• The mode is the most common value.
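The three definitions can be sketched in a few lines of Python. The list of values is invented for illustration (think of it as the number of heads in nine rounds of ten flips), and each by-hand calculation is compared against Python's standard statistics module.

```python
from statistics import mean, median, mode

# Hypothetical data, chosen to be symmetrical around 5
values = [3, 4, 4, 5, 5, 5, 6, 6, 7]
n = len(values)

# Mean: the sum of all values, divided by N
by_hand_mean = sum(values) / n

# Median: the middle value once sorted (N is odd here, so there is one)
by_hand_median = sorted(values)[n // 2]

# Mode: the value that occurs most often
by_hand_mode = max(set(values), key=values.count)

print(by_hand_mean, by_hand_median, by_hand_mode)
print(mean(values), median(values), mode(values))
```

Because this little data set is symmetrical, all three come out the same, just as they do for the normal curve.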

A normal curve doesn't have to describe a binomial distribution. It can also describe a distribution of some continuous value with a real numerical meaning, such as people's height or weight. Such real world values wouldn't often be exactly normal but let's pretend they are for the sake of argument. (We'll get to why this is worth doing later.) Then the mean, median and mode, while still the same as each other, would be whatever value happened to pertain in reality, such as 5'7" and 165 pounds.
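To make the height example concrete, here is a rough simulation in Python. It pretends heights are exactly normal with a mean of 67 inches (5'7"). The standard deviation of 3 inches is purely an assumption for illustration, not a figure from this post. Since no two simulated heights are exactly equal, the mode is taken after rounding to the nearest inch.

```python
import random
from collections import Counter
from statistics import fmean, median

random.seed(0)  # fixed seed so the run is repeatable

MEAN_INCHES = 67.0  # 5'7", the example value from the text
SD_INCHES = 3.0     # assumed spread, for illustration only

# Draw a large pretend population of heights
heights = [random.gauss(MEAN_INCHES, SD_INCHES) for _ in range(200_000)]

sample_mean = fmean(heights)
sample_median = median(heights)

# Mode: most common height after rounding to the nearest inch
modal_inch = Counter(round(h) for h in heights).most_common(1)[0][0]

print("mean:  ", round(sample_mean, 2))
print("median:", round(sample_median, 2))
print("mode:  ", modal_inch, "inches (rounded)")
```

In a sample this large the mean and median should both land very close to 67, and the most common rounded height should be 67 as well.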

Enough of this for now. Any questions will gladly be accepted.

C. Corax said...

Stupid question. I just want to clarify: Are you saying that in a normal curve, the mean, median and mode are always the same? That's how I read what you wrote, so please correct me if I'm wrong.

Cervantes said...

Exactly. If the curve is not normal, those parameters are not necessarily the same.

Anonymous said...

These are good explanations, C, maybe you could gather together your posts about math n stat, in one folder or sumptin, I’m sure that would be useful for many ppl?

One of the BIG problems of statistics is that analysts use and abuse measures and stats developed for the ‘normal’ curve, applying them when not appropriate.

Ana

Anonymous said...

apologies for the double post!

in Switz. we now get all Google stuff and much else in German, automatically.

Ppl are pissed, as more than half of the country does not speak German... even if they learnt it in school.

Imagine being in the US and all messages from foreign providers - or from blogs etc. were now in standard Swedish or Russian...

Ana