Explaining the Central Limit Theorem

by Ron Pereira on July 16th, 2007

If you hate statistics, this post is for you. Why? Because it’s my intention to have you understand AND be in a position to teach others one of the more complicated and misunderstood statistical concepts of our time - the central limit theorem (CLT) - by the end of this article. If you are up for the challenge, read on.

Central Limit Theorem – What is it?

OK, here is an official-sounding CLT definition for the purists.

The central limit theorem (CLT) states that the means of random samples of size n drawn from any distribution with mean m and finite variance s² will have an approximately normal distribution, with a mean equal to m and a variance equal to s²/n, once n is large enough.
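For those who prefer symbols, here is a rough sketch of the same statement. It keeps m, s² and n from the definition; the X̄ notation for the mean of a sample of n values is my own addition, and the variance is assumed to be finite.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i
\;\overset{\text{approx.}}{\sim}\;
\mathcal{N}\!\left(m,\ \frac{s^2}{n}\right)
\quad \text{for large } n
```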

Say what? I know… the people that write statistical textbooks need to join the rest of us on planet Earth. It’s like they get a kick out of making people wonder what the hell they are talking about.

All this confusing definition is really saying is that as n, our sample size, increases, the averages of samples drawn from just about any distribution (normal or non-normal) will tend to behave normally.

How can it be?

The key to this theorem is the whole s²/n part of the formula. As n, the sample size, increases, the variance of the sample averages, s²/n, decreases, so the averages bunch up more tightly around m. The theorem adds that this tighter pile of averages also takes on the normal bell shape.
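If you want to check the s²/n part for yourself, here is a minimal Python sketch. It assumes an exponential distribution with mean 1 and variance 1 as the source of data (my choice for illustration, not something from the definition), and the function names are made up for this example. It draws many samples of size n, averages each one, and compares the variance of those averages with s²/n.

```python
import random
import statistics


def sample_means(draw, n, trials, rng):
    """Return `trials` sample means, each the average of n draws."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]


def draw_exponential(rng):
    # Assumed example distribution: exponential with rate 1 (mean 1, variance 1).
    return rng.expovariate(1.0)


rng = random.Random(2007)  # fixed seed so the run is repeatable
for n in (1, 5, 25):
    means = sample_means(draw_exponential, n, trials=20000, rng=rng)
    observed = statistics.pvariance(means)
    print(f"n={n:2d}  variance of the averages={observed:.3f}  s^2/n={1.0 / n:.3f}")
```

The two columns should land close to each other for every n, which is all the s²/n term is claiming.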

Prove it to me

Remember, I told you that you will be able to teach others about this concept. So here are some teacher’s notes. Around this time in your explanation the student or students will wonder who you are. They may even think you are on crack. That’s good. You have them right where you want them, as you are slowly setting the hook. Once they bite, and they will, you will then reel them in with ease. Let’s press on.

Time to Simulate

I came across this sweet little Java Applet tool that allows you to perfectly demonstrate the CLT. Just click on the “Start CLT Applet” button to launch the tool. This tool was developed by some folks at Seton Hall and from all I can tell is free for anyone to use.

Fun with the Weibull Distribution

So here is the situation. Let’s assume we have a process that exhibits a Weibull distribution, which would fail to pass as “normal” data because, in this example, it is skewed to the right. This means we cannot safely run the so-called “parametric” hypothesis tests that assume normal data on the raw values. We often see a Weibull distribution with reliability/failure analysis data.

[Figure: weibull-1.JPG]

Let’s now pretend we send someone out on 100 different occasions (trials) to collect data from this process we know to exhibit a Weibull distribution. Let’s also assume we tell this person to collect only 1 data point per trip/trial. After the 50th trial we take the 50 data points and study their shape (Top Figure).

The blue bars are our data and the yellow outline is what a typical Weibull distribution looks like. As you can see our distribution looks pretty Weibull-ish.

[Figure: weibull-5.JPG]

We then tell the person to go back 100 more times. Only this time we ask them to collect 5 data points instead of 1 each trip out. We will take the 5 data points from each trial and average them together. After the 50th trial we take the 50 data points (remember each data point is an average of 5 numbers) and study their shape (Middle Figure).

Notice how the distribution is beginning to look a bit more normal, although it still maintains a little Weibullishness? Yes, that’s a word… at least in my dictionary.

Ok, we now attempt to really push our luck as we ask the person to go back out one last time. This time we ask them to collect 25 samples during each trial. We also buy them lunch at this point as we are beginning to whip them pretty good!

[Figure: weibull-25.JPG]

So, they go back out and collect 25 data points per trial. Again, we average these 25 data points, just as we averaged 5 data points in the second round. After the 50th trial we take the 50 averages and study their shape (Bottom Figure).

Now we clearly see a normal, bell shaped, distribution beginning to appear. And all we did was increase the sample size, n, from 1 to 5 and finally to 25.
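If the applet is not available to you, the same experiment can be roughly sketched in a few lines of Python. Everything specific here is an assumption on my part: the Weibull scale of 1.0 and shape of 1.5 (the post does not say which parameters the applet uses), the function names, and the choice of 5,000 trials instead of 50 so the numbers are less noisy. Skewness stands in for eyeballing the histogram: it is 0 for a symmetric bell shape and positive for a right skew.

```python
import random
import statistics


def trial_means(n, trials, rng, scale=1.0, shape=1.5):
    """Average n Weibull draws per trial; return one average per trial."""
    return [
        statistics.fmean(rng.weibullvariate(scale, shape) for _ in range(n))
        for _ in range(trials)
    ]


def skewness(values):
    """Sample skewness: 0 for a symmetric shape, positive for a right skew."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mu) / sd) ** 3 for v in values)


rng = random.Random(50)  # fixed seed so the run is repeatable
for n in (1, 5, 25):
    means = trial_means(n, trials=5000, rng=rng)
    print(f"n={n:2d}  skewness of the trial averages = {skewness(means):.2f}")
```

The skewness should shrink toward 0 as n goes from 1 to 5 to 25, which is the Weibullishness draining away. Set trials=50 to mimic the walkthrough above, but expect a bumpier result.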

When you teach people this it’s at this point where you reel them in. Also, turn the simulation tool over to them so they can play around with it. Tell them to prove the theorem wrong if they can. No worries, they can’t.

Break out the dice

Another fun way to demonstrate the CLT is with fair dice. Simply have someone roll 1 die 50 times, noting the result after each roll. When they graph this, the distribution will be very flat. Then give them 2 dice and have them roll both at the same time 50 times (averaging the results of each roll). Finally, give them 5 dice and repeat. You will see the distribution of the averages become more and more normal as the sample size, n, increases.
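No dice handy? Here is a small Python sketch of the same game. The function names are made up for this example, and the bars are a crude text histogram rather than a proper graph.

```python
import random
from collections import Counter


def roll_averages(num_dice, rolls, rng):
    """Roll num_dice fair dice `rolls` times; return the average of each roll."""
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(rolls)
    ]


def text_histogram(averages):
    """Return one line per distinct average, with a bar of '#' for its count."""
    counts = Counter(averages)
    return "\n".join(f"{value:4.1f} {'#' * counts[value]}" for value in sorted(counts))


rng = random.Random(6)  # fixed seed so the run is repeatable
for num_dice in (1, 2, 5):
    print(f"{num_dice} dice, 50 rolls")
    print(text_histogram(roll_averages(num_dice, 50, rng)))
```

With only 50 rolls the bars will be lumpy, just like the real thing, so bump the roll count up if you want a smoother picture.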

Update: In case you don’t read the comments, Rob, from LearnSigma, shared a link to the coolest dice game applet. So check it out. Thanks Rob.


7 comments

  1. Posted by Meikah Delid 16th July, 2007 at 11:35 pm

    Good work, Ron! I understand CLT even if I strain myself most of the time to understand. Your last example finally got to me. :)

  2. Posted by Ron Pereira 17th July, 2007 at 6:04 am

    Thanks Meikah. I find that you must “show” people the theorem in action which is why a simulator is so useful.

    But if you can do the dice that is even better as it is a bit more interactive… and who doesn’t like to throw dice around!

  3. Posted by Rob 17th July, 2007 at 6:46 am

    Ron - good explanation of an often confusing concept.

    Here’s a nice simulation: http://tinyurl.com/23ssq3

  4. Posted by Ron Pereira 17th July, 2007 at 7:10 am

    Excellent tool Rob! Thanks for sharing. I hope everyone uses this dice game.

  5. Posted by George Tete 24th January, 2008 at 6:46 am

    Why do you need to study and understand the Central Limit Theorem, Exponential Density Function and Analysis of Variance?

  6. Posted by George Tete 24th January, 2008 at 6:52 am

    Why do you study and understand the Central Limit Theorem, Exponential Density Function and Analysis of Variance?

  7. Posted by Ron Pereira 24th January, 2008 at 8:59 am

    Hi George, can you please elaborate more on your question?