3.5 Central Limit Theorem
In Module 6, we will cover the foundations of statistical tests. However, to understand what those tests tell us and how useful they are, it helps to first look at what allows them to work in the first place. Enter the Central Limit Theorem, one of the most important concepts in all of statistics.
You’ll want to keep the concept of the sampling distribution of the mean fresh in mind for this page, as it all relates to that!
3.5.1 What is the Central Limit Theorem?
The Central Limit Theorem (CLT) is a fundamental theorem of probability theory. It states that under the right conditions, the sampling distribution of the mean will converge to a normal distribution. This occurs even when the original data are not normally distributed.
Why is this important? After all, it is not something we directly see in action most of the time. Put simply, without the CLT we would not be able to do most of the statistics we do. The CLT allows us to make statistical inferences even when we don't know the true nature (i.e. distribution) of our data, by using the normal distribution to test hypotheses. Because it applies even when we have skewed or non-normal data, we can still make valuable inferences in those scenarios as well. It is therefore critical 'under the hood' to all of the statistical tests we run.
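The CLT is easy to see in a quick simulation. Below is a minimal sketch in Python (assuming NumPy is available; the exponential population, seed, sample size of 50, and 10,000 simulations are illustrative choices, not part of the simulator described later): we repeatedly sample from a clearly non-normal (right-skewed) population and look at the distribution of the sample means.

```python
# Minimal CLT sketch: sample means from a skewed (exponential)
# population still centre on the population mean, with spread
# close to the theoretical standard error sigma / sqrt(n).
import numpy as np

rng = np.random.default_rng(42)

n_sims = 10_000   # number of simulated samples
n = 50            # size of each sample

# The exponential(scale=1) population has mean 1 and sd 1, but is
# heavily right-skewed - nothing like a normal distribution.
samples = rng.exponential(scale=1.0, size=(n_sims, n))

# One mean per simulated sample: the sampling distribution of the mean
sample_means = samples.mean(axis=1)

# Centre of the sampling distribution (should be close to 1.0)
print(round(sample_means.mean(), 2))

# Its spread, compared with the theoretical standard error 1/sqrt(50)
print(round(sample_means.std(ddof=1), 3), round(1.0 / np.sqrt(n), 3))
```

Plotting a histogram of `sample_means` would show the familiar bell shape, even though a histogram of the raw draws would be strongly skewed.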
3.5.2 Simulation
To test this for yourself, try the sample simulator below. You can set the distribution you want to draw from, and choose how many samples and simulations you want to run.
The Population Distribution tab shows what you are sampling from; the Samples tab shows each individual sample; and the Sampling Distribution tab shows the distribution of sample means. Try changing the sample size and see how that affects the Sampling Distribution.
(You may need to scroll within the app to see the full output.)
3.5.3 Some important points
A general rule of thumb is that a sample size of n > 30 is sufficient even when the population is skewed. In other words, even if a population is heavily skewed on a variable, taking many samples of n > 30 will still produce a normally distributed set of sample means. You can see this for yourself in the simulator above - try setting the sample size to 5, 10 and then 30, and see what happens in the Sampling Distribution tab.
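The same comparison can be sketched in code. This illustrative Python snippet (assuming NumPy; the exponential population and the skewness helper are my own choices for demonstration) measures how skewed the sampling distribution of the mean is at n = 5, 10 and 30 - the skewness shrinks toward 0 (i.e. toward symmetry) as n grows.

```python
# Sketch of the n > 30 rule of thumb: the sampling distribution of
# the mean from a heavily skewed population becomes more symmetric
# (skewness closer to 0) as the sample size increases.
import numpy as np

rng = np.random.default_rng(0)

def skewness(x):
    """Simple sample skewness: mean((x - xbar)^3) / sd^3."""
    x = np.asarray(x)
    return np.mean((x - x.mean()) ** 3) / x.std() ** 3

for n in (5, 10, 30):
    # 10,000 simulated samples from an exponential population,
    # whose own skewness is 2 (strongly right-skewed)
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(n, round(skewness(means), 2))
```

In theory the skewness of the sample mean here is 2/sqrt(n), so it drops from about 0.89 at n = 5 to about 0.37 at n = 30 - noticeably closer to the 0 of a perfectly normal distribution.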
In addition, remember that with bigger samples, the variability in sample means (i.e. the standard error) decreases - and therefore the sample mean gets closer to the population mean. This means that with large samples, we should ideally be getting a really good estimate of the population of interest! Conversely, smaller samples (as are common in music research) are unlikely to give good estimates of the population.
We will touch on the issue of sample size more in Module 6, but this should already give you an indication of one of the most important things when it comes to statistical tests: sample size matters.