4.4 Statistical power

Another related but crucial consideration for inferential statistics is the concept of statistical power. Below is an overview of what this concept is and why it is important.

4.4.1 Statistical power

Power in a statistical context essentially describes how likely we are to actually detect an effect, given our sample size and assuming the effect really exists. Mathematically, power is defined as \(1 - \beta\), where \(\beta\) is the probability of a Type II error; in English terms, power is the probability of not missing an effect that is really there. Power is usually expressed as a percentage. For example, if your study has 50% power, it has a 50% chance of actually detecting an effect. The most common guideline is to aim for a study with 80% power.
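To make the \(1 - \beta\) definition concrete, here is a minimal sketch in Python (the function name and the normal approximation are mine, not a standard API; it assumes a two-sided, two-sample z-test on a standardized effect size d):

```python
from statistics import NormalDist

def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a
    standardized effect size d with n participants per group."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)       # critical value for significance
    signal = d * (n_per_group / 2) ** 0.5   # effect in standard-error units
    beta = z.cdf(z_crit - signal)           # Type II error rate (missing the effect)
    return 1 - beta                         # power

print(round(two_sample_power(d=0.5, n_per_group=64), 2))  # roughly 0.81
```

With a medium effect (d = 0.5) and 64 participants per group, power comes out at about 81% under this approximation, which is why 64 per group is often quoted as the classic requirement for 80% power with a medium effect.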

4.4.2 Factors that affect power

The primary factor (within your control) that affects how much statistical power you have to detect an effect is sample size. Think back to the formula for standard error, \(SE = s/\sqrt{n}\), for a proxy explanation of why this is the case. Larger samples tighten the sampling distribution of the mean, so two sampling distributions overlap less and less the greater the sample sizes are. Less overlap means more room to detect an effect.
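A quick way to see this is to plug increasing sample sizes into the same test and watch power climb. This is a rough sketch (assuming a two-sample z-test and a fixed standardized effect of d = 0.5; the numbers are illustrative only):

```python
from statistics import NormalDist

z = NormalDist()
alpha, d = 0.05, 0.5
z_crit = z.inv_cdf(1 - alpha / 2)
for n in (20, 64, 200):
    se = (2 / n) ** 0.5                  # SE of the mean difference, in sd units
    power = 1 - z.cdf(z_crit - d / se)   # less overlap -> higher power
    print(f"n per group = {n:3d}, power = {power:.2f}")
```

Power climbs from roughly .35 at 20 per group to about .81 at 64 per group, because the standard error (and with it the overlap between the two sampling distributions) shrinks in proportion to \(\sqrt{n}\).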

Some other factors that can affect power are:

  • The effect size - how large is the difference between your groups? Very small effects require considerably more power (and therefore larger samples) to detect than larger ones.
  • Performing a one-tailed test - because effects are only being tested in one direction, this halves the p-value (a two-tailed p = .10 becomes a one-tailed p = .05), which increases power. Don't do this, though: there are very few instances in which you can justify using a one-tailed test without reviewers and other clued-in readers suspecting that you're intentionally fudging your power.
  • Increasing alpha - a more lenient alpha (e.g. .10 instead of .05) lowers the bar a test must clear to reach significance, which increases power at the cost of a higher Type I error rate. There are interactive tools that let you see what happens to error rates when you change these parameters.
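The last two bullets are really the same move, as a small sketch can show (assuming a two-sample z-test with d = 0.4 and 50 per group; the numbers are illustrative only). A one-tailed test at \(\alpha = .05\) uses the same critical value as a two-tailed test at \(\alpha = .10\), so both buy exactly the same extra power:

```python
from statistics import NormalDist

z = NormalDist()
d, n = 0.4, 50
signal = d * (n / 2) ** 0.5                       # effect in standard-error units
tests = {"two-tailed, alpha = .05": z.inv_cdf(0.975),
         "one-tailed, alpha = .05": z.inv_cdf(0.95),
         "two-tailed, alpha = .10": z.inv_cdf(0.95)}
for label, crit in tests.items():
    print(label, "-> power =", round(1 - z.cdf(crit - signal), 2))
```

Here the one-tailed test at .05 and the two-tailed test at .10 give identical power (about .64, versus .52 for the two-tailed test at .05): both trade Type I protection for power.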

The consequence of being underpowered is that you can miss effects that genuinely exist. A good proportion of studies in psychology are underpowered, meaning that real effects go undetected. Power is therefore an integral consideration in good study design, particularly in experimental contexts.

4.4.3 Power analyses

Identifying an appropriate level of statistical power is an important part of planning quantitative research. Before conducting a study, it is wise to run a power analysis, which tells you how many participants you need in order to reliably detect an effect of a given size.

Most modern statistics software will allow you to conduct power analyses:

  • SPSS (from version 27)
  • Jamovi (with the jpower module)
  • R (with the pwr package, among many others)

We won’t get too far into the maths of how this is done here (it depends heavily on your research design, e.g. how many groups you have, what test you plan on doing…). These programs will let you select the appropriate design, the size of the effect you want to detect, your alpha level, and your desired power. The power analysis will then give you the minimum sample size per group needed to achieve that level of statistical power.
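For intuition about what these programs are doing under the hood, here is a minimal sketch that solves for n directly (the function name is mine, and it uses a normal approximation; real software such as R's pwr package uses the exact t distribution and so returns slightly larger numbers):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, power=0.80, alpha=0.05):
    """Minimum n per group for a two-sided, two-sample test of
    standardized effect size d (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # quantile for the significance threshold
    z_power = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

print(n_per_group(d=0.5))  # 63 per group under this approximation
```

For a medium effect (d = 0.5) this gives 63 per group; R's pwr.t.test(d = 0.5, power = .80) returns roughly 64, the small difference coming from the t correction. Note also that halving the effect size to d = 0.25 roughly quadruples the required sample (252 per group here), which is why detecting small effects is so expensive.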

4.4.4 Post-hoc power analyses

In some papers you might see the authors present a power analysis after collecting their sample and running their analyses. Supposedly, this shows that the sample had enough power to detect an effect. However, this is conceptually flawed. The primary flaw is that post-hoc ("observed") power is essentially just a restatement of the p-value, and so does little to show the true power of a design/test.