B.1 The tidy() function | Research Process for Music Psychologists

B.1 The `tidy()` function

You may have noticed that many common R functions, such as lm() and aov(), print their results as formatted text rather than returning them in a form that is easy to work with. The tidy() function, from the broom package, turns the core output into a data frame. This is useful when you are running multiple models at once, or when you want to work with the values in a model's output directly. tidy() works with the objects returned by most of R's standard test functions.

Here is an example with the marketing dataset, which contains continuous variables. Let’s fit a multiple regression using lm() and call summary() on the result:

data(marketing)

marketing_lm <- lm(sales ~ youtube + facebook + newspaper, data = marketing)
summary(marketing_lm)

## 
## Call:
## lm(formula = sales ~ youtube + facebook + newspaper, data = marketing)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.5932  -1.0690   0.2902   1.4272   3.3951 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.526667   0.374290   9.422   <2e-16 ***
## youtube      0.045765   0.001395  32.809   <2e-16 ***
## facebook     0.188530   0.008611  21.893   <2e-16 ***
## newspaper   -0.001037   0.005871  -0.177     0.86    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.023 on 196 degrees of freedom
## Multiple R-squared:  0.8972, Adjusted R-squared:  0.8956 
## F-statistic: 570.3 on 3 and 196 DF,  p-value: < 2.2e-16

tidy() works on model objects, not raw data, so we call it on our regression model. The result is a data frame with one row per coefficient and columns for the term, estimate, standard error, test statistic and p-value:

tidy(marketing_lm)
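Because the result is an ordinary data frame, the usual subsetting tools apply. The following is a minimal sketch rather than part of the analysis above: it uses R's built-in mtcars data as a stand-in so that it runs on its own, and both the model and the 0.05 cutoff are illustrative assumptions.

```r
library(broom)

# Stand-in model on built-in data (not the marketing example above)
cars_lm <- lm(mpg ~ wt + hp, data = mtcars)
cars_tidy <- tidy(cars_lm)

# The columns can be used like those of any other data frame
names(cars_tidy)

# Drop the intercept, then keep only the predictors with p < .05
predictors <- cars_tidy[cars_tidy$term != "(Intercept)", ]
significant <- predictors[predictors$p.value < 0.05, ]
significant$term
```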

For lm() objects, you can also request confidence intervals for the regression coefficients, which are returned in the conf.low and conf.high columns:

tidy(marketing_lm, conf.int = TRUE, conf.level = 0.95)
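The introduction to this section mentions running multiple models at once. One way this might look is sketched below, under some assumptions: the two formulas are hypothetical, the built-in mtcars data stands in for a real dataset, and base R's lapply() and rbind() are only one of several ways to stack the results.

```r
library(broom)

# Hypothetical set of candidate models, fitted to built-in mtcars data
formulas <- list(
  weight_only   = mpg ~ wt,
  weight_and_hp = mpg ~ wt + hp
)
models <- lapply(formulas, function(f) lm(f, data = mtcars))

# Tidy each model and record which model every row came from
tidied <- lapply(names(models), function(model_name) {
  out <- tidy(models[[model_name]], conf.int = TRUE)
  out$model <- model_name
  out
})

# Stack the per-model data frames into a single data frame
all_coefs <- do.call(rbind, tidied)
all_coefs
```

Each coefficient from each model ends up on its own row, so the models can be compared or plotted from a single data frame.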

B.1.1 Correlations

For a correlation:

marketing_cor <- cor.test(marketing$youtube, marketing$facebook)
marketing_cor

## 
##  Pearson's product-moment correlation
## 
## data:  marketing$youtube and marketing$facebook
## t = 0.77239, df = 198, p-value = 0.4408
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.08457548  0.19208899
## sample estimates:
##        cor 
## 0.05480866

tidy(marketing_cor)

B.1.2 Chi-squares

For a chi-square test object:

data("properties")

properties_table <- table(properties$property_type, properties$buyer_type)
properties_chisq <- chisq.test(properties_table, correct = FALSE)
properties_chisq

## 
##  Pearson's Chi-squared test
## 
## data:  properties_table
## X-squared = 82.504, df = 9, p-value = 5.134e-14

tidy(properties_chisq)

B.1.3 t-tests

For a t-test object (this applies to one-sample, two-sample and paired t-tests alike):

data("genderweight")

weight_t <- t.test(weight ~ group, data = genderweight)
weight_t

## 
##  Welch Two Sample t-test
## 
## data:  weight by group
## t = -20.791, df = 26.872, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group F and group M is not equal to 0
## 95 percent confidence interval:
##  -24.53135 -20.12353
## sample estimates:
## mean in group F mean in group M 
##        63.49867        85.82612

tidy(weight_t)
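To illustrate how the tidied output differs between types of t-test, here is a short sketch using R's built-in sleep data as a stand-in (an assumption made so the code runs on its own). Both calls return a one-row data frame; the two-sample version additionally reports the two group means as estimate1 and estimate2.

```r
library(broom)

# Welch two-sample t-test on the built-in sleep data
welch_t <- t.test(extra ~ group, data = sleep)
welch_tidy <- tidy(welch_t)
welch_tidy

# One-sample t-test against a mean of 0
one_sample_t <- t.test(sleep$extra, mu = 0)
one_sample_tidy <- tidy(one_sample_t)
one_sample_tidy
```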

B.1.4 ANOVA objects

For a regular aov() object, you can optionally ask for the intercept term using intercept = TRUE:

data("stress")

stress_aov <- aov(score ~ treatment * exercise, data = stress)
summary(stress_aov)

##                    Df Sum Sq Mean Sq F value   Pr(>F)    
## treatment           1  351.4   351.4  12.295 0.000923 ***
## exercise            2 1776.3   888.1  31.076 1.04e-09 ***
## treatment:exercise  2  217.3   108.7   3.802 0.028522 *  
## Residuals          54 1543.3    28.6                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

tidy(stress_aov)
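The intercept = TRUE option mentioned above is not shown in the call itself, so here is a minimal sketch of the difference. It assumes a reasonably recent version of broom, and uses the built-in npk data as a stand-in for the stress dataset so that it runs on its own.

```r
library(broom)

# Stand-in two-way ANOVA on the built-in npk data
npk_aov <- aov(yield ~ N * P, data = npk)

# Default: one row per model term, plus the residuals
without_intercept <- tidy(npk_aov)
without_intercept

# With intercept = TRUE, a row for the intercept is added
with_intercept <- tidy(npk_aov, intercept = TRUE)
with_intercept
```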

Objects returned by Anova() from the car package can also be tidied, but repeated measures designs cannot. It’s best to stick to rstatix if you want a repeated measures ANOVA in data frame format.

The output of TukeyHSD() can also be tidied:

TukeyHSD(stress_aov) %>%
  tidy()
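Once the post-hoc comparisons are in a data frame, they can be filtered like any other data. The sketch below is illustrative only: it uses the built-in warpbreaks data as a stand-in, and the 0.05 cutoff is an assumption.

```r
library(broom)

# Stand-in one-way ANOVA on the built-in warpbreaks data
breaks_aov <- aov(breaks ~ tension, data = warpbreaks)
tukey_tidy <- tidy(TukeyHSD(breaks_aov))

# One row per pairwise contrast; keep those with an adjusted p < .05
keep <- tukey_tidy$adj.p.value < 0.05
tukey_tidy[keep, c("contrast", "estimate", "adj.p.value")]
```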

emmeans objects can also be tidied, e.g. for simple effects tests:

# Simple effects of exercise for every treatment
emmeans(stress_aov, ~ exercise, by = "treatment") %>%
  pairs() %>%
  tidy()