B.1 The tidy() function
You may have noticed that many common R functions, such as lm() and aov(), print their results as formatted text rather than in a form that is easy to work with. The tidy() function from the broom package turns the core output into a data frame. This is useful when you are running several models at once, or when you want to work with the values in a model's output directly. It works with most of the standard test and modelling functions in R.
Here is an example with the marketing dataset, which contains continuous variables. Let’s fit a multiple regression using lm(), and call summary() on the results:
data(marketing)
marketing_lm <- lm(sales ~ youtube + facebook + newspaper, data = marketing)
summary(marketing_lm)
##
## Call:
## lm(formula = sales ~ youtube + facebook + newspaper, data = marketing)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -10.5932  -1.0690   0.2902   1.4272   3.3951
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3.526667   0.374290   9.422   <2e-16 ***
## youtube      0.045765   0.001395  32.809   <2e-16 ***
## facebook     0.188530   0.008611  21.893   <2e-16 ***
## newspaper   -0.001037   0.005871  -0.177     0.86
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.023 on 196 degrees of freedom
## Multiple R-squared: 0.8972, Adjusted R-squared: 0.8956
## F-statistic: 570.3 on 3 and 196 DF, p-value: < 2.2e-16
tidy() works on model objects rather than raw data, so we pass it our regression model. The result is a data frame with one row per coefficient.
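A minimal sketch of that call is below. It assumes that tidy() comes from the broom package and that marketing is the dataset shipped with the datarium package; neither package is named in this section, so adjust the loading step if your copy of the data lives elsewhere.

```r
library(broom)

# Assumption: `marketing` is the dataset from the datarium package
data("marketing", package = "datarium")
marketing_lm <- lm(sales ~ youtube + facebook + newspaper, data = marketing)

# One row per coefficient: term, estimate, std.error, statistic, p.value
marketing_tidy <- tidy(marketing_lm)
marketing_tidy
```

Because the result is an ordinary data frame, you can filter, sort or join it like any other table.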
For lm() objects, you can return a confidence interval on the regression coefficients:
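As a sketch, the confidence interval can be requested with the conf.int argument of tidy(); conf.level sets the coverage. As before, this assumes broom is installed and that marketing comes from the datarium package.

```r
library(broom)

# Assumption: `marketing` is the dataset from the datarium package
data("marketing", package = "datarium")
marketing_lm <- lm(sales ~ youtube + facebook + newspaper, data = marketing)

# conf.int = TRUE adds conf.low and conf.high columns for each coefficient
marketing_ci <- tidy(marketing_lm, conf.int = TRUE, conf.level = 0.95)
marketing_ci
```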
B.1.1 Correlations
For a correlation:
##
## Pearson's product-moment correlation
##
## data: marketing$youtube and marketing$facebook
## t = 0.77239, df = 198, p-value = 0.4408
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08457548 0.19208899
## sample estimates:
## cor
## 0.05480866
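The output above has the shape of a printed cor.test() result. A sketch of producing it and tidying it follows; the object name marketing_cor is hypothetical, and the datarium source of marketing is an assumption.

```r
library(broom)

# Assumption: `marketing` is the dataset from the datarium package
data("marketing", package = "datarium")

# Pearson correlation test between the two advertising budgets
marketing_cor <- cor.test(marketing$youtube, marketing$facebook)

# A single row holding the estimate, test statistic, p-value,
# degrees of freedom (parameter) and confidence limits
marketing_cor_tidy <- tidy(marketing_cor)
marketing_cor_tidy
```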
B.1.2 Chi-squares
A chi-square test object:
data("properties")
properties_table <- table(properties$property_type, properties$buyer_type)
properties_chisq <- chisq.test(properties_table, correct = FALSE)
properties_chisq
##
## Pearson's Chi-squared test
##
## data: properties_table
## X-squared = 82.504, df = 9, p-value = 5.134e-14
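For the object above, the call would be tidy(properties_chisq). So that the sketch below runs without the properties data, it uses a stand-in: the built-in HairEyeColor table collapsed to a 4 x 4 hair-by-eye table. The stand-in data and the object names are assumptions for illustration only.

```r
library(broom)

# Stand-in data: built-in HairEyeColor counts, summed over Sex (4 x 4 table)
hair_eye_table <- margin.table(HairEyeColor, c(1, 2))
hair_eye_chisq <- chisq.test(hair_eye_table, correct = FALSE)

# A single row with statistic, p.value, parameter (df) and method
hair_eye_tidy <- tidy(hair_eye_chisq)
hair_eye_tidy
```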
B.1.3 t-tests
For a t-test object (applies to all t-tests):
##
## Welch Two Sample t-test
##
## data: weight by group
## t = -20.791, df = 26.872, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group F and group M is not equal to 0
## 95 percent confidence interval:
## -24.53135 -20.12353
## sample estimates:
## mean in group F mean in group M
## 63.49867 85.82612
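The output above looks like a Welch t-test of weight by group. A sketch of fitting and tidying it is below; the genderweight dataset from datarium and the object name weight_t are assumptions, since this section does not show which data were used.

```r
library(broom)

# Assumption: the data are `genderweight` from the datarium package
data("genderweight", package = "datarium")

# Welch two-sample t-test (the default for t.test() with two groups)
weight_t <- t.test(weight ~ group, data = genderweight)

# A single row: difference in means (estimate), group means
# (estimate1, estimate2), statistic, p.value, df and confidence limits
weight_t_tidy <- tidy(weight_t)
weight_t_tidy
```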
B.1.4 ANOVA objects
For a regular aov() object, you can optionally ask for the intercept term using intercept = TRUE:
##                    Df Sum Sq Mean Sq F value   Pr(>F)
## treatment           1  351.4   351.4  12.295 0.000923 ***
## exercise            2 1776.3   888.1  31.076 1.04e-09 ***
## treatment:exercise  2  217.3   108.7   3.802 0.028522 *
## Residuals          54 1543.3    28.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
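The table above has the shape of summary() output for a two-way ANOVA of treatment by exercise. A sketch of fitting such a model and tidying it follows. The stress dataset from datarium and the response name score are assumptions; stress_aov is the object name used later in this section.

```r
library(broom)

# Assumption: the data are `stress` from datarium, with response `score`
data("stress", package = "datarium")
stress_aov <- aov(score ~ treatment * exercise, data = stress)

# One row per term: term, df, sumsq, meansq, statistic, p.value
stress_tidy <- tidy(stress_aov)
stress_tidy

# intercept = TRUE also reports the intercept term
stress_tidy_int <- tidy(stress_aov, intercept = TRUE)
stress_tidy_int
```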
Objects produced by car::Anova() also work, but repeated measures designs do not. If you want a repeated measures ANOVA as a data frame, it is best to stick to rstatix.
TukeyHSD() can also be used:
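A sketch of tidying a TukeyHSD() result is below. It refits the two-way model so that it runs on its own; the stress dataset from datarium and the response name score are assumptions, and stress_tukey is a hypothetical object name.

```r
library(broom)

# Assumption: the data are `stress` from datarium, with response `score`
data("stress", package = "datarium")
stress_aov <- aov(score ~ treatment * exercise, data = stress)

# One row per pairwise contrast, with the estimated difference,
# confidence limits and Tukey-adjusted p-value (adj.p.value)
stress_tukey <- tidy(TukeyHSD(stress_aov))
stress_tukey
```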
emmeans objects can also be tidied, e.g. for simple effects tests:
# Simple effects of exercise for every treatment
emmeans(stress_aov, ~ exercise, by = "treatment") %>%
  pairs() %>%
  tidy()