<- lm(y ~ drug + disease + drug*disease, df_disease) lm_model
ANOVA
Getting Started
To demonstrate the various types of sums of squares, we’ll create a data frame called df_disease
taken from the SAS documentation. The corresponding data can be found here.
The Model
For this example, we’re testing for a significant difference in stem_length
using ANOVA. In R, we’re using lm()
to run the ANOVA, and then using broom::glance()
and broom::tidy()
to view the results in a table format.
The glance
function gives us a summary of the model diagnostic values.
%>%
lm_model glance() %>%
pivot_longer(everything())
# A tibble: 12 × 2
name value
<chr> <dbl>
1 r.squared 0.456
2 adj.r.squared 0.326
3 sigma 10.5
4 statistic 3.51
5 p.value 0.00130
6 df 11
7 logLik -212.
8 AIC 450.
9 BIC 477.
10 deviance 5081.
11 df.residual 46
12 nobs 58
The tidy
function gives a summary of the model results.
%>% tidy() lm_model
# A tibble: 12 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 29.3 4.29 6.84 0.0000000160
2 drug2 -1.33 6.36 -0.210 0.835
3 drug3 -13 7.43 -1.75 0.0869
4 drug4 -15.7 6.36 -2.47 0.0172
5 disease2 -1.08 6.78 -0.160 0.874
6 disease3 -8.93 6.36 -1.40 0.167
7 drug2:disease2 6.58 9.78 0.673 0.504
8 drug3:disease2 -10.9 10.2 -1.06 0.295
9 drug4:disease2 0.317 9.30 0.0340 0.973
10 drug2:disease3 -0.900 9.00 -0.100 0.921
11 drug3:disease3 1.10 10.2 0.107 0.915
12 drug4:disease3 9.53 9.20 1.04 0.306
The Results
You’ll see that R print the individual results for each level of the drug and disease interaction. We can get the combined F table in R using the anova()
function on the model object.
%>%
lm_model anova() %>%
tidy() %>%
kable()
term | df | sumsq | meansq | statistic | p.value |
---|---|---|---|---|---|
drug | 3 | 3133.2385 | 1044.4128 | 9.455761 | 0.0000558 |
disease | 2 | 418.8337 | 209.4169 | 1.895990 | 0.1617201 |
drug:disease | 6 | 707.2663 | 117.8777 | 1.067225 | 0.3958458 |
Residuals | 46 | 5080.8167 | 110.4525 | NA | NA |
We can add a Total
row, by using add_row
and calculating the sum of the degrees of freedom and sum of squares.
%>%
lm_model anova() %>%
tidy() %>%
add_row(term = "Total", df = sum(.$df), sumsq = sum(.$sumsq)) %>%
kable()
term | df | sumsq | meansq | statistic | p.value |
---|---|---|---|---|---|
drug | 3 | 3133.2385 | 1044.4128 | 9.455761 | 0.0000558 |
disease | 2 | 418.8337 | 209.4169 | 1.895990 | 0.1617201 |
drug:disease | 6 | 707.2663 | 117.8777 | 1.067225 | 0.3958458 |
Residuals | 46 | 5080.8167 | 110.4525 | NA | NA |
Total | 57 | 9340.1552 | NA | NA | NA |
Sums of Squares Tables
Unfortunately, it is not easy to get the various types of sums of squares calculations in using functions from base R. However, the rstatix
package offers a solution to produce these various sums of squares tables. For each type, you supply the original dataset and model to the. anova_test
function, then specify the ttype and se detailed = TRUE
.
Type I
%>%
df_disease ::anova_test(
rstatix~ drug + disease + drug*disease,
y type = 1,
detailed = TRUE) %>%
::get_anova_table() %>%
rstatixkable()
Effect | DFn | DFd | SSn | SSd | F | p | p<.05 | ges |
---|---|---|---|---|---|---|---|---|
drug | 3 | 46 | 3133.239 | 5080.817 | 9.456 | 5.58e-05 | * | 0.381 |
disease | 2 | 46 | 418.834 | 5080.817 | 1.896 | 1.62e-01 | 0.076 | |
drug:disease | 6 | 46 | 707.266 | 5080.817 | 1.067 | 3.96e-01 | 0.122 |
Type II
%>%
df_disease ::anova_test(
rstatix~ drug + disease + drug*disease,
y type = 2,
detailed = TRUE) %>%
::get_anova_table() %>%
rstatixkable()
Effect | SSn | SSd | DFn | DFd | F | p | p<.05 | ges |
---|---|---|---|---|---|---|---|---|
drug | 3063.433 | 5080.817 | 3 | 46 | 9.245 | 6.75e-05 | * | 0.376 |
disease | 418.834 | 5080.817 | 2 | 46 | 1.896 | 1.62e-01 | 0.076 | |
drug:disease | 707.266 | 5080.817 | 6 | 46 | 1.067 | 3.96e-01 | 0.122 |
Type III
%>%
df_disease ::anova_test(
rstatix~ drug + disease + drug*disease,
y type = 3,
detailed = TRUE) %>%
::get_anova_table() %>%
rstatixkable()
Effect | SSn | SSd | DFn | DFd | F | p | p<.05 | ges |
---|---|---|---|---|---|---|---|---|
(Intercept) | 20037.613 | 5080.817 | 1 | 46 | 181.414 | 0.00e+00 | * | 0.798 |
drug | 2997.472 | 5080.817 | 3 | 46 | 9.046 | 8.09e-05 | * | 0.371 |
disease | 415.873 | 5080.817 | 2 | 46 | 1.883 | 1.64e-01 | 0.076 | |
drug:disease | 707.266 | 5080.817 | 6 | 46 | 1.067 | 3.96e-01 | 0.122 |
Type IV
In R there is no equivalent operation to the Type IV
sums of squares calculation in SAS.