Ancova

Published

June 1, 2023

Introduction

In this example, we’re looking at Analysis of Covariance. ANCOVA is typically used to analyse treatment differences, to see examples of prediction models go to the simple linear regression page.

Data Summary

df_sas %>% glimpse()
Rows: 30
Columns: 3
$ drug <fct> A, A, A, A, A, A, A, A, A, A, D, D, D, D, D, D, D, D, D, D, F, F,…
$ pre  <dbl> 11, 8, 5, 14, 19, 6, 10, 6, 11, 3, 6, 6, 7, 8, 18, 8, 19, 8, 5, 1…
$ post <dbl> 6, 0, 2, 8, 11, 4, 13, 1, 8, 0, 0, 2, 3, 1, 18, 4, 14, 9, 1, 9, 1…
df_sas %>% summary()
 drug        pre             post      
 A:10   Min.   : 3.00   Min.   : 0.00  
 D:10   1st Qu.: 7.00   1st Qu.: 2.00  
 F:10   Median :10.50   Median : 7.00  
        Mean   :10.73   Mean   : 7.90  
        3rd Qu.:13.75   3rd Qu.:12.75  
        Max.   :21.00   Max.   :23.00  

The Model

model_ancova <- lm(post ~ drug + pre, data = df_sas)
model_glance <- model_ancova %>% glance()
model_tidy   <- model_ancova %>% tidy()
model_glance %>% gt()
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
0.6762609 0.6389064 4.005778 18.10386 1.501369e-06 3 -82.05377 174.1075 181.1135 417.2026 26 30
model_tidy   %>% gt()
term estimate std.error statistic p.value
(Intercept) -3.8808094 1.9862017 -1.9538849 6.155192e-02
drugD 0.1089713 1.7951351 0.0607037 9.520594e-01
drugF 3.4461383 1.8867806 1.8264647 7.928458e-02
pre 0.9871838 0.1644976 6.0012061 2.454330e-06
model_table <- 
  model_ancova %>% 
  anova() %>% 
  tidy() %>% 
  add_row(term = "Total", df = sum(.$df), sumsq = sum(.$sumsq))
model_table %>% gt()
term df sumsq meansq statistic p.value
drug 2 293.6000 146.80000 9.148553 9.812371e-04
pre 1 577.8974 577.89740 36.014475 2.454330e-06
Residuals 26 417.2026 16.04625 NA NA
Total 29 1288.7000 NA NA NA

Type 1

df_sas %>%
  anova_test(post ~ drug + pre, type = 1, detailed = TRUE) %>% 
  get_anova_table() %>%
  gt()
Effect DFn DFd SSn SSd F p p<.05 ges
drug 2 26 293.600 417.203 9.149 9.81e-04 * 0.413
pre 1 26 577.897 417.203 36.014 2.45e-06 * 0.581

Type 2

df_sas %>% 
  anova_test(post ~ drug + pre, type = 2, detailed = TRUE) %>% 
  get_anova_table() %>% 
  gt()
Effect SSn SSd DFn DFd F p p<.05 ges
drug 68.554 417.203 2 26 2.136 1.38e-01 0.141
pre 577.897 417.203 1 26 36.014 2.45e-06 * 0.581

Type 3

df_sas %>%
  anova_test(post ~ drug + pre, type = 3, detailed = TRUE) %>% 
  get_anova_table() %>% 
  gt()
Effect SSn SSd DFn DFd F p p<.05 ges
(Intercept) 31.929 417.203 1 26 1.990 1.70e-01 0.071
drug 68.554 417.203 2 26 2.136 1.38e-01 0.141
pre 577.897 417.203 1 26 36.014 2.45e-06 * 0.581

LS Means

model_ancova %>% emmeans::lsmeans("drug") %>% emmeans::pwpm(pvals = TRUE, means = TRUE) 
        A       D       F
A [ 6.71]  0.9980  0.1809
D  -0.109 [ 6.82]  0.1893
F  -3.446  -3.337 [10.16]

Row and column labels: drug
Upper triangle: P values   adjust = "tukey"
Diagonal: [Estimates] (lsmean) 
Lower triangle: Comparisons (estimate)   earlier vs. later
model_ancova %>% emmeans::lsmeans("drug") %>% plot(comparisons = TRUE)

sasLM Package

The following code performs an ANCOVA analysis using the sasLM package. This package was written specifically to replicate SAS statistics. The console output is also organized in a manner that is similar to SAS.

library(sasLM)

sasLM::GLM(post ~ drug + pre, df_sas, BETA = TRUE, EMEAN = TRUE)
$ANOVA
Response : post
                Df Sum Sq Mean Sq F value    Pr(>F)    
MODEL            3  871.5 290.499  18.104 1.501e-06 ***
RESIDUALS       26  417.2  16.046                      
CORRECTED TOTAL 29 1288.7                              
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$Fitness
 Root MSE post Mean Coef Var  R-square  Adj R-sq
 4.005778       7.9 50.70604 0.6762609 0.6389064

$`Type I`
     Df Sum Sq Mean Sq F value    Pr(>F)    
drug  2  293.6   146.8  9.1486 0.0009812 ***
pre   1  577.9   577.9 36.0145 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$`Type II`
     Df Sum Sq Mean Sq F value    Pr(>F)    
drug  2  68.55   34.28  2.1361    0.1384    
pre   1 577.90  577.90 36.0145 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$`Type III`
     Df Sum Sq Mean Sq F value    Pr(>F)    
drug  2  68.55   34.28  2.1361    0.1384    
pre   1 577.90  577.90 36.0145 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$Parameter
            Estimate Estimable Std. Error Df t value  Pr(>|t|)    
(Intercept)  -0.4347         0     2.4714 26 -0.1759   0.86175    
drugA        -3.4461         0     1.8868 26 -1.8265   0.07928 .  
drugD        -3.3372         0     1.8539 26 -1.8001   0.08346 .  
drugF         0.0000         0     0.0000 26                      
pre           0.9872         1     0.1645 26  6.0012 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$`Expected Mean`
               LSmean  LowerCL   UpperCL        SE Df
(Intercept)  7.900000 6.396685  9.403315 0.7313516 26
drugA        6.714963 4.066426  9.363501 1.2884943 26
drugD        6.823935 4.208337  9.439532 1.2724690 26
drugF       10.161102 7.456182 12.866021 1.3159234 26
pre          7.900000 6.396685  9.403315 0.7313516 26

Note that the LSMEANS statistics are produced using the EMEAN = TRUE option. The BETA = TRUE option is equivalent to the SOLUTION option in SAS. See the sasLM documentation for additional information.