Ancova

Published

June 1, 2023

Introduction

ANOVA is a statistical method used to compare the means of three or more groups to determine if at least one group mean is significantly different from the others. Please see the anova.qmd for more information. ANCOVA is an extension to ANOVA.

ANCOVA (Analysis of Covariance) is a statistical method that compares the means of two or more groups while controlling for one or more continuous covariates. By adjusting for these covariates, ANCOVA helps to reduce potential confounding effects, allowing for a clearer assessment of the main treatment effects. It assumes linear relationships between covariates and the dependent variable, along with normality and homogeneity of variances.

We follow the example from link Analysis of Covariance

Data Summary

df_sas %>% glimpse()
Rows: 30
Columns: 3
$ drug <fct> A, A, A, A, A, A, A, A, A, A, D, D, D, D, D, D, D, D, D, D, F, F,…
$ pre  <dbl> 11, 8, 5, 14, 19, 6, 10, 6, 11, 3, 6, 6, 7, 8, 18, 8, 19, 8, 5, 1…
$ post <dbl> 6, 0, 2, 8, 11, 4, 13, 1, 8, 0, 0, 2, 3, 1, 18, 4, 14, 9, 1, 9, 1…
df_sas %>% summary()
 drug        pre             post      
 A:10   Min.   : 3.00   Min.   : 0.00  
 D:10   1st Qu.: 7.00   1st Qu.: 2.00  
 F:10   Median :10.50   Median : 7.00  
        Mean   :10.73   Mean   : 7.90  
        3rd Qu.:13.75   3rd Qu.:12.75  
        Max.   :21.00   Max.   :23.00  

The Model

model_ancova <- lm(post ~ drug + pre, data = df_sas)
model_glance <- model_ancova %>% glance()
model_tidy   <- model_ancova %>% tidy()
model_glance %>% gt()
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
0.6762609 0.6389064 4.005778 18.10386 1.501369e-06 3 -82.05377 174.1075 181.1135 417.2026 26 30
model_tidy   %>% gt()
term estimate std.error statistic p.value
(Intercept) -3.8808094 1.9862017 -1.9538849 6.155192e-02
drugD 0.1089713 1.7951351 0.0607037 9.520594e-01
drugF 3.4461383 1.8867806 1.8264647 7.928458e-02
pre 0.9871838 0.1644976 6.0012061 2.454330e-06
model_table <- 
  model_ancova %>% 
  anova() %>% 
  tidy() %>% 
  add_row(term = "Total", df = sum(.$df), sumsq = sum(.$sumsq))
model_table %>% gt()
term df sumsq meansq statistic p.value
drug 2 293.6000 146.80000 9.148553 9.812371e-04
pre 1 577.8974 577.89740 36.014475 2.454330e-06
Residuals 26 417.2026 16.04625 NA NA
Total 29 1288.7000 NA NA NA

Type 1

df_sas %>%
  anova_test(post ~ drug + pre, type = 1, detailed = TRUE) %>% 
  get_anova_table() %>%
  gt()
Effect DFn DFd SSn SSd F p p<.05 ges
drug 2 26 293.600 417.203 9.149 9.81e-04 * 0.413
pre 1 26 577.897 417.203 36.014 2.45e-06 * 0.581

Type 2

df_sas %>% 
  anova_test(post ~ drug + pre, type = 2, detailed = TRUE) %>% 
  get_anova_table() %>% 
  gt()
Effect SSn SSd DFn DFd F p p<.05 ges
drug 68.554 417.203 2 26 2.136 1.38e-01 0.141
pre 577.897 417.203 1 26 36.014 2.45e-06 * 0.581

Type 3

df_sas %>%
  anova_test(post ~ drug + pre, type = 3, detailed = TRUE) %>% 
  get_anova_table() %>% 
  gt()
Effect SSn SSd DFn DFd F p p<.05 ges
(Intercept) 31.929 417.203 1 26 1.990 1.70e-01 0.071
drug 68.554 417.203 2 26 2.136 1.38e-01 0.141
pre 577.897 417.203 1 26 36.014 2.45e-06 * 0.581

Least Squares Means

model_ancova %>% emmeans::lsmeans("drug") %>% emmeans::pwpm(pvals = TRUE, means = TRUE) 
        A       D       F
A [ 6.71]  0.9980  0.1809
D  -0.109 [ 6.82]  0.1893
F  -3.446  -3.337 [10.16]

Row and column labels: drug
Upper triangle: P values   adjust = "tukey"
Diagonal: [Estimates] (lsmean) 
Lower triangle: Comparisons (estimate)   earlier vs. later
model_ancova %>% emmeans::lsmeans("drug") %>% plot(comparisons = TRUE)

sasLM Package

The following code performs an ANCOVA analysis using the sasLM package. This package was written specifically to replicate SAS statistics. The console output is also organized in a manner that is similar to SAS.

library(sasLM)

sasLM::GLM(post ~ drug + pre, df_sas, BETA = TRUE, EMEAN = TRUE)
$ANOVA
Response : post
                Df Sum Sq Mean Sq F value    Pr(>F)    
MODEL            3  871.5 290.499  18.104 1.501e-06 ***
RESIDUALS       26  417.2  16.046                      
CORRECTED TOTAL 29 1288.7                              
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$Fitness
 Root MSE post Mean Coef Var  R-square  Adj R-sq
 4.005778       7.9 50.70604 0.6762609 0.6389064

$`Type I`
     Df Sum Sq Mean Sq F value    Pr(>F)    
drug  2  293.6   146.8  9.1486 0.0009812 ***
pre   1  577.9   577.9 36.0145 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$`Type II`
     Df Sum Sq Mean Sq F value    Pr(>F)    
drug  2  68.55   34.28  2.1361    0.1384    
pre   1 577.90  577.90 36.0145 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$`Type III`
     Df Sum Sq Mean Sq F value    Pr(>F)    
drug  2  68.55   34.28  2.1361    0.1384    
pre   1 577.90  577.90 36.0145 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$Parameter
            Estimate Estimable Std. Error Df t value  Pr(>|t|)    
(Intercept)  -0.4347         0     2.4714 26 -0.1759   0.86175    
drugA        -3.4461         0     1.8868 26 -1.8265   0.07928 .  
drugD        -3.3372         0     1.8539 26 -1.8001   0.08346 .  
drugF         0.0000         0     0.0000 26                      
pre           0.9872         1     0.1645 26  6.0012 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$`Expected Mean`
               LSmean  LowerCL   UpperCL        SE Df
(Intercept)  7.900000 6.396685  9.403315 0.7313516 26
drugA        6.714963 4.066426  9.363501 1.2884943 26
drugD        6.823935 4.208337  9.439532 1.2724690 26
drugF       10.161102 7.456182 12.866021 1.3159234 26
pre          7.900000 6.396685  9.403315 0.7313516 26

Note that the LSMEANS statistics are produced using the EMEAN = TRUE option. The BETA = TRUE option is equivalent to the SOLUTION option in SAS. See the sasLM documentation for additional information.