Ancova

Published

June 1, 2023

Introduction

ANOVA is a statistical method used to compare the means of three or more groups to determine if at least one group mean is significantly different from the others. Please see the ANOVA document for more information. ANCOVA is an extension to ANOVA.

ANCOVA (Analysis of Covariance) is a statistical method that compares the means of two or more groups while controlling for one or more continuous covariates. By adjusting for these covariates, ANCOVA helps to reduce potential confounding effects, allowing for a clearer assessment of the main treatment effects. It assumes linear relationships between covariates and the dependent variable, along with normality and homogeneity of variances.

We follow the example from link Analysis of Covariance

Data Summary

df_sas |> glimpse()
Rows: 30
Columns: 3
$ drug <fct> A, A, A, A, A, A, A, A, A, A, D, D, D, D, D, D, D, D, D, D, F, F,…
$ pre  <dbl> 11, 8, 5, 14, 19, 6, 10, 6, 11, 3, 6, 6, 7, 8, 18, 8, 19, 8, 5, 1…
$ post <dbl> 6, 0, 2, 8, 11, 4, 13, 1, 8, 0, 0, 2, 3, 1, 18, 4, 14, 9, 1, 9, 1…
df_sas |> summary()
 drug        pre             post      
 A:10   Min.   : 3.00   Min.   : 0.00  
 D:10   1st Qu.: 7.00   1st Qu.: 2.00  
 F:10   Median :10.50   Median : 7.00  
        Mean   :10.73   Mean   : 7.90  
        3rd Qu.:13.75   3rd Qu.:12.75  
        Max.   :21.00   Max.   :23.00  

The Model

model_ancova <- lm(post ~ drug + pre, data = df_sas)

model_glance <- model_ancova |>
  glance()
model_tidy <- model_ancova |>
  tidy()
model_glance |>
  gt()
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
0.6762609 0.6389064 4.005778 18.10386 1.501369e-06 3 -82.05377 174.1075 181.1135 417.2026 26 30
model_tidy |>
  gt()
term estimate std.error statistic p.value
(Intercept) -3.8808094 1.9862017 -1.9538849 6.155192e-02
drugD 0.1089713 1.7951351 0.0607037 9.520594e-01
drugF 3.4461383 1.8867806 1.8264647 7.928458e-02
pre 0.9871838 0.1644976 6.0012061 2.454330e-06
model_table <- model_ancova |>
  anova() |>
  tidy()

total_df <- sum(model_table$df)
total_sumsq <- sum(model_table$sumsq)

model_table |>
  add_row(term = "Total", df = total_df, sumsq = total_sumsq) |>
  gt()
term df sumsq meansq statistic p.value
drug 2 293.6000 146.80000 9.148553 9.812371e-04
pre 1 577.8974 577.89740 36.014475 2.454330e-06
Residuals 26 417.2026 16.04625 NA NA
Total 29 1288.7000 NA NA NA

Sums of Squares Tables

Type I

This can be calculated using, the base R {stats} package or the {rstatix} package. Both give the same result.

stats
stats::anova(model_ancova)
Analysis of Variance Table

Response: post
          Df Sum Sq Mean Sq F value    Pr(>F)    
drug       2  293.6  146.80  9.1486 0.0009812 ***
pre        1  577.9  577.90 36.0145 2.454e-06 ***
Residuals 26  417.2   16.05                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
rstatix
df_sas |>
  anova_test(post ~ drug + pre, type = 1, detailed = TRUE) |>
  get_anova_table() |>
  gt()
Effect DFn DFd SSn SSd F p p<.05 ges
drug 2 26 293.600 417.203 9.149 9.81e-04 * 0.413
pre 1 26 577.897 417.203 36.014 2.45e-06 * 0.581

Type II

This can be calculated using the {car} package or the {rstatix} package. Both give the same result.

car
car::Anova(model_ancova, type = "II")
Anova Table (Type II tests)

Response: post
          Sum Sq Df F value    Pr(>F)    
drug       68.55  2  2.1361    0.1384    
pre       577.90  1 36.0145 2.454e-06 ***
Residuals 417.20 26                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
rstatix
df_sas |>
  anova_test(post ~ drug + pre, type = 2, detailed = TRUE) |>
  get_anova_table() |>
  gt()
Effect SSn SSd DFn DFd F p p<.05 ges
drug 68.554 417.203 2 26 2.136 1.38e-01 0.141
pre 577.897 417.203 1 26 36.014 2.45e-06 * 0.581

Type III

This can be calculated using the base R {stats} package, the {car} package or the {rstatix} package. All give the same result.

Note: Calculating type III sums of squares in R is a bit tricky, because the multi-way ANOVA model is over-paramerterised. So when running the linear model we need to select a design matrix that sums to zero. In R those options will be either "contr.sum" or "contr.poly"

# Drug design matrix
contr.sum(4) # Using 4 here as we have 4 levels of drug
  [,1] [,2] [,3]
1    1    0    0
2    0    1    0
3    0    0    1
4   -1   -1   -1
# Disease design matrix
contr.sum(3)
  [,1] [,2]
1    1    0
2    0    1
3   -1   -1

While not relevant for this example as the disease variable isn’t ordinal the polynomial design matrix would look like

contr.poly(3)
                .L         .Q
[1,] -7.071068e-01  0.4082483
[2,] -7.850462e-17 -0.8164966
[3,]  7.071068e-01  0.4082483
model_ancova <- lm(
  post ~ drug + pre, data = df_sas,
  contrasts = list(drug = "contr.sum")
)
stats

Using the base stats package, you can use the drop1() function which drops all possible single terms in a model. The scope term specifies how things can be dropped.

stats::drop1(model_ancova, scope = . ~ ., test = "F")
Single term deletions

Model:
post ~ drug + pre
       Df Sum of Sq    RSS     AIC F value    Pr(>F)    
<none>              417.20  86.971                      
drug    2     68.55 485.76  87.535  2.1361    0.1384    
pre     1    577.90 995.10 111.049 36.0145 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
car
car::Anova(model_ancova, type = "III")
Anova Table (Type III tests)

Response: post
            Sum Sq Df F value    Pr(>F)    
(Intercept)  31.93  1  1.9898    0.1702    
drug         68.55  2  2.1361    0.1384    
pre         577.90  1 36.0145 2.454e-06 ***
Residuals   417.20 26                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
df_sas |>
  anova_test(post ~ drug + pre, type = 3, detailed = TRUE) |>
  get_anova_table() |>
  gt()
Effect SSn SSd DFn DFd F p p<.05 ges
(Intercept) 31.929 417.203 1 26 1.990 1.70e-01 0.071
drug 68.554 417.203 2 26 2.136 1.38e-01 0.141
pre 577.897 417.203 1 26 36.014 2.45e-06 * 0.581

Least Squares Means

model_ancova |>
  emmeans::lsmeans("drug") |>
  emmeans::pwpm(pvals = TRUE, means = TRUE)
        A       D       F
A [ 6.71]  0.9980  0.1809
D  -0.109 [ 6.82]  0.1893
F  -3.446  -3.337 [10.16]

Row and column labels: drug
Upper triangle: P values   adjust = "tukey"
Diagonal: [Estimates] (lsmean) 
Lower triangle: Comparisons (estimate)   earlier vs. later
model_ancova |>
  emmeans::lsmeans("drug") |>
  plot(comparisons = TRUE)

sasLM Package

The following code performs an ANCOVA analysis using the sasLM package. This package was written specifically to replicate SAS statistics. The console output is also organized in a manner that is similar to SAS.

library(sasLM)

sasLM::GLM(post ~ drug + pre, df_sas, BETA = TRUE, EMEAN = TRUE)
$ANOVA
Response : post
                Df Sum Sq Mean Sq F value    Pr(>F)    
MODEL            3  871.5 290.499  18.104 1.501e-06 ***
RESIDUALS       26  417.2  16.046                      
CORRECTED TOTAL 29 1288.7                              
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$Fitness
 Root MSE post Mean Coef Var  R-square  Adj R-sq
 4.005778       7.9 50.70604 0.6762609 0.6389064

$`Type I`
     Df Sum Sq Mean Sq F value    Pr(>F)    
drug  2  293.6   146.8  9.1486 0.0009812 ***
pre   1  577.9   577.9 36.0145 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$`Type II`
     Df Sum Sq Mean Sq F value    Pr(>F)    
drug  2  68.55   34.28  2.1361    0.1384    
pre   1 577.90  577.90 36.0145 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$`Type III`
     Df Sum Sq Mean Sq F value    Pr(>F)    
drug  2  68.55   34.28  2.1361    0.1384    
pre   1 577.90  577.90 36.0145 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$Parameter
            Estimate Estimable Std. Error Df t value  Pr(>|t|)    
(Intercept)  -0.4347         0     2.4714 26 -0.1759   0.86175    
drugA        -3.4461         0     1.8868 26 -1.8265   0.07928 .  
drugD        -3.3372         0     1.8539 26 -1.8001   0.08346 .  
drugF         0.0000         0     0.0000 26                      
pre           0.9872         1     0.1645 26  6.0012 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$`Expected Mean`
               LSmean  LowerCL   UpperCL        SE Df
(Intercept)  7.900000 6.396685  9.403315 0.7313516 26
drugA        6.714963 4.066426  9.363501 1.2884943 26
drugD        6.823935 4.208337  9.439532 1.2724690 26
drugF       10.161102 7.456182 12.866021 1.3159234 26
pre          7.900000 6.396685  9.403315 0.7313516 26

Note that the LSMEANS statistics are produced using the EMEAN = TRUE option. The BETA = TRUE option is equivalent to the SOLUTION option in SAS. See the sasLM documentation for additional information.