Ancova

Published

June 1, 2023

Introduction

ANOVA is a statistical method used to compare the means of three or more groups to determine if at least one group mean is significantly different from the others. Please see the ANOVA document for more information. ANCOVA is an extension to ANOVA.

ANCOVA (Analysis of Covariance) is a statistical method that compares the means of two or more groups while controlling for one or more continuous covariates. By adjusting for these covariates, ANCOVA helps to reduce potential confounding effects, allowing for a clearer assessment of the main treatment effects. It assumes linear relationships between covariates and the dependent variable, along with normality and homogeneity of variances.

We follow the example from link Analysis of Covariance

Data Summary

df_sas %>% glimpse()

Rows: 30
Columns: 3
$ drug <fct> A, A, A, A, A, A, A, A, A, A, D, D, D, D, D, D, D, D, D, D, F, F,…
$ pre  <dbl> 11, 8, 5, 14, 19, 6, 10, 6, 11, 3, 6, 6, 7, 8, 18, 8, 19, 8, 5, 1…
$ post <dbl> 6, 0, 2, 8, 11, 4, 13, 1, 8, 0, 0, 2, 3, 1, 18, 4, 14, 9, 1, 9, 1…

df_sas %>% summary()

 drug        pre             post      
 A:10   Min.   : 3.00   Min.   : 0.00  
 D:10   1st Qu.: 7.00   1st Qu.: 2.00  
 F:10   Median :10.50   Median : 7.00  
        Mean   :10.73   Mean   : 7.90  
        3rd Qu.:13.75   3rd Qu.:12.75  
        Max.   :21.00   Max.   :23.00

The Model

model_ancova <- lm(post ~ drug + pre, data = df_sas)
model_glance <- model_ancova %>% glance()
model_tidy   <- model_ancova %>% tidy()
model_glance %>% gt()

r.squared	adj.r.squared	sigma	statistic	p.value	df	logLik	AIC	BIC	deviance	df.residual	nobs
0.6762609	0.6389064	4.005778	18.10386	1.501369e-06	3	-82.05377	174.1075	181.1135	417.2026	26	30

model_tidy   %>% gt()

term	estimate	std.error	statistic	p.value
(Intercept)	-3.8808094	1.9862017	-1.9538849	6.155192e-02
drugD	0.1089713	1.7951351	0.0607037	9.520594e-01
drugF	3.4461383	1.8867806	1.8264647	7.928458e-02
pre	0.9871838	0.1644976	6.0012061	2.454330e-06

model_table <- 
  model_ancova %>% 
  anova() %>% 
  tidy() %>% 
  add_row(term = "Total", df = sum(.$df), sumsq = sum(.$sumsq))
model_table %>% gt()

term	df	sumsq	meansq	statistic	p.value
drug	2	293.6000	146.80000	9.148553	9.812371e-04
pre	1	577.8974	577.89740	36.014475	2.454330e-06
Residuals	26	417.2026	16.04625	NA	NA
Total	29	1288.7000	NA	NA	NA

Type 1

df_sas %>%
  anova_test(post ~ drug + pre, type = 1, detailed = TRUE) %>% 
  get_anova_table() %>%
  gt()

Effect	DFn	DFd	SSn	SSd	F	p	p<.05	ges
drug	2	26	293.600	417.203	9.149	9.81e-04	*	0.413
pre	1	26	577.897	417.203	36.014	2.45e-06	*	0.581

Type 2

df_sas %>% 
  anova_test(post ~ drug + pre, type = 2, detailed = TRUE) %>% 
  get_anova_table() %>% 
  gt()

Effect	SSn	SSd	DFn	DFd	F	p	p<.05	ges
drug	68.554	417.203	2	26	2.136	1.38e-01		0.141
pre	577.897	417.203	1	26	36.014	2.45e-06	*	0.581

Type 3

df_sas %>%
  anova_test(post ~ drug + pre, type = 3, detailed = TRUE) %>% 
  get_anova_table() %>% 
  gt()

Effect	SSn	SSd	DFn	DFd	F	p	p<.05	ges
(Intercept)	31.929	417.203	1	26	1.990	1.70e-01		0.071
drug	68.554	417.203	2	26	2.136	1.38e-01		0.141
pre	577.897	417.203	1	26	36.014	2.45e-06	*	0.581

Least Squares Means

model_ancova %>% emmeans::lsmeans("drug") %>% emmeans::pwpm(pvals = TRUE, means = TRUE)

        A       D       F
A [ 6.71]  0.9980  0.1809
D  -0.109 [ 6.82]  0.1893
F  -3.446  -3.337 [10.16]

Row and column labels: drug
Upper triangle: P values   adjust = "tukey"
Diagonal: [Estimates] (lsmean) 
Lower triangle: Comparisons (estimate)   earlier vs. later

model_ancova %>% emmeans::lsmeans("drug") %>% plot(comparisons = TRUE)

sasLM Package

The following code performs an ANCOVA analysis using the sasLM package. This package was written specifically to replicate SAS statistics. The console output is also organized in a manner that is similar to SAS.

library(sasLM)

sasLM::GLM(post ~ drug + pre, df_sas, BETA = TRUE, EMEAN = TRUE)

$ANOVA
Response : post
                Df Sum Sq Mean Sq F value    Pr(>F)    
MODEL            3  871.5 290.499  18.104 1.501e-06 ***
RESIDUALS       26  417.2  16.046                      
CORRECTED TOTAL 29 1288.7                              
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$Fitness
 Root MSE post Mean Coef Var  R-square  Adj R-sq
 4.005778       7.9 50.70604 0.6762609 0.6389064

$`Type I`
     Df Sum Sq Mean Sq F value    Pr(>F)    
drug  2  293.6   146.8  9.1486 0.0009812 ***
pre   1  577.9   577.9 36.0145 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$`Type II`
     Df Sum Sq Mean Sq F value    Pr(>F)    
drug  2  68.55   34.28  2.1361    0.1384    
pre   1 577.90  577.90 36.0145 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$`Type III`
     Df Sum Sq Mean Sq F value    Pr(>F)    
drug  2  68.55   34.28  2.1361    0.1384    
pre   1 577.90  577.90 36.0145 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$Parameter
            Estimate Estimable Std. Error Df t value  Pr(>|t|)    
(Intercept)  -0.4347         0     2.4714 26 -0.1759   0.86175    
drugA        -3.4461         0     1.8868 26 -1.8265   0.07928 .  
drugD        -3.3372         0     1.8539 26 -1.8001   0.08346 .  
drugF         0.0000         0     0.0000 26                      
pre           0.9872         1     0.1645 26  6.0012 2.454e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$`Expected Mean`
               LSmean  LowerCL   UpperCL        SE Df
(Intercept)  7.900000 6.396685  9.403315 0.7313516 26
drugA        6.714963 4.066426  9.363501 1.2884943 26
drugD        6.823935 4.208337  9.439532 1.2724690 26
drugF       10.161102 7.456182 12.866021 1.3159234 26
pre          7.900000 6.396685  9.403315 0.7313516 26

Note that the LSMEANS statistics are produced using the EMEAN = TRUE option. The BETA = TRUE option is equivalent to the SOLUTION option in SAS. See the sasLM documentation for additional information.