CMH Test

Cochran-Mantel-Haenszel Test

The CMH procedure tests for conditional independence in partial contingency tables for a 2 x 2 x K design. However, it can be generalized to tables of X x Y x K dimensions.

Available R packages

We did not find any R package that delivers all the same measures as SAS at once. Therefore, we tried out multiple packages:

Package General Association Row Means Differ Nonzero Correlation M-H Odds Ratio Homogeneity Test Note
stats::mantelhaen.test() Works well for 2x2xK
vcdExtra::CMHtest() Problems with sparsity, potential bug
epiDisplay::mhor() OR are limited to 2x2xK design

Data used

We will use the CDISC Pilot data set, which is publicly available on the PHUSE Test Data Factory repository. This data set served as the basis of the examples to follow.

  X      STUDYID SITEID SITEGR1     USUBJID     TRTSDT     TRTEDT
1 1 CDISCPILOT01    701     701 01-701-1015 2014-01-02 2014-07-02
2 2 CDISCPILOT01    701     701 01-701-1023 2012-08-05 2012-09-01
3 3 CDISCPILOT01    701     701 01-701-1028 2013-07-19 2014-01-14
4 4 CDISCPILOT01    701     701 01-701-1033 2014-03-18 2014-03-31
5 5 CDISCPILOT01    701     701 01-701-1034 2014-07-01 2014-12-30
6 6 CDISCPILOT01    701     701 01-701-1047 2013-02-12 2013-03-09
                  TRTP TRTPN AGE AGEGR1 AGEGR1N  RACE RACEN SEX ITTFL EFFFL
1              Placebo     0  63    <65       1 WHITE     1   F     Y     Y
2              Placebo     0  64    <65       1 WHITE     1   M     Y     Y
3 Xanomeline High Dose    81  71  65-80       2 WHITE     1   M     Y     Y
4  Xanomeline Low Dose    54  74  65-80       2 WHITE     1   M     Y     Y
5 Xanomeline High Dose    81  77  65-80       2 WHITE     1   F     Y     Y
6              Placebo     0  85    >80       3 WHITE     1   F     Y     Y
  COMP24FL AVISIT AVISITN             VISIT VISITNUM ADY        ADT  PARAMCD
1        Y Week 8       8            WEEK 8        8  63 2014-03-05 CIBICVAL
2        N Week 8       8            WEEK 4        5  29 2012-09-02 CIBICVAL
3        Y Week 8       8            WEEK 8        8  54 2013-09-10 CIBICVAL
4        N Week 8       8            WEEK 4        5  28 2014-04-14 CIBICVAL
5        Y Week 8       8            WEEK 8        8  57 2014-08-26 CIBICVAL
6        N Week 8       8 AMBUL ECG REMOVAL        6  46 2013-03-29 CIBICVAL
        PARAM PARAMN AVAL ANL01FL DTYPE AWRANGE AWTARGET AWTDIFF AWLO AWHI  AWU
1 CIBIC Score      1    4       Y    NA    2-84       56       7    2   84 DAYS
2 CIBIC Score      1    3       Y    NA    2-84       56      27    2   84 DAYS
3 CIBIC Score      1    4       Y    NA    2-84       56       2    2   84 DAYS
4 CIBIC Score      1    4       Y    NA    2-84       56      28    2   84 DAYS
5 CIBIC Score      1    4       Y    NA    2-84       56       1    2   84 DAYS
6 CIBIC Score      1    4       Y    NA    2-84       56      10    2   84 DAYS
  QSSEQ
1  6001
2  6001
3  6001
4  6001
5  6001
6  6001

Example Code

Base R

This is included in a base installation of R, as part of the stats package. Requires inputting data as a table or as vectors. Two examples 2x2x2 examples: X=2 treatments, Y=Sex (M/F), controlling for 2 age groups and 3x2x3 example X=3 treatments, Y=Sex (M/F), controlling for 3 age groups.

#2x2x2 example
data2 <- data %>% 
         filter(TRTPN !="54" & AGEGR1 !=">80")
mantelhaen.test(x = data2$TRTP, y = data2$SEX, z = data2$AGEGR1)

    Mantel-Haenszel chi-squared test with continuity correction

data:  data2$TRTP and data2$SEX and data2$AGEGR1
Mantel-Haenszel X-squared = 0.076264, df = 1, p-value = 0.7824
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
 0.5671347 2.5129874
sample estimates:
common odds ratio 
         1.193818 
#3x2x3 example
mantelhaen.test(x = data$TRTP, y = data$SEX, z = data$AGEGR1)

    Cochran-Mantel-Haenszel test

data:  data$TRTP and data$SEX and data$AGEGR1
Cochran-Mantel-Haenszel M^2 = 2.482, df = 2, p-value = 0.2891

vcdExtra

The vcdExtra package provides results for the generalized CMH test, for each of the three model it outputs the Chi-square value and the respective p-values. Flexible data input methods available: table or formula (aggregated level data in a data frame).

library(vcdExtra)
Loading required package: vcd
Loading required package: grid
Loading required package: gnm

Attaching package: 'vcdExtra'
The following object is masked from 'package:dplyr':

    summarise
# Formula: Freq ~ X + Y | K
CMHtest(Freq ~ TRTP + SEX | AGEGR1 , data=data, overall=TRUE) 
$`AGEGR1:<65`
Cochran-Mantel-Haenszel Statistics for TRTP by SEX 
    in stratum AGEGR1:<65 

                 AltHypothesis   Chisq Df    Prob
cor        Nonzero correlation 0.33168  1 0.56467
rmeans  Row mean scores differ 1.52821  2 0.46575
cmeans  Col mean scores differ 0.33168  1 0.56467
general    General association 1.52821  2 0.46575


$`AGEGR1:>80`
Cochran-Mantel-Haenszel Statistics for TRTP by SEX 
    in stratum AGEGR1:>80 

                 AltHypothesis   Chisq Df    Prob
cor        Nonzero correlation 0.39433  1 0.53003
rmeans  Row mean scores differ 3.80104  2 0.14949
cmeans  Col mean scores differ 0.39433  1 0.53003
general    General association 3.80104  2 0.14949


$`AGEGR1:65-80`
Cochran-Mantel-Haenszel Statistics for TRTP by SEX 
    in stratum AGEGR1:65-80 

                 AltHypothesis   Chisq Df    Prob
cor        Nonzero correlation 0.52744  1 0.46768
rmeans  Row mean scores differ 0.62921  2 0.73008
cmeans  Col mean scores differ 0.52744  1 0.46768
general    General association 0.62921  2 0.73008


$ALL
Cochran-Mantel-Haenszel Statistics for TRTP by SEX 
    Overall tests, controlling for all strata 

                 AltHypothesis      Chisq Df    Prob
cor        Nonzero correlation 0.00086897  1 0.97648
rmeans  Row mean scores differ      2.482  2 0.28909
cmeans  Col mean scores differ 0.00086897  1 0.97648
general    General association      2.482  2 0.28909

epiDisplay

To get the M-H common odds ratio and the homogeneity test, the epiDisplay package can be used.

library(epiDisplay) 
mhor(x,y,k, graph = FALSE)

Forked Version - Solution for sparse data

To tackle the issue with sparse data it is recommended that a use of solve() is replaced with MASS::ginv. This was implemented in the forked version of vcdExtra which can be installed from here:

devtools::install_github("mstackhouse/vcdExtra")

However, also the forked version for the vcdExtra package works only until a certain level of sparsity. In case of our data, it still works if the data are stratified by the pooled Site ID (SITEGR1 - 11 unique values) whereas using the unpooled Site ID (SITEID - 17 unique values) also throws an error. Note: this version is not up to date and sometimes calculates degrees of freedom incorrectly.

References

Accessible Summary: https://online.stat.psu.edu/stat504/lesson/4/4.4

An Introduction to Categorical Data Analysis 2nd Edition (Agresti): http://users.stat.ufl.edu/~aa/

Original Paper 1: https://doi.org/10.2307%2F3001616

Original Paper 2: https://doi.org/10.1093/jnci/22.4.719