CMH Test

Cochran-Mantel-Haenszel Test

The CMH procedure tests for conditional independence in partial contingency tables for a 2 x 2 x K design. However, it can be generalized to tables of X x Y x K dimensions.

Available R packages

We did not find any R package that delivers all the same measures as SAS at once. Therefore, we tried out multiple packages:

Package General Association Row Means Differ Nonzero Correlation M-H Odds Ratio Homogeneity Test Note
stats::mantelhaen.test() Works well for 2x2xK
vcdExtra::CMHtest() Problems with sparsity, potential bug
epiDisplay::mhor() OR are limited to 2x2xK design

Data used

We will use the CDISC Pilot data set, which is publicly available on the PHUSE Test Data Factory repository. We applied very basic filtering conditions upfront (see below) and this data set served as the basis of the examples to follow.

# A tibble: 231 × 36
   STUDYID SITEID SITEGR1 USUBJID TRTSDT     TRTEDT     TRTP  TRTPN   AGE AGEGR1
   <chr>   <chr>  <chr>   <chr>   <date>     <date>     <chr> <dbl> <dbl> <chr> 
 1 CDISCP… 701    701     01-701… 2014-01-02 2014-07-02 Plac…     0    63 <65   
 2 CDISCP… 701    701     01-701… 2012-08-05 2012-09-01 Plac…     0    64 <65   
 3 CDISCP… 701    701     01-701… 2013-07-19 2014-01-14 Xano…    81    71 65-80 
 4 CDISCP… 701    701     01-701… 2014-03-18 2014-03-31 Xano…    54    74 65-80 
 5 CDISCP… 701    701     01-701… 2014-07-01 2014-12-30 Xano…    81    77 65-80 
 6 CDISCP… 701    701     01-701… 2013-02-12 2013-03-09 Plac…     0    85 >80   
 7 CDISCP… 701    701     01-701… 2014-01-01 2014-07-09 Xano…    54    68 65-80 
 8 CDISCP… 701    701     01-701… 2012-09-07 2012-09-16 Xano…    54    81 >80   
 9 CDISCP… 701    701     01-701… 2012-11-30 2013-01-23 Xano…    54    84 >80   
10 CDISCP… 701    701     01-701… 2014-03-12 2014-09-09 Plac…     0    52 <65   
# ℹ 221 more rows
# ℹ 26 more variables: AGEGR1N <dbl>, RACE <chr>, RACEN <dbl>, SEX <chr>,
#   ITTFL <chr>, EFFFL <chr>, COMP24FL <chr>, AVISIT <chr>, AVISITN <dbl>,
#   VISIT <chr>, VISITNUM <dbl>, ADY <dbl>, ADT <date>, PARAMCD <chr>,
#   PARAM <chr>, PARAMN <dbl>, AVAL <dbl>, ANL01FL <chr>, DTYPE <chr>,
#   AWRANGE <chr>, AWTARGET <dbl>, AWTDIFF <dbl>, AWLO <dbl>, AWHI <dbl>,
#   AWU <chr>, QSSEQ <dbl>

Example Code

Base R

This is included in a base installation of R, as part of the stats package. Requires inputting data as a table or as vectors.

mantelhaen.test(x = data$TRTP, y = data$SEX, z = data$AGEGR1)

    Cochran-Mantel-Haenszel test

data:  data$TRTP and data$SEX and data$AGEGR1
Cochran-Mantel-Haenszel M^2 = 2.482, df = 2, p-value = 0.2891

vcdExtra

The vcdExtra package provides results for the generalized CMH test, for each of the three model it outputs the Chi-square value and the respective p-values. Flexible data input methods available: table or formula (aggregated level data in a data frame).

library(vcdExtra)
Loading required package: vcd
Loading required package: grid
Loading required package: gnm

Attaching package: 'vcdExtra'
The following object is masked from 'package:dplyr':

    summarise
# Formula: Freq ~ X + Y | K
CMHtest(Freq ~ TRTP + SEX | AGEGR1 , data=data, overall=TRUE) 
$`AGEGR1:<65`
Cochran-Mantel-Haenszel Statistics for TRTP by SEX 
    in stratum AGEGR1:<65 

                 AltHypothesis   Chisq Df    Prob
cor        Nonzero correlation 0.33168  1 0.56467
rmeans  Row mean scores differ 1.52821  2 0.46575
cmeans  Col mean scores differ 0.33168  1 0.56467
general    General association 1.52821  2 0.46575


$`AGEGR1:>80`
Cochran-Mantel-Haenszel Statistics for TRTP by SEX 
    in stratum AGEGR1:>80 

                 AltHypothesis   Chisq Df    Prob
cor        Nonzero correlation 0.39433  1 0.53003
rmeans  Row mean scores differ 3.80104  2 0.14949
cmeans  Col mean scores differ 0.39433  1 0.53003
general    General association 3.80104  2 0.14949


$`AGEGR1:65-80`
Cochran-Mantel-Haenszel Statistics for TRTP by SEX 
    in stratum AGEGR1:65-80 

                 AltHypothesis   Chisq Df    Prob
cor        Nonzero correlation 0.52744  1 0.46768
rmeans  Row mean scores differ 0.62921  2 0.73008
cmeans  Col mean scores differ 0.52744  1 0.46768
general    General association 0.62921  2 0.73008


$ALL
Cochran-Mantel-Haenszel Statistics for TRTP by SEX 
    Overall tests, controlling for all strata 

                 AltHypothesis      Chisq Df    Prob
cor        Nonzero correlation 0.00086897  1 0.97648
rmeans  Row mean scores differ      2.482  2 0.28909
cmeans  Col mean scores differ 0.00086897  1 0.97648
general    General association      2.482  2 0.28909

epiDisplay

To get the M-H common odds ratio and the homogeneity test, the epiDisplay package can be used.

library(epiDisplay) 
mhor(x,y,k, graph = FALSE)

Forked Version - Solution for sparse data

To tackle the issue with sparse data it is recommended that a use of solve() is replaced with MASS::ginv. This was implemented in the forked version of vcdExtra which can be installed from here:

devtools::install_github("mstackhouse/vcdExtra")

However, also the forked version for the vcdExtra package works only until a certain level of sparsity. In case of our data, it still works if the data are stratified by the pooled Site ID (SITEGR1 - 11 unique values) whereas using the unpooled Site ID (SITEID - 17 unique values) also throws an error. Note: this version is not up to date and sometimes calculates degrees of freedom incorrectly.

References

Accessible Summary: https://online.stat.psu.edu/stat504/lesson/4/4.4

An Introduction to Categorical Data Analysis 2nd Edition (Agresti): http://users.stat.ufl.edu/~aa/

Original Paper 1: https://doi.org/10.2307%2F3001616

Original Paper 2: https://doi.org/10.1093/jnci/22.4.719