Package | General Association | Row Means Differ | Nonzero Correlation | M-H Odds Ratio | Homogeneity Test | Note |
---|---|---|---|---|---|---|
stats::mantelhaen.test() | ✅ | ❌ | ❌ | ✅ | ❌ | Works well for 2x2xK |
vcdExtra::CMHtest() | ✅ | ✅ | ✅ | ❌ | ❌ | Problems with sparsity, potential bug |
epiDisplay::mhor() | ❌ | ❌ | ❌ | ✅ | ✅ | OR are limited to 2x2xK design |
CMH Test
Cochran-Mantel-Haenszel Test
The CMH procedure tests for conditional independence in partial contingency tables for a 2 x 2 x K design. However, it can be generalized to tables of X x Y x K dimensions.
Available R packages
We did not find any R package that delivers all the same measures as SAS at once. Therefore, we tried out multiple packages:
Data used
We will use the CDISC Pilot data set, which is publicly available on the PHUSE Test Data Factory repository. We applied very basic filtering conditions upfront (see below) and this data set served as the basis of the examples to follow.
# A tibble: 231 × 36
STUDYID SITEID SITEGR1 USUBJID TRTSDT TRTEDT TRTP TRTPN AGE AGEGR1
<chr> <chr> <chr> <chr> <date> <date> <chr> <dbl> <dbl> <chr>
1 CDISCP… 701 701 01-701… 2014-01-02 2014-07-02 Plac… 0 63 <65
2 CDISCP… 701 701 01-701… 2012-08-05 2012-09-01 Plac… 0 64 <65
3 CDISCP… 701 701 01-701… 2013-07-19 2014-01-14 Xano… 81 71 65-80
4 CDISCP… 701 701 01-701… 2014-03-18 2014-03-31 Xano… 54 74 65-80
5 CDISCP… 701 701 01-701… 2014-07-01 2014-12-30 Xano… 81 77 65-80
6 CDISCP… 701 701 01-701… 2013-02-12 2013-03-09 Plac… 0 85 >80
7 CDISCP… 701 701 01-701… 2014-01-01 2014-07-09 Xano… 54 68 65-80
8 CDISCP… 701 701 01-701… 2012-09-07 2012-09-16 Xano… 54 81 >80
9 CDISCP… 701 701 01-701… 2012-11-30 2013-01-23 Xano… 54 84 >80
10 CDISCP… 701 701 01-701… 2014-03-12 2014-09-09 Plac… 0 52 <65
# ℹ 221 more rows
# ℹ 26 more variables: AGEGR1N <dbl>, RACE <chr>, RACEN <dbl>, SEX <chr>,
# ITTFL <chr>, EFFFL <chr>, COMP24FL <chr>, AVISIT <chr>, AVISITN <dbl>,
# VISIT <chr>, VISITNUM <dbl>, ADY <dbl>, ADT <date>, PARAMCD <chr>,
# PARAM <chr>, PARAMN <dbl>, AVAL <dbl>, ANL01FL <chr>, DTYPE <chr>,
# AWRANGE <chr>, AWTARGET <dbl>, AWTDIFF <dbl>, AWLO <dbl>, AWHI <dbl>,
# AWU <chr>, QSSEQ <dbl>
Example Code
Base R
This is included in a base installation of R, as part of the stats package. Requires inputting data as a table or as vectors.
mantelhaen.test(x = data$TRTP, y = data$SEX, z = data$AGEGR1)
Cochran-Mantel-Haenszel test
data: data$TRTP and data$SEX and data$AGEGR1
Cochran-Mantel-Haenszel M^2 = 2.482, df = 2, p-value = 0.2891
vcdExtra
The vcdExtra package provides results for the generalized CMH test, for each of the three model it outputs the Chi-square value and the respective p-values. Flexible data input methods available: table or formula (aggregated level data in a data frame).
library(vcdExtra)
Loading required package: vcd
Loading required package: grid
Loading required package: gnm
Attaching package: 'vcdExtra'
The following object is masked from 'package:dplyr':
summarise
# Formula: Freq ~ X + Y | K
CMHtest(Freq ~ TRTP + SEX | AGEGR1 , data=data, overall=TRUE)
$`AGEGR1:<65`
Cochran-Mantel-Haenszel Statistics for TRTP by SEX
in stratum AGEGR1:<65
AltHypothesis Chisq Df Prob
cor Nonzero correlation 0.33168 1 0.56467
rmeans Row mean scores differ 1.52821 2 0.46575
cmeans Col mean scores differ 0.33168 1 0.56467
general General association 1.52821 2 0.46575
$`AGEGR1:>80`
Cochran-Mantel-Haenszel Statistics for TRTP by SEX
in stratum AGEGR1:>80
AltHypothesis Chisq Df Prob
cor Nonzero correlation 0.39433 1 0.53003
rmeans Row mean scores differ 3.80104 2 0.14949
cmeans Col mean scores differ 0.39433 1 0.53003
general General association 3.80104 2 0.14949
$`AGEGR1:65-80`
Cochran-Mantel-Haenszel Statistics for TRTP by SEX
in stratum AGEGR1:65-80
AltHypothesis Chisq Df Prob
cor Nonzero correlation 0.52744 1 0.46768
rmeans Row mean scores differ 0.62921 2 0.73008
cmeans Col mean scores differ 0.52744 1 0.46768
general General association 0.62921 2 0.73008
$ALL
Cochran-Mantel-Haenszel Statistics for TRTP by SEX
Overall tests, controlling for all strata
AltHypothesis Chisq Df Prob
cor Nonzero correlation 0.00086897 1 0.97648
rmeans Row mean scores differ 2.482 2 0.28909
cmeans Col mean scores differ 0.00086897 1 0.97648
general General association 2.482 2 0.28909
epiDisplay
To get the M-H common odds ratio and the homogeneity test, the epiDisplay package can be used.
library(epiDisplay)
mhor(x,y,k, graph = FALSE)
Forked Version - Solution for sparse data
To tackle the issue with sparse data it is recommended that a use of solve()
is replaced with MASS::ginv
. This was implemented in the forked version of vcdExtra which can be installed from here:
::install_github("mstackhouse/vcdExtra") devtools
However, also the forked version for the vcdExtra package works only until a certain level of sparsity. In case of our data, it still works if the data are stratified by the pooled Site ID (SITEGR1 - 11 unique values) whereas using the unpooled Site ID (SITEID - 17 unique values) also throws an error. Note: this version is not up to date and sometimes calculates degrees of freedom incorrectly.
References
Accessible Summary: https://online.stat.psu.edu/stat504/lesson/4/4.4
An Introduction to Categorical Data Analysis 2nd Edition (Agresti): http://users.stat.ufl.edu/~aa/
Original Paper 1: https://doi.org/10.2307%2F3001616
Original Paper 2: https://doi.org/10.1093/jnci/22.4.719