R vs SAS CMH

Cochran-Mantel-Haenszel Test

The CMH procedure tests for conditional independence in partial contingency tables for a 2 x 2 x K design. However, it can be generalized to tables of X x Y x K dimensions.

Naming Convention

For the remainder of this document, we adopt the following naming convention when referring to variables of a contingency table:

  • X = exposure (Often the treatment variable)

  • Y = response (the variable of interest)

  • K = control (often a potential confounder you want to control for)

Scale

The scale of the exposure (X) and response (Y) variables dictate which test statistic is computed for the contingency table. Each test statistic is evaluated on different degrees of freedom (df):

  • General association statistic (X and Y both nominal) results in (X-1) * (Y-1) dfs

  • Row mean scores statistic (X is nominal and Y is ordinal) results in X-1 dfs

  • Nonzero correlation statistic (X and Y both ordinal) results in 1 df

Testing Strategy

Data

To begin investigating the differences in the SAS and R implementations of the CMH test, we decided to use the CDISC Pilot data set, which is publicly available on the PHUSE Test Data Factory repository. We applied very basic filtering conditions upfront (see below) and this data set served as the basis of the examples to follow.

  X      STUDYID SITEID SITEGR1     USUBJID     TRTSDT     TRTEDT
1 1 CDISCPILOT01    701     701 01-701-1015 2014-01-02 2014-07-02
2 2 CDISCPILOT01    701     701 01-701-1023 2012-08-05 2012-09-01
3 3 CDISCPILOT01    701     701 01-701-1028 2013-07-19 2014-01-14
4 4 CDISCPILOT01    701     701 01-701-1033 2014-03-18 2014-03-31
5 5 CDISCPILOT01    701     701 01-701-1034 2014-07-01 2014-12-30
6 6 CDISCPILOT01    701     701 01-701-1047 2013-02-12 2013-03-09
                  TRTP TRTPN AGE AGEGR1 AGEGR1N  RACE RACEN SEX ITTFL EFFFL
1              Placebo     0  63    <65       1 WHITE     1   F     Y     Y
2              Placebo     0  64    <65       1 WHITE     1   M     Y     Y
3 Xanomeline High Dose    81  71  65-80       2 WHITE     1   M     Y     Y
4  Xanomeline Low Dose    54  74  65-80       2 WHITE     1   M     Y     Y
5 Xanomeline High Dose    81  77  65-80       2 WHITE     1   F     Y     Y
6              Placebo     0  85    >80       3 WHITE     1   F     Y     Y
  COMP24FL AVISIT AVISITN             VISIT VISITNUM ADY        ADT  PARAMCD
1        Y Week 8       8            WEEK 8        8  63 2014-03-05 CIBICVAL
2        N Week 8       8            WEEK 4        5  29 2012-09-02 CIBICVAL
3        Y Week 8       8            WEEK 8        8  54 2013-09-10 CIBICVAL
4        N Week 8       8            WEEK 4        5  28 2014-04-14 CIBICVAL
5        Y Week 8       8            WEEK 8        8  57 2014-08-26 CIBICVAL
6        N Week 8       8 AMBUL ECG REMOVAL        6  46 2013-03-29 CIBICVAL
        PARAM PARAMN AVAL ANL01FL DTYPE AWRANGE AWTARGET AWTDIFF AWLO AWHI  AWU
1 CIBIC Score      1    4       Y    NA    2-84       56       7    2   84 DAYS
2 CIBIC Score      1    3       Y    NA    2-84       56      27    2   84 DAYS
3 CIBIC Score      1    4       Y    NA    2-84       56       2    2   84 DAYS
4 CIBIC Score      1    4       Y    NA    2-84       56      28    2   84 DAYS
5 CIBIC Score      1    4       Y    NA    2-84       56       1    2   84 DAYS
6 CIBIC Score      1    4       Y    NA    2-84       56      10    2   84 DAYS
  QSSEQ
1  6001
2  6001
3  6001
4  6001
5  6001
6  6001

Schemes

In order to follow a systematic approach to testing, and to cover variations in the CMH test, we considered the traditional 2 x 2 x K design as well as scenarios where the generalized CMH test is employed (e.g. 5 x 3 x 3).

We present 5 archetype test scenarios that illustrate diverging results, possibly related to sparse data and possibly considered edge cases.

Number Schema (XxYxK) Variables Relevant Test Description
1 2x2x2 X = TRTP, Y = SEX, K = AGEGR1 General Association TRTP and AGEGR1 were limited to two categories (removing the low dose and >80 year group), overall the the groups were rather balanced
2 3x2x3 X = TRTP, Y = SEX, K = AGEGR1 General Association TRTP and AGEGR1 each have 3 levels, SEX has 2 levels, overall the the groups were rather balanced
3 2x2x3 X = TRTP, Y = SEX, K = RACE General Association Gives back NaN in R because RACE is very imbalanced
6 2x5x2 X = TRTP, Y = AVAL, K = SEX Row Means Compare Row Means results for R and SAS because Y is ordinal
9 3x5x17 X = TRTP, Y = AVAL, K = SITEID Row Means SITEID has many strata and provokes sparse groups, AVAL is ordinal, therefore row means statistic applies here, R threw an error
10 5x3x3 X = AVAL, Y = AGEGR1, K = TRTP Correlation X and Y are ordinal variables and therefore the correlation statistics has to be taken here

Results

Here the results can be seen:

CMH Statistics

Loading required package: vcd
Loading required package: grid
Loading required package: gnm

Attaching package: 'vcdExtra'
The following object is masked from 'package:dplyr':

    summarise

scenarios this is a test

As it can be seen, there are two schemata where R does not provide any results:

Test
Chi-Square
df
p-value
SAS R SAS R SAS R
1 Correlation 0.21660000 0.2165549886 1 1 0.61700000 0.6416775
Row Means 0.21660000 0.2165549886 1 1 0.61700000 0.6416775
General Association 0.21660000 0.2165549886 1 1 0.61700000 0.6416775
2 Correlation 0.00090000 0.0008689711 1 1 0.97650000 0.9764831
Row Means 2.48200000 2.4820278527 1 2 0.28910000 0.2890910
General Association 2.48200000 2.4820278527 1 2 0.28910000 0.2890910
3* Correlation 0.00278713 NaN 1 1 0.95789662 NaN
Row Means 2.38606985 NaN 2 2 0.30329938 NaN
General Association 2.38606985 NaN 2 2 0.30329938 NaN
6 Correlation 1.14720000 0.1115439738 1 1 0.28410000 0.7383931
Row Means 1.14720000 2.6632420358 1 2 0.28410000 0.2640489
General Association 2.56720000 6.5238474681 4 8 0.63260000 0.5887637
9 Correlation 0.08544312 NA 1 NA 0.77005225 NA
Row Means 2.47631367 NA 2 NA 0.28991809 NA
General Association 7.03387844 NA 8 NA 0.53298189 NA
10 Correlation 2.73816085 0.8715295423 1 1 0.09797747 0.3505322
Row Means 4.40701092 3.0445270087 4 4 0.35371641 0.5504018
General Association 5.73053819 5.7305381934 8 8 0.67738613 0.6773861
* Reason for NaN in schema 3: Stratum k = AMERICAN INDIAN OR ALASKA NATIVE can not be compared because there are only values for one treatment and one gender.
Reason for Error 4: For large sparse table (many strata) CMHTest will occasionally throw an error in solve.default(AVA) because of singularity

Summary and Recommendation

Having explored the available R packages to calculate the CMH statistics, using R can only be recommended if the analysis design is equivalent to 2 x 2 x K. Then, the base mantelhaen.test() function as well as the vcdExtra package show reliable results which are equal to the output of the SAS function. The same is true for the common odds ratio even though there is a marked difference in decimals.

For the generalized version of the cmh test no R package can be recommended so far. SAS and R outputs differ substantially (possibly due to the underlying subroutines or functions) and the vcdExtra package seems to deliver inconsistent results outside the 2 x 2 x K design.

References

Accessible Summary: https://online.stat.psu.edu/stat504/lesson/4/4.4

An Introduction to Categorical Data Analysis 2nd Edition (Agresti): http://users.stat.ufl.edu/~aa/

SAS documentation (Specification): https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/statug/statug_freq_examples07.html

SAS documentation (Theoretical Basis + Formulas): https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/procstat/procstat_freq_details92.html

Original Paper 1: https://doi.org/10.2307%2F3001616

Original Paper 2: https://doi.org/10.1093/jnci/22.4.719