R vs SAS CMH

Cochran-Mantel-Haenszel Test

The CMH procedure tests for conditional independence in partial contingency tables for a 2 x 2 x K design. However, it can be generalized to tables of X x Y x K dimensions.

Naming Convention

For the remainder of this document, we adopt the following naming convention when referring to variables of a contingency table:

X = exposure (Often the treatment variable)
Y = response (the variable of interest)
K = control (often a potential confounder you want to control for)

Scale

The scale of the exposure (X) and response (Y) variables dictate which test statistic is computed for the contingency table. Each test statistic is evaluated on different degrees of freedom (df):

General association statistic (X and Y both nominal) results in (X-1) * (Y-1) dfs
Row mean scores statistic (X is nominal and Y is ordinal) results in X-1 dfs
Nonzero correlation statistic (X and Y both ordinal) results in 1 df

Testing Strategy

Data

To begin investigating the differences in the SAS and R implementations of the CMH test, we decided to use the CDISC Pilot data set, which is publicly available on the PHUSE Test Data Factory repository. We applied very basic filtering conditions upfront (see below) and this data set served as the basis of the examples to follow.

  X      STUDYID SITEID SITEGR1     USUBJID     TRTSDT     TRTEDT
1 1 CDISCPILOT01    701     701 01-701-1015 2014-01-02 2014-07-02
2 2 CDISCPILOT01    701     701 01-701-1023 2012-08-05 2012-09-01
3 3 CDISCPILOT01    701     701 01-701-1028 2013-07-19 2014-01-14
4 4 CDISCPILOT01    701     701 01-701-1033 2014-03-18 2014-03-31
5 5 CDISCPILOT01    701     701 01-701-1034 2014-07-01 2014-12-30
6 6 CDISCPILOT01    701     701 01-701-1047 2013-02-12 2013-03-09
                  TRTP TRTPN AGE AGEGR1 AGEGR1N  RACE RACEN SEX ITTFL EFFFL
1              Placebo     0  63    <65       1 WHITE     1   F     Y     Y
2              Placebo     0  64    <65       1 WHITE     1   M     Y     Y
3 Xanomeline High Dose    81  71  65-80       2 WHITE     1   M     Y     Y
4  Xanomeline Low Dose    54  74  65-80       2 WHITE     1   M     Y     Y
5 Xanomeline High Dose    81  77  65-80       2 WHITE     1   F     Y     Y
6              Placebo     0  85    >80       3 WHITE     1   F     Y     Y
  COMP24FL AVISIT AVISITN             VISIT VISITNUM ADY        ADT  PARAMCD
1        Y Week 8       8            WEEK 8        8  63 2014-03-05 CIBICVAL
2        N Week 8       8            WEEK 4        5  29 2012-09-02 CIBICVAL
3        Y Week 8       8            WEEK 8        8  54 2013-09-10 CIBICVAL
4        N Week 8       8            WEEK 4        5  28 2014-04-14 CIBICVAL
5        Y Week 8       8            WEEK 8        8  57 2014-08-26 CIBICVAL
6        N Week 8       8 AMBUL ECG REMOVAL        6  46 2013-03-29 CIBICVAL
        PARAM PARAMN AVAL ANL01FL DTYPE AWRANGE AWTARGET AWTDIFF AWLO AWHI  AWU
1 CIBIC Score      1    4       Y    NA    2-84       56       7    2   84 DAYS
2 CIBIC Score      1    3       Y    NA    2-84       56      27    2   84 DAYS
3 CIBIC Score      1    4       Y    NA    2-84       56       2    2   84 DAYS
4 CIBIC Score      1    4       Y    NA    2-84       56      28    2   84 DAYS
5 CIBIC Score      1    4       Y    NA    2-84       56       1    2   84 DAYS
6 CIBIC Score      1    4       Y    NA    2-84       56      10    2   84 DAYS
  QSSEQ
1  6001
2  6001
3  6001
4  6001
5  6001
6  6001

Schemes

In order to follow a systematic approach to testing, and to cover variations in the CMH test, we considered the traditional 2 x 2 x K design as well as scenarios where the generalized CMH test is employed (e.g. 5 x 3 x 3).

We present 5 archetype test scenarios that illustrate diverging results, possibly related to sparse data and possibly considered edge cases.

Number	Schema (XxYxK)	Variables	Relevant Test	Description
1	2x2x2	X = TRTP, Y = SEX, K = AGEGR1	General Association	TRTP and AGEGR1 were limited to two categories (removing the low dose and >80 year group), overall the the groups were rather balanced
2	3x2x3	X = TRTP, Y = SEX, K = AGEGR1	General Association	TRTP and AGEGR1 each have 3 levels, SEX has 2 levels, overall the the groups were rather balanced
3	2x2x3	X = TRTP, Y = SEX, K = RACE	General Association	Gives back NaN in R because RACE is very imbalanced
6	2x5x2	X = TRTP, Y = AVAL, K = SEX	Row Means	Compare Row Means results for R and SAS because Y is ordinal
9	3x5x17	X = TRTP, Y = AVAL, K = SITEID	Row Means	SITEID has many strata and provokes sparse groups, AVAL is ordinal, therefore row means statistic applies here, R threw an error
10	5x3x3	X = AVAL, Y = AGEGR1, K = TRTP	Correlation	X and Y are ordinal variables and therefore the correlation statistics has to be taken here

Results

Here the results can be seen:

CMH Statistics

Loading required package: vcd

Loading required package: grid

Loading required package: gnm


Attaching package: 'vcdExtra'

The following object is masked from 'package:dplyr':

    summarise

scenarios this is a test

As it can be seen, there are two schemata where R does not provide any results:

	Test	Chi-Square		df		p-value
	Test	SAS	R	SAS	R	SAS	R
1	Correlation	0.21660000	0.2165549886	1	1	0.61700000	0.6416775
	Row Means	0.21660000	0.2165549886	1	1	0.61700000	0.6416775
	General Association	0.21660000	0.2165549886	1	1	0.61700000	0.6416775
2	Correlation	0.00090000	0.0008689711	1	1	0.97650000	0.9764831
	Row Means	2.48200000	2.4820278527	1	2	0.28910000	0.2890910
	General Association	2.48200000	2.4820278527	1	2	0.28910000	0.2890910
3^*	Correlation	0.00278713	NaN	1	1	0.95789662	NaN
	Row Means	2.38606985	NaN	2	2	0.30329938	NaN
	General Association	2.38606985	NaN	2	2	0.30329938	NaN
6	Correlation	1.14720000	0.1115439738	1	1	0.28410000	0.7383931
	Row Means	1.14720000	2.6632420358	1	2	0.28410000	0.2640489
	General Association	2.56720000	6.5238474681	4	8	0.63260000	0.5887637
9^†	Correlation	0.08544312	NA	1	NA	0.77005225	NA
	Row Means	2.47631367	NA	2	NA	0.28991809	NA
	General Association	7.03387844	NA	8	NA	0.53298189	NA
10	Correlation	2.73816085	0.8715295423	1	1	0.09797747	0.3505322
	Row Means	4.40701092	3.0445270087	4	4	0.35371641	0.5504018
	General Association	5.73053819	5.7305381934	8	8	0.67738613	0.6773861
^* Reason for NaN in schema 3: Stratum k = AMERICAN INDIAN OR ALASKA NATIVE can not be compared because there are only values for one treatment and one gender.
^† Reason for Error 4: For large sparse table (many strata) CMHTest will occasionally throw an error in solve.default(AVA) because of singularity

Summary and Recommendation

Having explored the available R packages to calculate the CMH statistics, using R can only be recommended if the analysis design is equivalent to 2 x 2 x K. Then, the base mantelhaen.test() function as well as the vcdExtra package show reliable results which are equal to the output of the SAS function. The same is true for the common odds ratio even though there is a marked difference in decimals.

For the generalized version of the cmh test no R package can be recommended so far. SAS and R outputs differ substantially (possibly due to the underlying subroutines or functions) and the vcdExtra package seems to deliver inconsistent results outside the 2 x 2 x K design.

References

Accessible Summary: https://online.stat.psu.edu/stat504/lesson/4/4.4

An Introduction to Categorical Data Analysis 2nd Edition (Agresti): http://users.stat.ufl.edu/~aa/

SAS documentation (Specification): https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/statug/statug_freq_examples07.html

SAS documentation (Theoretical Basis + Formulas): https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/procstat/procstat_freq_details92.html

Original Paper 1: https://doi.org/10.2307%2F3001616

Original Paper 2: https://doi.org/10.1093/jnci/22.4.719