R/SAS Chi-Squared and Fisher’s Exact Comparision

Chi-Squared Test

Chi-Squared test is a hypothesis test for independent contingency tables, dependent on rows and column totals. The test assumes:

  • observations are independent of each other

  • all values are 1 or more and at least 80% of the cells are greater than 5.

  • data should be categorical

The Chi-Squared statistic is found by:

\[ \chi^2=\frac{\sum(O-E)^2}{E} \]

Where O is the observed and E is the expected.
For an r x c table (where r is the number of rows and c the number of columns), the Chi-squared distribution’s degrees of freedom is (r-1)*(c-1). The resultant statistic with correct degrees of freedom follows this distribution when its expected values are aligned with the assumptions of the test, under the null hypothesis. The resultant p value informs the magnitude of disagreement with the null hypothesis and not the magnitude of association

For this example we will use data about cough symptoms and history of bronchitis.

bronch <- matrix(c(26, 247, 44, 1002), ncol = 2)
row.names(bronch) <- c("cough", "no cough")
colnames(bronch) <- c("bronchitis", "no bronchitis")

To a chi-squared test in R you will use the following code.

chisq.test(bronch)

    Pearson's Chi-squared test with Yates' continuity correction

data:  bronch
X-squared = 11.145, df = 1, p-value = 0.0008424

To run a chi-squared test in SAS you used the following code.

proc freq data=proj1.bronchitis;
tables Cough*Bronchitis / chisq;
run;

The result in the “Chi-Square” section of the results table in SAS will not match R, in this case it will be 12.1804 with a p-value of 0.0005. This is because by default R does a Yate’s continuity adjustment. To change this set correct to false.

chisq.test(bronch, correct = FALSE)

    Pearson's Chi-squared test

data:  bronch
X-squared = 12.18, df = 1, p-value = 0.0004829

Alternatively, SAS also produces the correct chi-square value by default. It is the “Continuity Adj. Chi-Square” value in the results table.

Fisher’s Exact Test

Comparison between the Fisher’s Exact Test in both R and SAS shows that the two software match on the p-value and confidence intervals. The odd ratio does not match. The reason the odds ratio does not match is because R uses an “exact” odds ratio based on the hypergeomtric distribution, while SAS uses a standard AD/BC odds ratio. Note that R always uses an “exact” Fisher test. Therefore, when trying to match SAS, you must use the “exact” statement on the PROC FREQ.