<- matrix(c(26, 247, 44, 1002), ncol = 2)
bronch row.names(bronch) <- c("cough", "no cough")
colnames(bronch) <- c("bronchitis", "no bronchitis")
R/SAS Chi-Squared and Fisher’s Exact Comparision
Chi-Squared Test
Chi-Squared test is a hypothesis test for independent contingency tables, dependent on rows and column totals. The test assumes:
observations are independent of each other
all values are 1 or more and at least 80% of the cells are greater than 5.
data should be categorical
The Chi-Squared statistic is found by:
\[ \chi^2=\frac{\sum(O-E)^2}{E} \]
Where O is the observed and E is the expected.
For an r x c table (where r is the number of rows and c the number of columns), the Chi-squared distribution’s degrees of freedom is (r-1)*(c-1). The resultant statistic with correct degrees of freedom follows this distribution when its expected values are aligned with the assumptions of the test, under the null hypothesis. The resultant p value informs the magnitude of disagreement with the null hypothesis and not the magnitude of association
For this example we will use data about cough symptoms and history of bronchitis.
To a chi-squared test in R you will use the following code.
chisq.test(bronch)
Pearson's Chi-squared test with Yates' continuity correction
data: bronch
X-squared = 11.145, df = 1, p-value = 0.0008424
To run a chi-squared test in SAS you used the following code.
proc freq data=proj1.bronchitis;
tables Cough*Bronchitis / chisq;
run;
The result in the “Chi-Square” section of the results table in SAS will not match R, in this case it will be 12.1804 with a p-value of 0.0005. This is because by default R does a Yate’s continuity adjustment. To change this set correct
to false.
chisq.test(bronch, correct = FALSE)
Pearson's Chi-squared test
data: bronch
X-squared = 12.18, df = 1, p-value = 0.0004829
Alternatively, SAS also produces the correct chi-square value by default. It is the “Continuity Adj. Chi-Square” value in the results table.
Fisher’s Exact Test
Comparison between the Fisher’s Exact Test in both R and SAS shows that the two software match on the p-value and confidence intervals. The odd ratio does not match. The reason the odds ratio does not match is because R uses an “exact” odds ratio based on the hypergeomtric distribution, while SAS uses a standard AD/BC odds ratio. Note that R always uses an “exact” Fisher test. Therefore, when trying to match SAS, you must use the “exact” statement on the PROC FREQ.