Wilcoxon Rank Sum (Mann Whitney-U) in R

Overview

Wilcoxon rank sum test, or equivalently, Mann-Whitney U-test is a rank based non-parametric method. The aim is to compare two independent groups of observations. Under certain scenarios, it can be thought of as a test for median differences, however this is only valid when: 1) both samples are independent and identically distributed (same dispersion, same shape, not necessarily normal) and 2) are symmetric around their medians.

Generally, with two samples of observations (A and B), the test uses the mean of each possible pair of observations in each group (including the pair of each value with itself) to test if the probability that (A>B) > probability (B>A).

The Wilcoxon rank sum test is often presented alongside a Hodges-Lehmann estimate of the pseudo-median (the median of the Walsh averages), and an associated confidence interval for the pseudo-median.

A tie in the data exists when an observation in group A, has the same result as an observation in group B.

Available R packages

There are two main implementations of the Wilcoxon rank sum test in R.

The stats package implements various classic statistical tests, including Wilcoxon rank sum test. Although this is arguably the most commonly applied package, this one does not account for any ties in the data. To account for ties in the data, the coin package should be used.

# x, y are two unpaired vectors. Do not necessary need to be of the same length.
stats::wilcox.test(x, y, paired = FALSE)

Example: Birth Weight

Data source: Table 30.4, Kirkwood BR. and Sterne JAC. Essentials of medical statistics. Second Edition. ISBN 978-0-86542-871-3

Comparison of birth weights (kg) of children born to 15 non-smokers with those of children born to 14 heavy smokers.

# bw_ns: non smokers
# bw_s: smokers
bw_ns <- c(3.99, 3.89, 3.6, 3.73, 3.31, 
            3.7, 4.08, 3.61, 3.83, 3.41, 
            4.13, 3.36, 3.54, 3.51, 2.71)
bw_s <- c(3.18, 2.74, 2.9, 3.27, 3.65, 
           3.42, 3.23, 2.86, 3.6, 3.65, 
           3.69, 3.53, 2.38, 2.34)

Can visualize the data on two histograms. Red lines indicate the location of medians.

par(mfrow =c(1,2))
hist(bw_ns, main = 'Birthweight: non-smokers')
abline(v = median(bw_ns), col = 'red', lwd = 2)
hist(bw_s, main = 'Birthweight: smokers')
abline(v = median(bw_s), col = 'red', lwd = 2)

It is possible to see that for non-smokers, the median birthweight is higher than those of smokers. Now we can formally test it with wilcoxon rank sum test.

The default test is two-sided with confidence level of 0.95, and does continuity correction.

# default is two sided
stats::wilcox.test(bw_s, bw_ns, paired = FALSE, conf.int = TRUE)
Warning in wilcox.test.default(bw_s, bw_ns, paired = FALSE, conf.int = TRUE):
cannot compute exact p-value with ties
Warning in wilcox.test.default(bw_s, bw_ns, paired = FALSE, conf.int = TRUE):
cannot compute exact confidence intervals with ties

    Wilcoxon rank sum test with continuity correction

data:  bw_s and bw_ns
W = 45.5, p-value = 0.01001
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.76995896 -0.09000999
sample estimates:
difference in location 
            -0.4261377 

We can also carry out a one-sided test, by specifying alternative = greater (if the first item is greater than the second).

# default is two sided
stats::wilcox.test(bw_ns, bw_s, paired = FALSE, alternative = 'greater')
Warning in wilcox.test.default(bw_ns, bw_s, paired = FALSE, alternative =
"greater"): cannot compute exact p-value with ties

    Wilcoxon rank sum test with continuity correction

data:  bw_ns and bw_s
W = 164.5, p-value = 0.005003
alternative hypothesis: true location shift is greater than 0

In {stats} the exact p-value is computed when there are less than 50 values and no ties otherwise the normal approximation is used. In this cause, because there are ties the normal approximation is used.

# force exact
stats::wilcox.test(bw_s, bw_ns, paired = FALSE, conf.int = TRUE, exact = TRUE)
Warning in wilcox.test.default(bw_s, bw_ns, paired = FALSE, conf.int = TRUE, :
cannot compute exact p-value with ties
Warning in wilcox.test.default(bw_s, bw_ns, paired = FALSE, conf.int = TRUE, :
cannot compute exact confidence intervals with ties

    Wilcoxon rank sum test with continuity correction

data:  bw_s and bw_ns
W = 45.5, p-value = 0.01001
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.76995896 -0.09000999
sample estimates:
difference in location 
            -0.4261377 

In order to account for the ties, wilcox_test from the {coin} package should be used. For this function, the data needs to be inputted via a formula where the right hand side is a factor, so we need to create a dataset. In order to get smokers - non-smokers we need to relevel the factors.

smk_data <- data.frame(
  value = c(bw_ns, bw_s), 
  smoke = as.factor(rep(c("non", "smoke"), c(length(bw_ns), length(bw_s))))
) 

smk_data$smoke
 [1] non   non   non   non   non   non   non   non   non   non   non   non  
[13] non   non   non   smoke smoke smoke smoke smoke smoke smoke smoke smoke
[25] smoke smoke smoke smoke smoke
Levels: non smoke
smk_data$smoke <- forcats::fct_relevel(smk_data$smoke, "smoke")

smk_data$smoke
 [1] non   non   non   non   non   non   non   non   non   non   non   non  
[13] non   non   non   smoke smoke smoke smoke smoke smoke smoke smoke smoke
[25] smoke smoke smoke smoke smoke
Levels: smoke non

Now the data is in the right shape we can run wilcox_test. In order to get the Hodges-Lehmann confidence intervals out, we need to include the conf.int = TRUE argument.(NOTE: the conf.level argument controls the confidence level, but must be used with conf.int = TRUE otherwise you won’t get a confidence interval)

coin::wilcox_test(value ~ smoke, data = smk_data, conf.int  = TRUE)

    Asymptotic Wilcoxon-Mann-Whitney Test

data:  value by smoke (smoke, non)
Z = -2.5974, p-value = 0.009392
alternative hypothesis: true mu is not equal to 0
95 percent confidence interval:
 -0.76000001 -0.09999999
sample estimates:
difference in location 
            -0.4261403 

In {coin} there is no option for a continuity correction and it is not done by default. {coin} can calculate exact and Monte Carlo conditional p-values using the distribtuion argument. The exact p-value should be used in small sample sizes as the normal approximation does not hold

coin::wilcox_test(value ~ smoke, data = smk_data, conf.int  = TRUE,
                  distribution = "exact")

    Exact Wilcoxon-Mann-Whitney Test

data:  value by smoke (smoke, non)
Z = -2.5974, p-value = 0.008181
alternative hypothesis: true mu is not equal to 0
95 percent confidence interval:
 -0.76 -0.10
sample estimates:
difference in location 
                -0.425 

Note: the distribution argument only effects the p-value, {coin} consistently calculated the exact Hodges-Lehmann confidence intervals.

Useful References