R vs SAS Wilcoxon Rank-Sum Test

Introduction

This page compares the Wilcoxon rank-sum test, Hodges-Lehmann estimator, and estimation of the Mann-Whitney parameter in R and SAS.

Example Data

For this example we are using a dataset of birth weights for smoking and non-smoking mothers (Data source: Table 30.4, Kirkwood BR. and Sterne JAC. Essentials of medical statistics. Second Edition. ISBN 978-0-86542-871-3). This dataset is both small (so an exact test is recommended) and has ties in it.

bw_ns <- c(3.99, 3.89, 3.6, 3.73, 3.31, 
            3.7, 4.08, 3.61, 3.83, 3.41, 
            4.13, 3.36, 3.54, 3.51, 2.71)
bw_s <- c(3.18, 2.74, 2.9, 3.27, 3.65, 
           3.42, 3.23, 2.86, 3.6, 3.65, 
           3.69, 3.53, 2.38, 2.34)

smk_data <- data.frame(
  value = c(bw_ns, bw_s), 
  smoke = as.factor(rep(c("non", "smoke"), c(length(bw_ns), length(bw_s))))
) 
# Relevel the factor so that comparisons are computed as smokers minus non-smokers
smk_data$smoke <- forcats::fct_relevel(smk_data$smoke, "smoke")
head(smk_data)
  value smoke
1  3.99   non
2  3.89   non
3  3.60   non
4  3.73   non
5  3.31   non
6  3.70   non

To view the code implementations, see the SAS and R pages, respectively.

Comparison

Software Capabilities

The following table provides an overview of the analyses supported in R and SAS. A specific comparison of the results, and whether they match, is provided below.

| Analysis | Supported in R {stats} | Supported in R {coin} | Supported in R {asht} | Supported in SAS | Notes |
|---|---|---|---|---|---|
| Wilcoxon Rank-Sum – Normal approximation with continuity correction | Yes | No | Yes | Yes | In {coin}, one can add correct=TRUE, but note that no error is given and the results of a normal approximation approach without continuity correction are provided. |
| Wilcoxon Rank-Sum – Normal approximation without continuity correction | Yes | Yes | Yes | Yes | |
| Wilcoxon Rank-Sum – Exact | Partly | Yes | Partly | Yes | In {stats}, one can only do the exact method when no ties are present; in {asht}, the exact test is possible but the run time is long for larger sample sizes. |
| Wilcoxon Rank-Sum – Approximative (Monte Carlo simulation) | No | Yes | Yes | No | |
| Hodges-Lehmann estimator – Asymptotic | Yes | No | No | Yes | |
| Hodges-Lehmann estimator – Exact | Partly | Yes | No | Yes | In {stats}, one can only do the exact method when no ties are present. |
| Hodges-Lehmann estimator – Approximative (Monte Carlo simulation) | No | Yes | No | No | |
| Mann-Whitney parameter | No | No | Yes | No | In {asht}, confidence intervals can be obtained using asymptotic approximation, Monte Carlo simulations, or exact methods (for small sample sizes). |
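
As a rough illustration of what the approximative (Monte Carlo) approach in the table above refers to, the following base R sketch approximates the two-sided permutation p-value by repeatedly reshuffling the mid-ranks between the two groups. It is not the {coin} or {asht} implementation; the seed, the number of simulations (`n_sim`) and all variable names are assumptions made for this illustration.

```r
# Birth weight data from the Example Data section
bw_ns <- c(3.99, 3.89, 3.6, 3.73, 3.31, 3.7, 4.08, 3.61, 3.83, 3.41,
           4.13, 3.36, 3.54, 3.51, 2.71)
bw_s <- c(3.18, 2.74, 2.9, 3.27, 3.65, 3.42, 3.23, 2.86, 3.6, 3.65,
          3.69, 3.53, 2.38, 2.34)

set.seed(2025)   # arbitrary seed, only for reproducibility of the sketch
n_sim <- 20000   # assumed number of Monte Carlo draws

values <- c(bw_s, bw_ns)
ranks <- rank(values)                     # mid-ranks for tied values
n_s <- length(bw_s)

obs_sum <- sum(ranks[seq_len(n_s)])       # observed rank sum of the smokers
exp_sum <- n_s * (length(values) + 1) / 2 # expected rank sum under H0

# Rank sums obtained when group membership is assigned at random
sim_sum <- replicate(n_sim, sum(sample(ranks, n_s)))

# Two-sided p-value: share of draws at least as far from the expectation
p_mc <- mean(abs(sim_sum - exp_sum) >= abs(obs_sum - exp_sum))
p_mc
```

The result varies with the seed and the number of draws, so it will only approximately agree with the exact p-value.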

Wilcoxon Rank Sum test

The table below compares the p-values of the Wilcoxon Rank Sum test under the different options. A "/" indicates that no result is available.

| Analysis | R {stats} | R {coin} | R {asht} | SAS | Match | Notes |
|---|---|---|---|---|---|---|
| Wilcoxon Rank-Sum – Normal approximation with continuity correction | 0.0100 | / | 0.0100 | 0.0100 | Yes | Not possible with {coin} |
| Wilcoxon Rank-Sum – Normal approximation without continuity correction | 0.0094 | 0.0094 | 0.0094 | 0.0094 | Yes | |
| Wilcoxon Rank-Sum – Exact | / | 0.0082 | / | 0.0082 | Yes | Not possible with {stats} since there are ties; in {asht} the run time is very long. |
| Wilcoxon Rank-Sum – Approximative (Monte Carlo simulation) | / | 0.0083 | 0.0083 | / | Yes | With 100,000 simulations |
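
For orientation, the two normal-approximation p-values from {stats} can be obtained along the following lines. This is a minimal sketch using `stats::wilcox.test()` on the two vectors directly (smokers first); setting `exact = FALSE` explicitly requests the normal approximation, which is what {stats} falls back to anyway when ties are present. The object names are chosen for illustration only.

```r
# Birth weight data from the Example Data section
bw_ns <- c(3.99, 3.89, 3.6, 3.73, 3.31, 3.7, 4.08, 3.61, 3.83, 3.41,
           4.13, 3.36, 3.54, 3.51, 2.71)
bw_s <- c(3.18, 2.74, 2.9, 3.27, 3.65, 3.42, 3.23, 2.86, 3.6, 3.65,
          3.69, 3.53, 2.38, 2.34)

# Normal approximation with continuity correction
res_corrected <- wilcox.test(bw_s, bw_ns, exact = FALSE, correct = TRUE)

# Normal approximation without continuity correction
res_uncorrected <- wilcox.test(bw_s, bw_ns, exact = FALSE, correct = FALSE)

round(c(corrected = res_corrected$p.value,
        uncorrected = res_uncorrected$p.value), 4)
```

Rounded to four decimals, these should agree with the {stats} column of the table above.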

Hodges-Lehmann estimator

The table below compares the Hodges-Lehmann estimates and their 95% confidence intervals. A "/" indicates that no result is available.

| Analysis | R {stats} | R {coin} | R {asht} | SAS | Match | Notes |
|---|---|---|---|---|---|---|
| Hodges-Lehmann estimator – Asymptotic | -0.426 (-0.770 to -0.090) | -0.426 (-0.760 to -0.100) | / | -0.425 (-0.770 to -0.090) | No | In {coin}, the CI is the exact CI. The CIs match between {stats} and SAS. |
| Hodges-Lehmann estimator – Exact | / | -0.425 (-0.760 to -0.100) | / | -0.425 (-0.760 to -0.100) | Yes | Not possible with {stats} since there are ties; in {asht} the run time is very long. |
| Hodges-Lehmann estimator – Approximative (Monte Carlo simulation) | / | -0.425 (-0.760 to -0.100) | / | / | / | With 500,000 simulations |
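
To make the small difference in the point estimates more concrete, the sketch below computes the traditional Hodges-Lehmann estimate, taken here as the median of all pairwise differences (smoker minus non-smoker), in base R and sets it next to the estimate returned by `stats::wilcox.test()` with `conf.int = TRUE`. The object names are chosen for illustration; the sketch does not attempt to reproduce the SAS or {coin} confidence intervals.

```r
# Birth weight data from the Example Data section
bw_ns <- c(3.99, 3.89, 3.6, 3.73, 3.31, 3.7, 4.08, 3.61, 3.83, 3.41,
           4.13, 3.36, 3.54, 3.51, 2.71)
bw_s <- c(3.18, 2.74, 2.9, 3.27, 3.65, 3.42, 3.23, 2.86, 3.6, 3.65,
          3.69, 3.53, 2.38, 2.34)

# All 14 x 15 pairwise differences, smoker minus non-smoker
pairwise_diff <- outer(bw_s, bw_ns, "-")

# Traditional Hodges-Lehmann estimate: median of the pairwise differences
hl_traditional <- median(pairwise_diff)
hl_traditional

# Asymptotic estimate and 95% CI from {stats} (normal approximation)
hl_stats <- wilcox.test(bw_s, bw_ns, conf.int = TRUE, exact = FALSE)
hl_stats$estimate
hl_stats$conf.int
```

The median of the pairwise differences is -0.425, the value SAS reports as the Hodges-Lehmann estimate.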

Mann-Whitney Parameter

Estimation of the Mann-Whitney parameter is only possible in the R {asht} package.
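
As background on what is being estimated, the sketch below computes a plain point estimate of the Mann-Whitney parameter in base R, taken here as P(X < Y) + 0.5 P(X = Y) with X a smoker and Y a non-smoker birth weight. The orientation of the comparison and the object names are assumptions made for illustration, and the sketch does not produce the confidence intervals that {asht} provides.

```r
# Birth weight data from the Example Data section
bw_ns <- c(3.99, 3.89, 3.6, 3.73, 3.31, 3.7, 4.08, 3.61, 3.83, 3.41,
           4.13, 3.36, 3.54, 3.51, 2.71)
bw_s <- c(3.18, 2.74, 2.9, 3.27, 3.65, 3.42, 3.23, 2.86, 3.6, 3.65,
          3.69, 3.53, 2.38, 2.34)

# Compare every smoker value with every non-smoker value
is_less <- outer(bw_s, bw_ns, "<")    # smoker value is smaller
is_tied <- outer(bw_s, bw_ns, "==")   # values are equal

# Proportion of pairs with smoker < non-smoker, counting ties as one half
mw_estimate <- mean(is_less + 0.5 * is_tied)
mw_estimate
```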

Special considerations for one-sided p-values

It is important to note that in SAS you can get an unexpected one-sided p-value. In the SAS documentation for PROC NPAR1WAY it is stated that:

“PROC NPAR1WAY computes one-sided and two-sided asymptotic p-values for each two-sample linear rank test. When the test statistic z is greater than its null hypothesis expected value of 0, PROC NPAR1WAY computes the right-sided p-value, which is the probability of a larger value of the statistic occurring under the null hypothesis. When the test statistic is less than or equal to 0, PROC NPAR1WAY computes the left-sided p-value, which is the probability of a smaller value of the statistic occurring under the null hypothesis” (similar for the exact p-value).

Thus SAS reports the one-sided p-value in the direction of the test statistic. This gives an unexpected one-sided p-value when the data produce a test statistic in the opposite direction to the pre-specified one-sided hypothesis.

Consider the following data example to showcase this:

dat_used <- data.frame(
  ID = c("001", "002", "003", "004", "005", "006", "007", "008", "009", "010",
         "011", "012", "013", "014", "015", "016", "017", "018", "019", "020",
         "021", "022", "023", "024", "025", "026", "027", "028", "029", "030"),
  ARM = c("Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Placebo",
          "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low",
          "High", "High", "High", "High", "High", "High", "High", "High", "High", "High"),
  Y = c(8.5, 8.9, 8.2, 8.1, 7.1, 7.4, 6.0, 6.5, 7.0, 7.0,
        6.5, 9.4, 8.9, 8.8, 9.6, 8.3, 8.9, 7.0, 9.1, 6.9,
        8.0, 7.3, 7.1, 6.2, 4.7, 4.7, 4.2, 4.1, 3.4, 3.9)
)
dat_used
    ID     ARM   Y
1  001 Placebo 8.5
2  002 Placebo 8.9
3  003 Placebo 8.2
4  004 Placebo 8.1
5  005 Placebo 7.1
6  006 Placebo 7.4
7  007 Placebo 6.0
8  008 Placebo 6.5
9  009 Placebo 7.0
10 010 Placebo 7.0
11 011     Low 6.5
12 012     Low 9.4
13 013     Low 8.9
14 014     Low 8.8
15 015     Low 9.6
16 016     Low 8.3
17 017     Low 8.9
18 018     Low 7.0
19 019     Low 9.1
20 020     Low 6.9
21 021    High 8.0
22 022    High 7.3
23 023    High 7.1
24 024    High 6.2
25 025    High 4.7
26 026    High 4.7
27 027    High 4.2
28 028    High 4.1
29 029    High 3.4
30 030    High 3.9

Suppose we have the following two hypotheses, where for both Low Dose and High Dose we expect smaller values (Y) than for Placebo:

  • \(H_{0}\): No difference between Placebo and Low Dose, vs \(H_{1}\): Placebo has higher values (Y) than Low Dose

  • \(H_{0}\): No difference between Placebo and High Dose, vs \(H_{1}\): Placebo has higher values (Y) than High Dose

Asymptotic results without continuity correction

Placebo and High Dose group

Let us use the {coin} package in R to compare the Placebo and High Dose group:

# Note: greater implies that H1 is Y1 - Y2 = Placebo - High > 0
coin::wilcox_test(
  Y ~ factor(ARM, levels = c("Placebo", "High")),
  distribution = "asymptotic",
  alternative = "greater",
  data = dat_used |> dplyr::filter(ARM %in% c("Placebo", "High")))

    Asymptotic Wilcoxon-Mann-Whitney Test

data:  Y by
     factor(ARM, levels = c("Placebo", "High")) (Placebo, High)
Z = 2.5352, p-value = 0.005619
alternative hypothesis: true mu is greater than 0

In SAS, the following result is obtained. In both R and SAS, the one-sided p-value is 0.0056.

Placebo and Low Dose group

Let us use the {coin} package in R to compare the Placebo and Low Dose group:

# Note: greater implies that H1 is Y1 - Y2 = Placebo - Low > 0
coin::wilcox_test(
  Y ~ factor(ARM, levels = c("Placebo", "Low")),
  distribution = "asymptotic",
  alternative = "greater",
  data = dat_used |> dplyr::filter(ARM %in% c("Placebo", "Low")))

    Asymptotic Wilcoxon-Mann-Whitney Test

data:  Y by
     factor(ARM, levels = c("Placebo", "Low")) (Placebo, Low)
Z = -1.7066, p-value = 0.9561
alternative hypothesis: true mu is greater than 0

In SAS, the following result is obtained. The one-sided p-values clearly do not match ({coin} p-value = 0.9561; SAS p-value = 0.0439). As mentioned above, SAS reports the p-value in the direction of the test statistic, which gives an unexpected one-sided p-value when the data produce a test statistic in the opposite direction to the pre-specified one-sided hypothesis. Note that \(1 - 0.9561 = 0.0439\), so the SAS value is the p-value for the opposite alternative.
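
The rule quoted from the PROC NPAR1WAY documentation can be mimicked in a few lines of base R. The helper below is hypothetical (its name and structure are not taken from SAS or any R package) and simply applies the quoted rule to the Z statistics printed in the {coin} output: right-sided p-value when z > 0, left-sided otherwise.

```r
# Hypothetical helper: one-sided asymptotic p-value in the direction of the
# test statistic, following the rule quoted from the SAS documentation
p_in_direction_of_statistic <- function(z) {
  if (z > 0) {
    pnorm(z, lower.tail = FALSE)  # right-sided p-value
  } else {
    pnorm(z)                      # left-sided p-value
  }
}

z_high <- 2.5352    # Placebo vs High, Z from the {coin} output
z_low <- -1.7066    # Placebo vs Low, Z from the {coin} output

# p-values following the direction of the statistic
p_dir_high <- p_in_direction_of_statistic(z_high)
p_dir_low <- p_in_direction_of_statistic(z_low)

# p-values for the pre-specified alternative "greater" (Placebo > active)
p_greater_high <- pnorm(z_high, lower.tail = FALSE)
p_greater_low <- pnorm(z_low, lower.tail = FALSE)

round(c(p_dir_high, p_greater_high, p_dir_low, p_greater_low), 4)
```

For Placebo vs High the two approaches coincide because the statistic lies in the hypothesised direction; for Placebo vs Low they are complements of each other.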

Exact results

Placebo and High Dose group

Let us use the {coin} package in R to compare the Placebo and High Dose group:

# Note: greater implies that H1 is Y1 - Y2 = Placebo - High > 0
coin::wilcox_test(
  Y ~ factor(ARM, levels = c("Placebo", "High")),
  distribution = "exact",
  alternative = "greater",
  data = dat_used |> dplyr::filter(ARM %in% c("Placebo", "High")))

    Exact Wilcoxon-Mann-Whitney Test

data:  Y by
     factor(ARM, levels = c("Placebo", "High")) (Placebo, High)
Z = 2.5352, p-value = 0.004682
alternative hypothesis: true mu is greater than 0

In SAS (see above), the same one-sided p-value of 0.0047 is obtained.

Placebo and Low Dose group

Let us use the {coin} package in R to compare the Placebo and Low Dose group:

# Note: greater implies that H1 is Y1 - Y2 = Placebo - Low > 0
coin::wilcox_test(
  Y ~ factor(ARM, levels = c("Placebo", "Low")),
  distribution = "exact",
  alternative = "greater",
  data = dat_used |> dplyr::filter(ARM %in% c("Placebo", "Low")))

    Exact Wilcoxon-Mann-Whitney Test

data:  Y by
     factor(ARM, levels = c("Placebo", "Low")) (Placebo, Low)
Z = -1.7066, p-value = 0.9574
alternative hypothesis: true mu is greater than 0

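Please see above for the SAS result. The one-sided p-values clearly do not match ({coin} p-value = 0.9574; SAS p-value = 0.0455). Unlike in the asymptotic case, the two exact p-values do not sum to 1: the exact null distribution is discrete, so both tails include the probability of the observed value of the statistic.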

Summary and Recommendation

The Wilcoxon Rank Sum test and the associated Hodges-Lehmann CI can be computed consistently in both SAS and R. The user needs to be aware of some small differences:

  • In SAS, the EXACT statement with the WILCOXON and HL options (exact wilcoxon hl) is needed to get both the exact p-value and CI.

  • In {stats}, exact values are only possible when there are no ties and the exact argument is set to true (exact = TRUE). This will give the exact p-value and CI.

  • In {coin} it is not possible to do a normal approximation with continuity correction.

  • For the asymptotic Hodges-Lehmann estimator, {stats} and {coin} use an algorithm to define the estimate, whereas SAS provides the traditional Hodges-Lehmann estimator.

If you have a study where you would like to use R for the exact Wilcoxon Rank Sum test and there is the risk of ties, {coin} would be recommended.

Ties

In all presented R packages and SAS, when there are tied values, the average score method (mid-ranks) is used. This is done by first sorting the observations in ascending order and assigning ranks as if there were no ties. The procedure averages the scores for tied observations and assigns this average score to each of the tied observations. Thus, all tied data values have the same score value.
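
The average score method can be seen directly on the birth weight data with base R's `rank()` and `ties.method = "average"`. This is a minimal sketch; the object names are chosen for illustration only.

```r
# Birth weight data from the Example Data section
bw_ns <- c(3.99, 3.89, 3.6, 3.73, 3.31, 3.7, 4.08, 3.61, 3.83, 3.41,
           4.13, 3.36, 3.54, 3.51, 2.71)
bw_s <- c(3.18, 2.74, 2.9, 3.27, 3.65, 3.42, 3.23, 2.86, 3.6, 3.65,
          3.69, 3.53, 2.38, 2.34)

all_values <- c(bw_s, bw_ns)

# Mid-ranks: tied observations share the average of the ranks they occupy
mid_ranks <- rank(all_values, ties.method = "average")

# Show only the tied observations together with their ranks
is_tied <- all_values %in% all_values[duplicated(all_values)]
data.frame(value = all_values[is_tied], rank = mid_ranks[is_tied])
```

The two observations of 3.60 occupy positions 17 and 18 and therefore both receive rank 17.5; the two observations of 3.65 occupy positions 20 and 21 and both receive rank 20.5.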

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.2 (2024-10-31)
 os       Ubuntu 24.04.3 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  C.UTF-8
 ctype    C.UTF-8
 tz       UTC
 date     2025-09-15
 pandoc   3.6.3 @ /opt/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 ! package * version date (UTC) lib source
 P coin    * 1.4-3   2023-09-27 [?] RSPM (R 4.4.0)

 [1] /home/runner/work/CAMIS/CAMIS/renv/library/linux-ubuntu-noble/R-4.4/x86_64-pc-linux-gnu
 [2] /opt/R/4.4.2/lib/R/library

 P ── Loaded and on-disk path mismatch.

─ External software ──────────────────────────────────────────────────────────
 setting value
 SAS     9.04.01M7P080520

──────────────────────────────────────────────────────────────────────────────