Kruskal-Wallis in R

Introduction

The Kruskal-Wallis test is a rank-based, non-parametric alternative to the one-way ANOVA. This example uses an 18-row subset of datasets::iris (six flowers per species) to test whether sepal width differs between the three species. The subset, referred to as iris_sub in the code, is shown below.

      Species Sepal_Width
1      setosa         3.4
2      setosa         3.0
3      setosa         3.4
4      setosa         3.2
5      setosa         3.5
6      setosa         3.1
7  versicolor         2.7
8  versicolor         2.9
9  versicolor         2.7
10 versicolor         2.6
11 versicolor         2.5
12 versicolor         2.5
13  virginica         3.0
14  virginica         3.0
15  virginica         3.1
16  virginica         3.8
17  virginica         2.7
18  virginica         3.3
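
The page does not show how iris_sub was drawn from datasets::iris, so the following is only a sketch that rebuilds the same 18 rows by typing in the values from the table above. The column names Species and Sepal_Width follow the printed output; storing Species as a factor is an assumption.

```r
# Rebuild the 18-row subset from the values printed in the table above.
# Assumption: Species is stored as a factor with six rows per species.
iris_sub <- data.frame(
  Species = factor(rep(c("setosa", "versicolor", "virginica"), each = 6)),
  Sepal_Width = c(3.4, 3.0, 3.4, 3.2, 3.5, 3.1,   # setosa
                  2.7, 2.9, 2.7, 2.6, 2.5, 2.5,   # versicolor
                  3.0, 3.0, 3.1, 3.8, 2.7, 3.3)   # virginica
)

# Quick check of the group sizes.
table(iris_sub$Species)
```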

Implementing Kruskal-Wallis in R

The Kruskal-Wallis test can be implemented in R using stats::kruskal.test. Below, the test is specified with R’s formula interface (dependent variable ~ grouping variable), and the data argument names the data set. The null hypothesis is that all samples come from identical populations.

kruskal.test(Sepal_Width ~ Species, data = iris_sub)

    Kruskal-Wallis rank sum test

data:  Sepal_Width by Species
Kruskal-Wallis chi-squared = 10.922, df = 2, p-value = 0.004249
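
To see where the reported statistic comes from, the calculation can be reproduced by hand in base R. This is a minimal sketch using the textbook formula with the usual tie correction, not a copy of the internals of kruskal.test. The variable names are illustrative and the data are typed in from the table in the Introduction.

```r
# Sepal widths and species labels, typed in from the table in the Introduction.
sepal_width <- c(3.4, 3.0, 3.4, 3.2, 3.5, 3.1,
                 2.7, 2.9, 2.7, 2.6, 2.5, 2.5,
                 3.0, 3.0, 3.1, 3.8, 2.7, 3.3)
species <- factor(rep(c("setosa", "versicolor", "virginica"), each = 6))

n <- length(sepal_width)
ranks <- rank(sepal_width)                    # tied values get the average rank

rank_sums <- tapply(ranks, species, sum)      # sum of ranks per species
group_n   <- tapply(ranks, species, length)   # observations per species

# Uncorrected H statistic.
h_raw <- 12 / (n * (n + 1)) * sum(rank_sums^2 / group_n) - 3 * (n + 1)

# Tie correction: 1 - sum(t^3 - t) / (n^3 - n), with t the size of each tie group.
tie_sizes  <- as.numeric(table(sepal_width))
correction <- 1 - sum(tie_sizes^3 - tie_sizes) / (n^3 - n)

h <- h_raw / correction

# The p-value uses a chi-squared distribution with (number of groups - 1) df.
p_value <- pchisq(h, df = nlevels(species) - 1, lower.tail = FALSE)

round(c(H = h, p = p_value), 6)
```

With these data the hand calculation agrees with the output above: H rounds to 10.922 on 2 degrees of freedom, with a p-value of about 0.004249.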

Results

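As seen above, R outputs the Kruskal-Wallis rank sum statistic (10.922), the degrees of freedom (2, the number of groups minus one), and the p-value of the test (0.004249). Because the p-value is below 0.05, the null hypothesis of identical populations is rejected at the 5% level, which indicates that sepal width differs between at least two of the species. Interpreting this result as a difference in population medians additionally assumes that the distributions have the same shape in each group.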
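
The reported values can also be pulled out of the object returned by kruskal.test rather than read off the printed output. The sketch below assumes a 5% significance level, as in the text, and rebuilds the data from the table in the Introduction. The names res, alpha, h_stat, deg_free and reject_null are illustrative.

```r
# Data typed in from the table in the Introduction.
iris_sub <- data.frame(
  Species = factor(rep(c("setosa", "versicolor", "virginica"), each = 6)),
  Sepal_Width = c(3.4, 3.0, 3.4, 3.2, 3.5, 3.1,
                  2.7, 2.9, 2.7, 2.6, 2.5, 2.5,
                  3.0, 3.0, 3.1, 3.8, 2.7, 3.3)
)

res <- kruskal.test(Sepal_Width ~ Species, data = iris_sub)

alpha    <- 0.05                     # significance level used in the text
h_stat   <- unname(res$statistic)    # Kruskal-Wallis chi-squared
deg_free <- unname(res$parameter)    # degrees of freedom
p_value  <- res$p.value

# TRUE means the null hypothesis of identical populations is rejected.
reject_null <- p_value < alpha
reject_null
```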