The Kruskal-Wallis test is a rank-based, non-parametric alternative to the one-way ANOVA. For this example, the data used is a subset of the iris dataset, and the test checks for a difference in sepal width between species of flower.
The Kruskal-Wallis test can be implemented in Python using the kruskal function from scipy.stats. The null hypothesis is that the samples are from identical populations.
from scipy.stats import kruskal

# Separate the data for each species
setosa_data = iris_sub[iris_sub['Species'] == 'setosa']['Sepal_Width']
versicolor_data = iris_sub[iris_sub['Species'] == 'versicolor']['Sepal_Width']
virginica_data = iris_sub[iris_sub['Species'] == 'virginica']['Sepal_Width']

# Perform the Kruskal-Wallis H-test
h_statistic, p_value = kruskal(setosa_data, versicolor_data, virginica_data)

# Calculate the degrees of freedom (number of groups minus one)
k = len(iris_sub['Species'].unique())
df = k - 1

print("H-statistic:", h_statistic)
print("p-value:", p_value)
print("Degrees of freedom:", df)
H-statistic: 10.922233820459285
p-value: 0.0042488075570347485
Degrees of freedom: 2
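To make the H-statistic less of a black box, the sketch below computes it by hand from rank sums using the textbook formula H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1). The helper name kruskal_h and the three small groups are hypothetical and are not taken from the iris data. The sketch assumes no tied values, so it omits the tie correction that a full implementation would apply.

```python
from itertools import chain


def kruskal_h(*groups):
    """Kruskal-Wallis H without tie correction (assumes all values are distinct)."""
    pooled = sorted(chain.from_iterable(groups))
    # Rank every observation in the pooled sample, smallest value gets rank 1
    rank = {value: position + 1 for position, value in enumerate(pooled)}
    n_total = len(pooled)
    # Sum over groups of (rank sum squared) / (group size)
    weighted = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical toy measurements, chosen so that no two values are tied
group_a = [2.9, 3.0, 3.4]
group_b = [2.5, 2.7, 2.8]
group_c = [3.1, 3.2, 3.6]

h = kruskal_h(group_a, group_b, group_c)
print("H-statistic:", h)
```

Because only the ranks enter the formula, any monotonic transformation of the data (for example, taking logarithms) leaves H unchanged.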
Results
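As seen above, the Kruskal-Wallis rank sum statistic is 10.922 and the p-value of the test is 0.004249. The kruskal function returns only these two values; the degrees of freedom (2) are computed separately as the number of groups minus one. Because the p-value is below 0.05, the null hypothesis is rejected at the 5% level: sepal width differs between at least two of the species. If the distributions of the species have a similar shape, this can be interpreted as a statistically significant difference in population medians.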
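As a quick check on how the reported p-value relates to the statistic, the sketch below assumes the p-value comes from the chi-square approximation with k - 1 degrees of freedom. With 2 degrees of freedom, the chi-square upper-tail probability reduces to exp(-H / 2), so only the standard library is needed. The variable names are illustrative.

```python
import math

# H-statistic copied from the output above
h_statistic = 10.922233820459285

# For a chi-square distribution with 2 degrees of freedom,
# the upper-tail probability P(X > h) equals exp(-h / 2)
p_value = math.exp(-h_statistic / 2)

print("p-value:", p_value)
```

For other numbers of groups the closed form no longer applies, and the upper tail of the chi-square distribution can be evaluated with scipy.stats.chi2.sf(h_statistic, df) instead.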