R: Kolmogorov-Smirnov test

Introduction

The Kolmogorov-Smirnov (K-S) test is a non-parametric test used to check whether a sample follows a specified reference distribution, or whether two samples come from the same distribution. It is based on the cumulative distribution function (CDF): the test statistic is the greatest vertical distance between the empirical distribution function (EDF) of the sample and either the reference CDF or the EDF of the second sample.

The Kolmogorov-Smirnov test is mostly used for two purposes:

One-sample K-S test: To compare the sample distribution to a known reference distribution.
Two-sample K-S test: To compare the distributions of two independent samples.

Because the test statistic depends only on the maximum difference between the observed and expected cumulative distribution functions, the test does not assume any specific distribution for the sample data. This makes it especially helpful for testing goodness-of-fit for continuous distributions.
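To make the definition of the test statistic concrete, here is a minimal sketch that computes the one-sample statistic by hand and compares it with the value reported by ks.test(). It assumes simulated standard-normal data rather than the lung dataset, and it calls base R's stats::ks.test() explicitly so that it runs without any extra packages; the variable names are illustrative only.

```r
# Simulated data (an assumption for illustration; not the lung dataset)
set.seed(123)
x <- sort(rnorm(50))
n <- length(x)

# Reference CDF (standard normal) evaluated at each ordered observation
cdf_vals <- pnorm(x)

# The EDF jumps from (i - 1)/n to i/n at the i-th ordered observation,
# so the gap to the reference CDF is checked on both sides of each jump
d_plus   <- max(seq_len(n) / n - cdf_vals)
d_minus  <- max(cdf_vals - (seq_len(n) - 1) / n)
d_manual <- max(d_plus, d_minus)

# Statistic reported by base R's ks.test() for the same data
d_ks <- unname(stats::ks.test(x, "pnorm")$statistic)

# Should be TRUE: both are the maximum EDF-to-CDF distance
all.equal(d_manual, d_ks)
```

The two-sided check matters because the EDF is a step function: the largest gap can occur just before a jump as well as just after it.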

Libraries or Extensions Needed

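To perform the Kolmogorov-Smirnov test in R, we will use the ks.test() function from the dgof package. This function extends the ks.test() function in base R's stats package, which has the same interface for continuous distributions, to also handle discrete reference distributions.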

library(dgof)

Data Sources for the Analysis

We will use the lung dataset from the survival package.

library(survival)
attach(lung)

Details about the lung dataset can be found in the documentation for the survival package, which is available at https://cran.r-project.org/web/packages/survival/survival.pdf.

Statistical Method

One-sample K-S test

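For this example, we will test whether the Karnofsky performance score rated by the physician (ph.karno) and the Karnofsky performance score rated by the patient (pat.karno) each follow a normal distribution. Because no mean or standard deviation is passed alongside "pnorm", the reference distribution in the calls below is the standard normal, N(0, 1).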

ks.test(ph.karno, "pnorm")
ks.test(pat.karno, "pnorm")

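Both tests have p-values < 2.2e-16, which indicates that the distributions of ph.karno and pat.karno differ significantly from the reference distribution. Note that ks.test(x, "pnorm") with no further arguments uses the standard normal, N(0, 1), as the reference, so with Karnofsky scores recorded on a 0 to 100 scale the rejection reflects the difference in location and scale as much as any difference in shape.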
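As a minimal sketch of how the parameters of the reference distribution affect the one-sample result, the example below uses simulated scores (an assumption; the values 80 and 12 are arbitrary and are not estimates from the lung dataset) and base R's stats::ks.test(). Extra arguments such as mean and sd are passed through to pnorm.

```r
# Simulated scores from a normal distribution with known parameters
# (hypothetical values chosen for illustration only)
set.seed(42)
scores <- rnorm(200, mean = 80, sd = 12)

# Reference: standard normal N(0, 1), the default for "pnorm"
res_std <- stats::ks.test(scores, "pnorm")

# Reference: normal with the same mean and sd used to simulate the data
res_ref <- stats::ks.test(scores, "pnorm", mean = 80, sd = 12)

# Compare the two test statistics side by side
c(standard = unname(res_std$statistic), specified = unname(res_ref$statistic))
```

In this sketch the reference parameters are known in advance. When they are instead estimated from the same sample, the standard K-S p-value is no longer exact and tends to be conservative, so a test designed for that setting (for example the Lilliefors variant) may be more appropriate.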

Two-sample K-S test

Next, we will compare the distributions of ph.karno and pat.karno using the two-sample K-S test.

ks.test(ph.karno, pat.karno)

With a p-value of 0.2084, the test does not reject the null hypothesis that ph.karno and pat.karno come from the same distribution at the conventional 5% significance level. In other words, the data provide no significant evidence that the Karnofsky performance scores rated by physicians and by patients differ in distribution.
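For the two-sample case, the statistic is the largest absolute difference between the two empirical distribution functions. The following is a small sketch under stated assumptions: the two samples are simulated and independent (not taken from the lung dataset), and base R's ecdf() and stats::ks.test() are used.

```r
# Two simulated, independent samples (an assumption for illustration)
set.seed(2024)
a <- rnorm(40)
b <- rnorm(60, mean = 0.5)

# Evaluate both EDFs at every observed value; the largest gap
# between two step functions occurs at one of these points
pooled <- sort(c(a, b))
d2_manual <- max(abs(ecdf(a)(pooled) - ecdf(b)(pooled)))

# Statistic reported by base R's two-sample ks.test()
d2_ks <- unname(stats::ks.test(a, b)$statistic)

# Should be TRUE: both are the maximum distance between the two EDFs
all.equal(d2_manual, d2_ks)
```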

Conclusion

We demonstrated the use of the Kolmogorov-Smirnov test in R using the ks.test() function from the dgof package, which is straightforward to use. As far as we are aware, this is the most widely used function in R for performing the Kolmogorov-Smirnov test.

Session Info

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.2 (2025-10-31)
 os       Ubuntu 24.04.3 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/London
 date     2026-03-17
 pandoc   3.6.3 @ /home/michael/.positron-server/bin/f3aae65e0a1a11d39226cd884520f49301daef82/quarto/bin/tools/x86_64/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 ! package  * version date (UTC) lib source
 R dgof       <NA>    <NA>       [?] <NA>
   survival   3.8-3   2024-12-17 [2] CRAN (R 4.5.2)

 [1] /home/michael/source/personal/CAMIS/renv/library/linux-ubuntu-noble/R-4.5/x86_64-pc-linux-gnu
 [2] /opt/R/4.5.2/lib/R/library

 R ── Package was removed from disk.

──────────────────────────────────────────────────────────────────────────────