
Survival Analysis Using R
The most commonly used survival analysis methods in clinical trials include:
Kaplan-Meier (KM) estimators: non-parametric statistics utilized for estimating the survival function
Log-rank test: a non-parametric test for comparing the survival functions across two or more groups
Cox proportional hazards (PH) model: a semi-parametric model often used to assess the relationship between the survival time and explanatory variables
Additionally, other methods for analyzing time-to-event data are available, such as:
Parametric survival model
Accelerated failure time model
Competing risk model
Restricted mean survival time
Time-dependent Cox model
While these models may be explored in a separate document, this particular document focuses solely on the three most prevalent methods: KM estimators, log-rank test and Cox PH model.
Analysis of Time-to-event Data
Below is a standard mock-up for survival analysis in clinical trials.
Example Data
Data source: https://stats.idre.ucla.edu/sas/seminars/sas-survival/
The data include 500 subjects from the Worcester Heart Attack Study. This study examined several factors, such as age, gender and BMI, that may influence survival time after a heart attack. Follow-up time for all participants begins at the time of hospital admission after the heart attack and ends with death or loss to follow-up (censoring). The variables used here are:
lenfol: length of follow-up, terminated either by death or censoring - time variable
fstat: loss to follow-up = 0, death = 1 - censoring variable
afb: atrial fibrillation, no = 0, yes = 1 - explanatory variable
gender: males = 0, females = 1 - stratification factor
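As a minimal sketch of how a time variable and a censoring variable are combined, the toy vectors below (hypothetical values, not taken from the WHAS500 data) are passed to survival::Surv(). Censored observations are shown with a trailing "+".

```r
library(survival)

# Hypothetical follow-up times (days) and status flags (1 = death, 0 = censored)
toy_time   <- c(5, 8, 12)
toy_status <- c(1, 0, 1)

# Surv() pairs each follow-up time with its event indicator
toy_surv <- Surv(time = toy_time, event = toy_status)
print(toy_surv)

# Character form of each observation; the censored one carries a "+"
toy_labels <- as.character(toy_surv)
toy_labels
```

The same pairing is what Surv(LENFOLY, FSTAT) does in the models that follow.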
library(tidyverse)
library(haven)
library(survival)
library(survminer)
library(ggsurvfit)
library(broom)
library(knitr)
knitr::opts_chunk$set(echo = TRUE)
dat <- haven::read_sas(file.path("../data/whas500.sas7bdat")) |>
  mutate(
    LENFOLY = round(LENFOL / 365.25, 2), ## change follow-up days to years for better visualization
    AFB = factor(AFB, levels = c(1, 0))
  ) ## change AFB order to use "Yes" as the reference group to be consistent with SAS
The Non-stratified Model
First we try a non-stratified analysis following the mock-up above to describe the association between survival time and afb (atrial fibrillation).
The KM estimators come from the survival::survfit function, the log-rank test uses survminer::surv_pvalue, and the Cox PH model is fitted with the survival::coxph function. Many R packages and functions can perform survival analysis. The survival and survminer packages are used here, but other options are available.
KM estimators
fit.km <- survival::survfit(survival::Surv(LENFOLY, FSTAT) ~ AFB, data = dat)
## quantile estimates
quantile(fit.km, probs = c(0.25, 0.5, 0.75))
$quantile
        25   50   75
AFB=1 0.26 2.37 6.43
AFB=0 0.94 5.91 6.44

$lower
        25   50   75
AFB=1 0.05 1.27 4.24
AFB=0 0.55 4.32 6.44

$upper
        25   50   75
AFB=1 1.11 4.24   NA
AFB=0 1.47   NA   NA

## landmark estimates at 1, 3, 5-year
summary(fit.km, times = c(1, 3, 5))
Call: survfit(formula = survival::Surv(LENFOLY, FSTAT) ~ AFB, data = dat)

                AFB=1
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1     50      28    0.641  0.0543        0.543        0.757
    3     27      12    0.455  0.0599        0.351        0.589
    5     11       6    0.315  0.0643        0.211        0.470

                AFB=0
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1    312     110    0.739  0.0214        0.699        0.782
    3    199      33    0.642  0.0245        0.595        0.691
    5     77      20    0.530  0.0311        0.472        0.595
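To illustrate what survfit() computes, the sketch below builds the Kaplan-Meier product-limit estimate by hand and compares it with survfit(). It assumes the built-in survival::lung dataset as a stand-in, because the WHAS500 file may not be available locally. The names t_event, n_risk, n_death and km_manual are illustrative only.

```r
library(survival)

# Stand-in data: built-in lung dataset (status: 1 = censored, 2 = dead)
d <- survival::lung
event <- d$status == 2

# Distinct event times in increasing order
t_event <- sort(unique(d$time[event]))

# Number at risk just before each event time, and number of deaths at that time
n_risk  <- sapply(t_event, function(tt) sum(d$time >= tt))
n_death <- sapply(t_event, function(tt) sum(d$time == tt & event))

# Product-limit estimate: S(t) = product over event times t_i <= t of (1 - d_i / n_i)
km_manual <- cumprod(1 - n_death / n_risk)

# The same quantity from survfit(), evaluated at the event times
fit <- survfit(Surv(time, status == 2) ~ 1, data = d)
km_survfit <- summary(fit, times = t_event)$surv

# Compare the two estimates
all.equal(km_manual, km_survfit)
```

The landmark estimates reported by summary(fit.km, times = ...) are values of this step function read off at the requested times.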
Log-rank test
There are multiple ways to obtain the log-rank test. The survdiff() function from the {survival} package performs a log-rank test (or one of its weighted variants) to compare survival curves between two or more groups. rho = 0 is the default and gives the standard log-rank test. rho = 1 gives the Peto-Peto modification of the Gehan-Wilcoxon test, which weights earlier events more heavily.
You can also use the {survminer} package as shown below, or the add_pvalue() function from the {ggsurvfit} package if you want the p-value placed on a KM plot. See the example in the Kaplan-Meier Graphs section below.
# survdiff() from the survival package: unrounded p-value = 0.0009646027
survdiff(Surv(LENFOLY, FSTAT) ~ AFB, data = dat, rho = 0)
Call:
survdiff(formula = Surv(LENFOLY, FSTAT) ~ AFB, data = dat, rho = 0)

        N Observed Expected (O-E)^2/E (O-E)^2/V
AFB=1  78       47     30.3      9.26      10.9
AFB=0 422      168    184.7      1.52      10.9

 Chisq= 10.9  on 1 degrees of freedom, p= 0.001
# surv_pvalue() from survminer
survminer::surv_pvalue(fit.km, data = dat)
  variable         pval   method    pval.txt
1      AFB 0.0009646027 Log-rank p = 0.00096
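The printed survdiff() output rounds the p-value. One way to recover the unrounded value is to take the upper tail of the chi-square distribution yourself. The sketch below does this for rho = 0 and rho = 1, assuming the built-in survival::lung dataset as a stand-in for the WHAS500 data. The object names are illustrative.

```r
library(survival)

# Stand-in data: built-in lung dataset, comparing the two sexes
lr <- survdiff(Surv(time, status == 2) ~ sex, data = survival::lung, rho = 0)
pp <- survdiff(Surv(time, status == 2) ~ sex, data = survival::lung, rho = 1)

# Degrees of freedom = number of groups - 1
df <- length(lr$n) - 1

# Unrounded p-values: upper tail of the chi-square distribution
p_lr <- pchisq(lr$chisq, df = df, lower.tail = FALSE)  # standard log-rank
p_pp <- pchisq(pp$chisq, df = df, lower.tail = FALSE)  # Peto-Peto weighting

c(log_rank = p_lr, peto_peto = p_pp)
```

Because the two tests weight event times differently, their p-values generally differ, so the choice of rho is best fixed before looking at the data.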
Cox PH model
fit.cox <- survival::coxph(survival::Surv(LENFOLY, FSTAT) ~ AFB, data = dat)
fit.cox |>
  tidy(exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95) |>
  select(term, estimate, conf.low, conf.high)
# A tibble: 1 × 4
  term  estimate conf.low conf.high
  <chr>    <dbl>    <dbl>     <dbl>
1 AFB0     0.583    0.421     0.806
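The term AFB0 is the hazard ratio for patients without AFB relative to the reference group with AFB, so the ratio in the opposite direction is its reciprocal (1 / 0.583, about 1.72). The sketch below checks this reciprocal relationship on the built-in survival::lung dataset, which is assumed here as a stand-in. The variables sex_a and sex_b are hypothetical helpers.

```r
library(survival)

# Stand-in data: built-in lung dataset (sex coded 1 and 2)
d <- survival::lung
d$sex_a <- factor(d$sex, levels = c(1, 2))  # reference level: 1
d$sex_b <- factor(d$sex, levels = c(2, 1))  # reference level: 2

# Hazard ratios from two Cox models that differ only in the reference level
hr_a <- unname(exp(coef(coxph(Surv(time, status == 2) ~ sex_a, data = d))))
hr_b <- unname(exp(coef(coxph(Surv(time, status == 2) ~ sex_b, data = d))))

# Reversing the reference level inverts the hazard ratio
c(hr_a = hr_a, hr_b = hr_b, reciprocal_of_hr_b = 1 / hr_b)
```

This is why the factor level order set during data preparation determines the direction in which the hazard ratio is reported.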
The Stratified Model
In a stratified model, the Kaplan-Meier estimators remain the same as those in the non-stratified model. To implement stratified log-rank tests and Cox proportional hazards models, simply include the strata() function within the model formula.
Stratified Log-rank test
fit.km.str <- survival::survfit(
  survival::Surv(LENFOLY, FSTAT) ~ AFB + survival::strata(GENDER),
  data = dat
)
survminer::surv_pvalue(fit.km.str, data = dat)
                      variable        pval   method   pval.txt
1 AFB+survival::strata(GENDER) 0.001506607 Log-rank p = 0.0015
Stratified Cox PH model
fit.cox.str <- survival::coxph(
  survival::Surv(LENFOLY, FSTAT) ~ AFB + survival::strata(GENDER),
  data = dat
)
fit.cox.str |>
  tidy(exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95) |>
  select(term, estimate, conf.low, conf.high)
# A tibble: 1 × 4
  term  estimate conf.low conf.high
  <chr>    <dbl>    <dbl>     <dbl>
1 AFB0     0.594    0.430     0.823
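The same strata() pattern can be tried without the WHAS500 file. The sketch below assumes the built-in survival::veteran dataset as a stand-in, with trt as the comparison of interest and celltype as the stratification factor. It uses survdiff() directly for the stratified log-rank test, which is an alternative to the survminer::surv_pvalue call shown above.

```r
library(survival)

# Stand-in data: built-in veteran dataset (trt = treatment arm, celltype = stratum)
d <- survival::veteran

# Stratified log-rank test: strata() goes inside the model formula
lr_str <- survdiff(Surv(time, status) ~ trt + strata(celltype), data = d)
p_str  <- pchisq(lr_str$chisq, df = 1, lower.tail = FALSE)  # two arms, so 1 df

# Stratified Cox model: a separate baseline hazard per stratum,
# with one common hazard ratio for trt across strata
cox_str <- coxph(Surv(time, status) ~ trt + strata(celltype), data = d)
hr_str  <- exp(coef(cox_str))

c(p_value = p_str, hazard_ratio = unname(hr_str))
```

The stratification factor does not receive a coefficient of its own, which is why the tidy() output above shows only the AFB0 term.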
Kaplan-Meier Graphs
You can use the {survminer} or {ggsurvfit} packages to create Kaplan-Meier graphs, including the number at risk and the number of events under the graph. Both packages are highly customizable.
It is good practice to ensure your categorical factors are specified as such and are clearly labelled. The {forcats} package is useful for recoding factors, as shown below using fct_recode().
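As a small sketch of the recoding step, the hypothetical 0/1 indicator below is relabelled with fct_recode(), which takes pairs of the form new label = old level. Cross-tabulating the old and new values is one way to confirm that each code received the intended label. The vector afb is made up for illustration.

```r
library(forcats)

# Hypothetical 0/1 indicator stored as a factor with level order 1, 0
afb <- factor(c(1, 0, 0, 1), levels = c(1, 0))

# Relabel the levels; the existing level order is preserved
afb_lab <- fct_recode(afb, "With AFB" = "1", "Without AFB" = "0")

levels(afb_lab)

# Cross-tabulate old codes against new labels as a sanity check
table(afb, afb_lab)
```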
{ggsurvfit} is shown here because the code coverage is higher for this package than for {survminer}.
The code below fits the model, adds a log-rank test p-value, limits the X axis, controls the major and minor scales of the Y and X axes, adds a risk table under the graph showing the number at risk and the cumulative events, color codes the lines to allow easy identification of AFB and gender, and adds appropriate titles and axis labels.
dat2 <- dat %>%
  mutate(Treatment = fct_recode(AFB, 'Without AFB' = '0', 'With AFB' = '1')) %>%
  ## GENDER is coded males = 0, females = 1, so the labels follow that order
  mutate(GENDER_F = factor(GENDER, levels = c(0, 1), labels = c('Male', 'Female')))
survfit2(Surv(LENFOLY, FSTAT) ~ Treatment + strata(GENDER_F), data = dat2) %>%
  ggsurvfit() +
  add_pvalue(rho = 0) +
  coord_cartesian(xlim = c(0, 6)) +
  scale_y_continuous(breaks = seq(0, 1, by = 0.1), minor_breaks = NULL) +
  scale_x_continuous(breaks = seq(0, 6, by = 1), minor_breaks = NULL) +
  add_risktable(risktable_stats = '{n.risk}({cum.event})') +
  scale_color_manual(values = c('Blue', 'lightskyblue', 'red', 'hotpink')) +
  labs(y = 'Survival Probability', ## the Y axis runs from 0 to 1
       x = 'Time (years)', ## LENFOLY is follow-up time in years
       title = 'Time to death for patients with or without AFB')
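For readers without the WHAS500 file, a pared-down version of the same plotting pattern is sketched below. It assumes the built-in survival::lung dataset as a stand-in, with one curve per sex and time measured in days in that dataset. Only functions already used above are called, plus add_confidence_interval() from {ggsurvfit}.

```r
library(survival)
library(ggsurvfit)

# Stand-in data: built-in lung dataset (status: 1 = censored, 2 = dead)
p <- survfit2(Surv(time, status == 2) ~ sex, data = survival::lung) |>
  ggsurvfit() +
  add_confidence_interval() +                                  # shaded confidence bands
  add_risktable(risktable_stats = "{n.risk} ({cum.event})") +  # at risk (cumulative events)
  add_pvalue(rho = 0) +                                        # standard log-rank p-value
  ggplot2::labs(x = "Time (days)", y = "Survival Probability")

# Printing the object draws the plot with the risk table underneath
p
```

Starting from a small plot like this and adding one layer at a time can make it easier to see which call controls which part of the final figure.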
Reference
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.2 (2025-10-31)
 os       macOS Tahoe 26.3
 system   aarch64, darwin20
 date     2026-02-23
─ Packages attached to the search path ───────────────────────────────────────
 package    version  date (UTC)
 broom      1.0.12   2026-01-27
 dplyr      1.2.0    2026-02-03
 forcats    1.0.1    2025-09-25
 ggplot2    4.0.2    2026-02-03
 ggpubr     0.6.2    2025-10-17
 ggsurvfit  1.2.0    2025-09-13
 haven      2.5.5    2025-05-30
 knitr      1.51     2025-12-20
 lubridate  1.9.5    2026-02-04
 purrr      1.2.1    2026-01-09
 readr      2.1.6    2025-11-14
 stringr    1.6.0    2025-11-04
 survival   3.8-3    2024-12-17
 survminer  0.5.1    2025-09-02
 tibble     3.3.1    2026-01-11
 tidyr      1.3.2    2025-12-19
 tidyverse  2.0.0    2023-02-22
──────────────────────────────────────────────────────────────────────────────