# Survival Analysis Using R

The most commonly used survival analysis methods in clinical trials include:

Kaplan-Meier (KM) estimators: non-parametric statistics utilized for estimating the survival function

Log-rank test: a non-parametric test for comparing the survival functions across two or more groups

Cox proportional hazards (PH) model: a semi-parametric model often used to assess the relationship between the survival time and explanatory variables

Additionally, other methods for analyzing time-to-event data are available, such as:

Parametric survival model

Accelerated failure time model

Competing risk model

Restricted mean survival time

Time-dependent Cox model

While these models may be explored in a separate document, this particular document focuses solely on the three most prevalent methods: KM estimators, log-rank test and Cox PH model.

# Analysis of Time-to-event Data

Below is a standard mock-up for survival analysis in clinical trials.

## Example Data

Data source: https://stats.idre.ucla.edu/sas/seminars/sas-survival/

The data include 500 subjects from the Worcester Heart Attack Study. This study examined several factors, such as age, gender and BMI, that may influence survival time after heart attack. Follow up time for all participants begins at the time of hospital admission after heart attack and ends with death or loss to follow up (censoring). The variables used here are:

lenfol: length of followup, terminated either by death or censoring - time variable

fstat: loss to followup = 0, death = 1 - censoring variable

afb: atrial fibrillation, no = 0, 1 = yes - explanatory variable

gender: males = 0, females = 1 - stratification factor

```
library(tidyverse)
library(haven)
library(survival)
library(survminer)
library(broom)
library(knitr)
::opts_chunk$set(echo = TRUE)
knitr
<- read_sas(file.path("../data/whas500.sas7bdat")) %>%
dat mutate(LENFOLY = round(LENFOL/365.25, 2), ## change follow-up days to years for better visualization
AFB = factor(AFB, levels = c(1, 0))) ## change AFB order to use "Yes" as the reference group to be consistent with SAS
```

## The Non-stratified Model

First we try a non-stratified analysis following the mock-up above to describe the association between survival time and afb (atrial fibrillation).

The KM estimators are from `survival::survfit`

function, the log-rank test uses `survminer::surv_pvalue`

, and Cox PH model is conducted using `survival::coxph`

function. Numerous R packages and functions are available for performing survival analysis. The author has selected `survival`

and `survminer`

for use in this context, but alternative options can also be employed for survival analysis.

### KM estimators

```
<- survfit(Surv(LENFOLY, FSTAT) ~ AFB, data = dat)
fit.km
## quantile estimates
quantile(fit.km, probs = c(0.25, 0.5, 0.75))
```

```
$quantile
25 50 75
AFB=1 0.26 2.37 6.43
AFB=0 0.94 5.91 6.44
$lower
25 50 75
AFB=1 0.05 1.27 4.24
AFB=0 0.55 4.32 6.44
$upper
25 50 75
AFB=1 1.11 4.24 NA
AFB=0 1.47 NA NA
```

```
## landmark estimates at 1, 3, 5-year
summary(fit.km, times = c(1, 3, 5))
```

```
Call: survfit(formula = Surv(LENFOLY, FSTAT) ~ AFB, data = dat)
AFB=1
time n.risk n.event survival std.err lower 95% CI upper 95% CI
1 50 28 0.641 0.0543 0.543 0.757
3 27 12 0.455 0.0599 0.351 0.589
5 11 6 0.315 0.0643 0.211 0.470
AFB=0
time n.risk n.event survival std.err lower 95% CI upper 95% CI
1 312 110 0.739 0.0214 0.699 0.782
3 199 33 0.642 0.0245 0.595 0.691
5 77 20 0.530 0.0311 0.472 0.595
```

### Log-rank test

`::surv_pvalue(fit.km, data = dat) survminer`

```
variable pval method pval.txt
1 AFB 0.0009646027 Log-rank p = 0.00096
```

### Cox PH model

```
<- coxph(Surv(LENFOLY, FSTAT) ~ AFB, data = dat)
fit.cox %>%
fit.cox tidy(exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95) %>%
select(term, estimate, conf.low, conf.high)
```

```
# A tibble: 1 × 4
term estimate conf.low conf.high
<chr> <dbl> <dbl> <dbl>
1 AFB0 0.583 0.421 0.806
```

## The Stratified Model

In a stratified model, the Kaplan-Meier estimators remain the same as those in the non-stratified model. To implement stratified log-rank tests and Cox proportional hazards models, simply include the strata() function within the model formula.

### Stratified Log-rank test

```
<- survfit(Surv(LENFOLY, FSTAT) ~ AFB + strata(GENDER), data = dat)
fit.km.str
::surv_pvalue(fit.km.str, data = dat) survminer
```

```
variable pval method pval.txt
1 AFB+strata(GENDER) 0.001506607 Log-rank p = 0.0015
```

### Stratified Cox PH model

```
<- coxph(Surv(LENFOLY, FSTAT) ~ AFB + strata(GENDER), data = dat)
fit.cox.str %>%
fit.cox.str tidy(exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95) %>%
select(term, estimate, conf.low, conf.high)
```

```
# A tibble: 1 × 4
term estimate conf.low conf.high
<chr> <dbl> <dbl> <dbl>
1 AFB0 0.594 0.430 0.823
```