R Recurrent Events

Recurrent event models

Setup

General libraries

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(gt)

Methodology specific libraries

library(survival)

Modelling recurrent events

Methodology introduction

Traditionally, survival analysis focuses on the time to a single first event. While there are many applications for such time-to-event analysis in clinical trials, this approach falls short when events of interest can occur multiple times within the same subject. Recurrent event models extend the traditional Cox proportional hazards framework to account for multiple events per subject (Ozga et al. 2018, Amorim & Cai 2015).

In this tutorial, we will demonstrate how to implement different recurrent event models in R, specifically the Andersen-Gill, proportional means/rates, Prentice-Williams-Peterson, and Wei-Lin-Weissfeld models, using the well-known survival package. The R code follows the layout of Amor 2023, with additional insights taken from Lu et al. 2018.

Recurrent event models can roughly be divided in three categories: counting process models, conditional models and marginal models. In the section below, we will explore the difference between each of these approaches. In addition, important aspects of data structure will be discussed by means of two fictional subjects, one with 4 events and 0 censored observations (events at time 6, 9, 56 and 88), and another with 2 events and 1 censored observation (events at time 42, 57, and censored at time 91).

Define the following:

Note

\[*\]

\(\lambda_i(t)\): hazard function for the \(i\)th subject at time \(t\)

\(\lambda_{ij}(t)\): hazard function for the \(j\)th event of the \(i\)th subject at time \(t\)

\(\lambda_0(t)\): common baseline hazard for all events

\(\lambda_{0j}(t)\): event-specific baseline hazard for the \(j\)th event at time \(t\)

\(\beta\): common parameter vector

\(\beta_j\): event-specific parameter vector for the \(j\)th event

\(X_{ij}\): covariate vector for the \(j\)th event of the \(i\)th subject

Counting process models

Andersen-Gill model (Andersen & Gill 1982)

\[ \lambda_i(t) = \lambda_0(t) \exp \left( \beta X_{ij}(t) \right) \ * \]

  • Counting process approach: treats each subject as a multiple events counting process

  • Common baseline hazard \(\lambda_0(t)\)

  • Common regression coefficients \(\beta\)

  • Unrestricted risk set: a subject contributes to the risk set for an event as long as the subject is under observation, i.e. it can be at risk for a subsequent event even though the previous event did not yet occur

  • Order of events is not important

An essential assumption of the Andersen-Gill model is that of independent events within subjects. This, however, is often not realistic in clinical trial data. For example, let’s say that we are modelling myocardial infarction (MI). If a patient has already experienced one MI, their risk of subsequent events may increase due to underlying cardiovascular damage or presence of other risk factors. Thus, the more events a patient has, the more likely they are to experience future events, indicating dependence rather than independence. To accurately model this within-subject correlation, extensions like time-varying covariates, a robust sandwich covariance estimator or frailty terms may be needed. In this tutorial, we will discuss the sandwich correction.

Lin-Wei-Yang-Ying (LWYY) model or proportional means/rates model (Lin, Wei, Yang & Ying 2000)

Lin, Wei, Yang, and Ying introduced an improved version of the Andersen-Gill model in 2000 (often referred to as proportional means/rates model), featuring a robust sandwich estimator that explicitly accounts for individual subject clusters. These robust standard errors yield wider confidence intervals and provide asymptotically valid inference even when the independence assumption does not hold (Lee et al. 2025). The original and improved Andersen-Gill model often appear interchangeable in the literature, and while they produce identical estimates, their robust standard errors can differ substantially, which may impact the conclusions drawn from statistical inference.

For both versions of the Andersen-Gill model, the data must be structured as follows:

Subject Time interval Event Stratum
1 (0, 6] 1 1
1 (6, 9] 1 1
1 (9, 56] 1 1
1 (56, 88] 1 1
2 (0, 42] 1 1
2 (42, 87] 1 1
2 (87, 91] 0 1

This can be visually represented:

In both versions of the Andersen-Gill model, each new time interval starts where the previous one ends.

Conditional models

Prentice-Williams-Peterson model (Prentice, Williams & Peterson 1981)

  • Conditional approach: incorporates conditional strata to account for ordering/dependence of events

  • Stratified baseline hazard \(\lambda_{0j}(t)\)

  • Stratified regression coefficients \(\beta_j\): can be pooled (\(\beta\)) or kept as event-specific (\(\beta_j\)) in the output

  • Restricted risk set: contributions to the risk set for a subsequent event are restricted to only consider subjects that already experienced the previous event

  • Order of events is important

The Prentice-Williams-Peterson model can incorporate both overall and event-specific effects \(\beta_j\) for each covariate. An often made assumption is to set \(\beta_1 = \beta_2 = ... = \beta_k = \beta\) to estimate a common parameter \(\beta\).

Depending on the outcome of interest, Prentice, Williams and Peterson suggested two distinct models:

  1. Total time model

\[ \lambda_{ij}(t) = \lambda_{0j}(t) \exp \left( \beta_j X_{ij}(t) \right) \ * \]

The total time variant of the Prentice-Williams-Peterson model uses the same time intervals as the counting process approach (Andersen-Gill model), which is useful for modelling the full time course (\(t\)) of the recurrent event process, i.e. the hazard of any recurrence.

For the total time model, the data must be structured as follows:

Subject Time interval Event Stratum
1 (0, 6] 1 1
1 (6, 9] 1 2
1 (9, 56] 1 3
1 (56, 88] 1 4
2 (0, 42] 1 1
2 (42, 87] 1 2
2 (87, 91] 0 3

This can be visually represented:

Again, in the total time model, each new time intervals starts where the previous one ends.

  1. Gap time model

\[ \lambda_{ij}(t) = \lambda_{0j}(t - t_{j-1}) \exp \left( \beta_j X_{ij}(t) \right) \ * \]

The gap time variant of the Prentice-Williams-Peterson model uses time intervals that start at zero and end at the length of time until the next event, which is useful for modelling the time between each of the recurring events (\(t - t_{j-1}\)), i.e. the hazard of recurrence after the previous event.

For the gap time model, the data must be structured as follows:

Subject Time interval Event Stratum
1 (0, 6] 1 1
1 (0, 3] 1 2
1 (0, 47] 1 3
1 (0, 32] 1 4
2 (0, 42] 1 1
2 (0, 45] 1 2
2 (0, 3] 0 3

This can be visually represented:

In the gap time model, each time interval starts at zero and has a length equal to the gap time between two neighboring events.

Marginal models

Wei-Lin-Weissfeld model (Wei, Lin & Weissfeld 1989)

\[ \lambda_{ij}(t) = \lambda_{0j}(t) \exp \left( \beta_j X_{ij}(t) \right) \ * \]

  • Marginal approach: treats each (recurrent) event as having a separate, marginal process

  • Stratified baseline hazard \(\lambda_{0j}(t)\)

  • Stratified regression coefficients \(\beta_j\): can be pooled (\(\beta\)) or kept as event-specific (\(\beta_j\)) in the output

  • Semi-restricted risk set: all subjects contribute follow-up times to all potential events, i.e. each subject is at risk for all potential events, regardless of how many events that subject actually experiences

  • Order of events is not important

Although the Wei-Lin-Weissfeld model has it roots in competing risks analysis, it conveniently lends itself to model recurrent events as well. Like the Andersen-Gill model, the Wei-Lin-Weissfeld model also assumes independence of events, which is often not feasible in practice. In addition, it is assumed there is no specific order among the events or that the events are different types of events, and not necessarily recurrent events.

Like the Prentice-Williams-Peterson models, the Wei-Lin-Weissfeld model can incorporate both overall and event-specific effects \(\beta_j\) for each covariate. An often made assumption is to set \(\beta_1 = \beta_2 = ... = \beta_k = \beta\) to estimate a common parameter \(\beta\). Another approach is to combine event-specific effects \(\beta_j\) to get an estimator of the average effect, as described in Wei, Lin & Weissfeld 1989 (this is not discussed further here).

For Wei-Lin-Weissfeld models, the data must be structured as follows:

Subject Time interval Event Stratum
1 (0, 6] 1 1
1 (0, 9] 1 2
1 (0, 56] 1 3
1 (0, 88] 1 4
2 (0, 42] 1 1
2 (0, 87] 1 2
2 (0, 91] 0 3
2 (0, 91] 0 4

This can be visually represented:

In the Wei-Lin-Weissfeld model, each time intervals starts at zero and ends at its respective event time.

Overview of all models

In summary, the selection of the model to use would depend on the type of events, the importance of the order of the events and the time intervals to be analyzed. We made an effort to summarize the similarities and differences between the models in the table below.

AG PWPtt PWPgt WLW
Approach counting process conditional conditional marginal
Baseline hazard common stratified stratified stratified
Regression coefficients common stratified possible stratified possible stratified possible
Risk set unrestricted restricted restricted semi-restricted
Time interval total time total time gap time total time
Order of events not important important important not important
Hazard ratio (HR) risk of any recurrence risk of any recurrence risk of recurrence after previous event risk of event of any type, not necessarily recurrent event

Note that, because the ordering of events is not important in the Andersen-Gill and Wei-Lin-Weissfeld model, these models come with the assumption of independence of events. In contrast, the Prentice-Williams-Peterson models overcome the need for this assumption by capturing the dependence structure between recurrent events in conditional strata. Consequently, events are assumed to be conditionally independent in the Prentice-Williams-Peterson models.

A nice visual representation of the stratification and time interval structure of each model is given below. The correct data structure is pivotal when modelling recurrent events and depends on the methodology you want to use, as illustrated in the figure.

Modelling recurrent events using the survival package

Data

For this tutorial we will use the bladder data from the survival package, which captures recurrences of bladder cancer from a clinical trial for an oncolytic called thiotepa. The bladder data is regularly used by many statisticians to demonstrate methodology for recurrent event modelling. Somewhat confusingly, there are three versions of this data available:

  • bladder1: original data from the study on all subjects (294 records)

  • bladder2: data in Andersen-Gill format on subset of subjects with nonzero follow-up time (178 records)

  • bladder: data in Wei-Lin-Weissfeld format on subset of subjects with nonzero follow-up time (340 records)

For this tutorial, we will use bladder2 to illustrate Andersen-Gill and Prentice-Williams-Peterson models, and bladder to illustrate the Wei-Lin-Weissfeld model.

The variables included in both datasets are:

  • id: patient id

  • rx: treatment group (1 = placebo, 2 = thiotepa)

  • number: initial number of tumors (8 = 8 or more)

  • size: size in cm of largest initial tumor

  • start: start of time interval; this variable is not present in bladder

  • stop: (recurrent) event or censoring time

  • event: event indicator (1 = event, 0 = censored)

  • enum: order of recurrence

Importantly, both datasets collect the data in a counting process structure. This means that there is one record for each subject and time interval, where a time interval is defined as the time to its respective event (event = 1), or the time to follow-up if the event did not occur (event = 0).

Let’s look more closely at the bladder2 and bladder data:

bladder2 <- survival::bladder2
gt(head(bladder2, 6))
id rx number size start stop event enum
1 1 1 3 0 1 0 1
2 1 2 1 0 4 0 1
3 1 1 1 0 7 0 1
4 1 5 1 0 10 0 1
5 1 4 1 0 6 1 1
5 1 4 1 6 10 0 2
bladder2 %>%
  group_by(enum) %>% summarise(n = n()) %>% gt()
enum n
1 85
2 46
3 27
4 20

In bladder2, in the Andersen-Gill format, each subject has a variable amount of records, depending on the amount of events that subject experienced.

bladder <- survival::bladder
gt(head(bladder, 20))
id rx number size stop event enum
1 1 1 3 1 0 1
1 1 1 3 1 0 2
1 1 1 3 1 0 3
1 1 1 3 1 0 4
2 1 2 1 4 0 1
2 1 2 1 4 0 2
2 1 2 1 4 0 3
2 1 2 1 4 0 4
3 1 1 1 7 0 1
3 1 1 1 7 0 2
3 1 1 1 7 0 3
3 1 1 1 7 0 4
4 1 5 1 10 0 1
4 1 5 1 10 0 2
4 1 5 1 10 0 3
4 1 5 1 10 0 4
5 1 4 1 6 1 1
5 1 4 1 10 0 2
5 1 4 1 10 0 3
5 1 4 1 10 0 4
bladder %>%
  group_by(enum) %>% summarise(n = n()) %>% gt()
enum n
1 85
2 85
3 85
4 85

In bladder, in the Wei-Lin-Weissfeld format, each subject has four records, regardless of how many events that subject actually experienced. In addition, there is no start variable, as all time intervals start at zero.

Analysis

In the survival package, any survival analysis based on the Cox proportional hazard model can be conducted using the coxph() function. Hence, conveniently, when modelling time-to-event data with recurrent events, the same function can be used. The caveat here is that an adequate data structure is required, which must be in correspondence with the model you want to use.

In this section of the tutorial, we will explain how the arguments of the coxph() function and data structure must be defined to fit every type of recurrent event model correctly.

Andersen-Gill model

  1. Improved Andersen-Gill model (LWYY model or proportional means/rates model)

For the improved version of the Andersen-Gill model you must specify:

  • formula = Surv(start, stop, event) ~ 'predictors'

  • cluster = id

And the data structure must be:

Id Time interval Start Stop Event
1 (0, 1] 0 1 0
2 (0, 4] 0 4 0
3 (0, 7] 0 7 0
4 (0, 10] 0 10 0
5 (0, 6] 0 6 1
5 (6, 10] 6 10 0

We will use the bladder2 data for this.

AG <- coxph(Surv(start, stop, event) ~ as.factor(rx) + number + size,
            ties = "breslow",
            cluster = id,
            data = bladder2)
summary(AG)
Call:
coxph(formula = Surv(start, stop, event) ~ as.factor(rx) + number + 
    size, data = bladder2, ties = "breslow", cluster = id)

  n= 178, number of events= 112 

                   coef exp(coef) se(coef) robust se      z Pr(>|z|)   
as.factor(rx)2 -0.45979   0.63142  0.19996   0.25801 -1.782  0.07474 . 
number          0.17164   1.18726  0.04733   0.06131  2.799  0.00512 **
size           -0.04256   0.95833  0.06903   0.07555 -0.563  0.57317   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

               exp(coef) exp(-coef) lower .95 upper .95
as.factor(rx)2    0.6314     1.5837    0.3808     1.047
number            1.1873     0.8423    1.0528     1.339
size              0.9583     1.0435    0.8264     1.111

Concordance= 0.634  (se = 0.032 )
Likelihood ratio test= 16.77  on 3 df,   p=8e-04
Wald test            = 11.76  on 3 df,   p=0.008
Score (logrank) test = 18.57  on 3 df,   p=3e-04,   Robust = 11.44  p=0.01

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).

By defining the cluster argument, coxph() will automatically set robust = TRUE, and compute a robust sandwich covariance. The summary function will then display both the non-robust (se(coef)) and robust (robust se) standard error estimates. Under the hood, the robust standard errors will consider all id clusters separately and ultimately sum up the score residuals for each distinct cluster.

  1. Original Andersen-Gill model

To our knowledge, the original Andersen-Gill model of 1989 can only be fitted in R by adding an artificial clustering variable with unique entries to the bladder2 data, which we call id2. This artificial clustering variable will ignore any clustering that is actually present in the data.

bladder2 <- bladder2 %>%
  dplyr::mutate(id2 = row_number())

Except for cluster = id2, the rest of the code remains the same.

AG_original <- coxph(Surv(start, stop, event) ~ as.factor(rx) + number + size,
            ties = "breslow",
            cluster = id2,
            data = bladder2)
summary(AG_original)
Call:
coxph(formula = Surv(start, stop, event) ~ as.factor(rx) + number + 
    size, data = bladder2, ties = "breslow", cluster = id2)

  n= 178, number of events= 112 

                   coef exp(coef) se(coef) robust se      z Pr(>|z|)   
as.factor(rx)2 -0.45979   0.63142  0.19996   0.22902 -2.008  0.04468 * 
number          0.17164   1.18726  0.04733   0.06469  2.653  0.00797 **
size           -0.04256   0.95833  0.06903   0.07325 -0.581  0.56119   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

               exp(coef) exp(-coef) lower .95 upper .95
as.factor(rx)2    0.6314     1.5837    0.4031    0.9891
number            1.1873     0.8423    1.0459    1.3477
size              0.9583     1.0435    0.8302    1.1063

Concordance= 0.634  (se = 0.032 )
Likelihood ratio test= 16.77  on 3 df,   p=8e-04
Wald test            = 9.36  on 3 df,   p=0.02
Score (logrank) test = 18.57  on 3 df,   p=3e-04,   Robust = 14.27  p=0.003

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).

Although the original Andersen-Gill model does not consider separate id clusters, it still computes robust standard errors using the sandwich estimator, as robust = TRUE. The resulting robust standard errors (robust se) differ from those provided for the improved Andersen-Gill model, while the estimated coefficients (coef) and non-robust standard errors (se(coef)) remain perfectly unchanged.

Prentice-Williams-Peterson model

  1. Total time model

For the Prentice-Williams-Peterson total time model you must specify:

  • formula = Surv(start, stop, event) ~ 'predictors' + strata(enum)

  • cluster = id

And the data structure must be:

Id Time interval Start Stop Event Enum
1 (0, 1] 0 1 0 1
2 (0, 4] 0 4 0 1
3 (0, 7] 0 7 0 1
4 (0, 10] 0 10 0 1
5 (0, 6] 0 6 1 1
5 (6, 10] 6 10 0 2

We will use the bladder2 data for this.

PWPtt <- coxph(Surv(start, stop, event) ~ as.factor(rx) + number + size + strata(enum),
            ties = "breslow",
            cluster = id,
            data = bladder2)
summary(PWPtt)
Call:
coxph(formula = Surv(start, stop, event) ~ as.factor(rx) + number + 
    size + strata(enum), data = bladder2, ties = "breslow", cluster = id)

  n= 178, number of events= 112 

                    coef exp(coef)  se(coef) robust se      z Pr(>|z|)  
as.factor(rx)2 -0.334295  0.715842  0.216087  0.197064 -1.696   0.0898 .
number          0.115653  1.122606  0.053681  0.049913  2.317   0.0205 *
size           -0.008051  0.991982  0.072725  0.060124 -0.134   0.8935  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

               exp(coef) exp(-coef) lower .95 upper .95
as.factor(rx)2    0.7158     1.3970    0.4865     1.053
number            1.1226     0.8908    1.0180     1.238
size              0.9920     1.0081    0.8817     1.116

Concordance= 0.615  (se = 0.032 )
Likelihood ratio test= 6.11  on 3 df,   p=0.1
Wald test            = 7.19  on 3 df,   p=0.07
Score (logrank) test = 6.45  on 3 df,   p=0.09,   Robust = 8.76  p=0.03

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).

The conditional strata of the Prentice-Williams-Peterson model are set by strata(enum) in the formula, where enum captures the ordering of recurrent events.

  1. Gap time model

For the Prentice-Williams-Peterson gap time model you must specify:

  • formula = Surv(gtime, event) ~ 'predictors' + strata(enum)

  • cluster = id

And the data structure must be:

Id Time interval Start Stop Event Enum
1 (0, 1] 0 1 0 1
2 (0, 4] 0 4 0 1
3 (0, 7] 0 7 0 1
4 (0, 10] 0 10 0 1
5 (0, 6] 0 6 1 1
5 (0, 4] 0 4 0 2

This data structure can be achieved in bladder2 by adding a gtime variable.

bladder2 <- bladder2 %>%
  dplyr::mutate(gtime = stop - start)
  
gt(head(bladder2, 6))
id rx number size start stop event enum id2 gtime
1 1 1 3 0 1 0 1 1 1
2 1 2 1 0 4 0 1 2 4
3 1 1 1 0 7 0 1 3 7
4 1 5 1 0 10 0 1 4 10
5 1 4 1 0 6 1 1 5 6
5 1 4 1 6 10 0 2 6 4

We artificially set start = 0 for each gap time interval by including gtime instead of start, stop in the Surv() object.

PWPgt <- coxph(Surv(gtime, event) ~ as.factor(rx) + number + size + strata(enum),
            ties = "breslow",
            cluster = id,
            data = bladder2)
summary(PWPgt)
Call:
coxph(formula = Surv(gtime, event) ~ as.factor(rx) + number + 
    size + strata(enum), data = bladder2, ties = "breslow", cluster = id)

  n= 178, number of events= 112 

                   coef exp(coef) se(coef) robust se      z Pr(>|z|)   
as.factor(rx)2 -0.26952   0.76375  0.20766   0.20808 -1.295  0.19522   
number          0.15353   1.16595  0.05211   0.04889  3.140  0.00169 **
size            0.00684   1.00686  0.07001   0.06222  0.110  0.91246   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

               exp(coef) exp(-coef) lower .95 upper .95
as.factor(rx)2    0.7637     1.3093    0.5080     1.148
number            1.1659     0.8577    1.0594     1.283
size              1.0069     0.9932    0.8913     1.137

Concordance= 0.596  (se = 0.032 )
Likelihood ratio test= 8.76  on 3 df,   p=0.03
Wald test            = 12.14  on 3 df,   p=0.007
Score (logrank) test = 9.6  on 3 df,   p=0.02,   Robust = 10.21  p=0.02

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).

Wei-Lin-Weissfeld model

For the Wei-Lin-Weissfeld model you must specify:

  • formula = Surv(stop, event) ~ 'predictors' + strata(enum)

  • cluster = id

And the data structure must be:

Id Time interval Start Stop Event Enum
1 (0, 1] 0 1 0 1
1 (0, 1] 0 1 0 2
1 (0, 1] 0 1 0 3
1 (0, 1] 0 1 0 4
2 (0, 4] 0 4 0 1
2 (0, 4] 0 4 0 2
2 (0, 4] 0 4 0 3
2 (0, 4] 0 4 0 4
3 (0, 7] 0 7 0 1
3 (0, 7] 0 7 0 2
3 (0, 7] 0 7 0 3
3 (0, 7] 0 7 0 4
4 (0, 10] 0 10 0 1
4 (0, 10] 0 10 0 2
4 (0, 10] 0 10 0 3
4 (0, 10] 0 10 0 4
5 (0, 6] 0 6 1 1
5 (0, 10] 0 10 0 2
5 (0, 10] 0 10 0 3
5 (0, 10] 0 10 0 4

We will use the bladder data for this.

WLW <- coxph(Surv(stop, event) ~ as.factor(rx) + number + size + strata(enum),
            ties = "breslow",
            cluster = id,
            data = bladder)
summary(WLW)
Call:
coxph(formula = Surv(stop, event) ~ as.factor(rx) + number + 
    size + strata(enum), data = bladder, ties = "breslow", cluster = id)

  n= 340, number of events= 112 

                   coef exp(coef) se(coef) robust se      z Pr(>|z|)   
as.factor(rx)2 -0.57986   0.55998  0.20118   0.30344 -1.911   0.0560 . 
number          0.20849   1.23182  0.04691   0.06567  3.175   0.0015 **
size           -0.05094   0.95034  0.06967   0.09304 -0.548   0.5840   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

               exp(coef) exp(-coef) lower .95 upper .95
as.factor(rx)2    0.5600     1.7858    0.3089     1.015
number            1.2318     0.8118    1.0830     1.401
size              0.9503     1.0523    0.7919     1.140

Concordance= 0.663  (se = 0.036 )
Likelihood ratio test= 24.71  on 3 df,   p=2e-05
Wald test            = 15.56  on 3 df,   p=0.001
Score (logrank) test = 27.89  on 3 df,   p=4e-06,   Robust = 11.75  p=0.008

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).

Importantly, the strata of the Wei-Lin-Weissfeld model as set by strata(enum) are substantially different from the conditional strata of the Prentice-Williams-Peterson model. The enum variable is now no longer assumed to be an ordinal variable.

Notes

Note: For all recurrent event models, another way of defining the subject clusters is by using cluster(id) in the formula, rather than setting cluster = id. This results in the same outcomes, as shown below.

AG_v1 <- coxph(Surv(start, stop, event) ~ as.factor(rx) + number + size,
            ties = "breslow",
            cluster = id,
            data = bladder2)

summary(AG_v1)
Call:
coxph(formula = Surv(start, stop, event) ~ as.factor(rx) + number + 
    size, data = bladder2, ties = "breslow", cluster = id)

  n= 178, number of events= 112 

                   coef exp(coef) se(coef) robust se      z Pr(>|z|)   
as.factor(rx)2 -0.45979   0.63142  0.19996   0.25801 -1.782  0.07474 . 
number          0.17164   1.18726  0.04733   0.06131  2.799  0.00512 **
size           -0.04256   0.95833  0.06903   0.07555 -0.563  0.57317   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

               exp(coef) exp(-coef) lower .95 upper .95
as.factor(rx)2    0.6314     1.5837    0.3808     1.047
number            1.1873     0.8423    1.0528     1.339
size              0.9583     1.0435    0.8264     1.111

Concordance= 0.634  (se = 0.032 )
Likelihood ratio test= 16.77  on 3 df,   p=8e-04
Wald test            = 11.76  on 3 df,   p=0.008
Score (logrank) test = 18.57  on 3 df,   p=3e-04,   Robust = 11.44  p=0.01

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).
AG_v2 <- coxph(Surv(start, stop, event) ~ as.factor(rx) + number + size + cluster(id),
            ties = "breslow",
            data = bladder2)

summary(AG_v2)
Call:
coxph(formula = Surv(start, stop, event) ~ as.factor(rx) + number + 
    size, data = bladder2, ties = "breslow", cluster = id)

  n= 178, number of events= 112 

                   coef exp(coef) se(coef) robust se      z Pr(>|z|)   
as.factor(rx)2 -0.45979   0.63142  0.19996   0.25801 -1.782  0.07474 . 
number          0.17164   1.18726  0.04733   0.06131  2.799  0.00512 **
size           -0.04256   0.95833  0.06903   0.07555 -0.563  0.57317   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

               exp(coef) exp(-coef) lower .95 upper .95
as.factor(rx)2    0.6314     1.5837    0.3808     1.047
number            1.1873     0.8423    1.0528     1.339
size              0.9583     1.0435    0.8264     1.111

Concordance= 0.634  (se = 0.032 )
Likelihood ratio test= 16.77  on 3 df,   p=8e-04
Wald test            = 11.76  on 3 df,   p=0.008
Score (logrank) test = 18.57  on 3 df,   p=3e-04,   Robust = 11.44  p=0.01

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).

Note: R uses ties = "efron" by default, while SAS uses ties = breslow by default. In this tutorial, we forced R to use ties = "breslow" to match SAS for all recurrent event models. For more information, be sure to check the CAMIS webpage on the comparison of Cox proportional hazards models in R and SAS.

Interpretation

In terms of interpretation, hazard ratios (\(\exp(\beta_j)\)) are often used when making inferences based on Cox proportional hazards models. Now, as you may remember from the overview presented earlier, it is important to recognize that each of the recurrent event models comes with a slightly different interpretation of the hazard ratio, as defined by the assumptions around the model.

AG PWPtt PWPgt WLW
Hazard ratio (HR) risk of any recurrence risk of any recurrence risk of recurrence after previous event risk of event of any type, not necessarily recurrent event

This means that, for the bladder data, we can draw slightly different conclusions on the hazard ratio of the group treated with thiotepa (rx = 2) versus the placebo group (rx = 1).

Model HR: rx2 vs rx1 95% CI P-value
AG 0.631 0.381 to 1.047 0.0747
Original AG 0.631 0.403 to 0.989 0.0447
PWPtt 0.716 0.487 to 1.053 0.0898
PWPgt 0.764 0.508 to 1.148 0.1952
WLW 0.560 0.309 to 1.015 0.0560

These conclusions are:

  • Andersen-Gill: the risk of having any new tumor recurrence in the treatment group is 0.631 (0.381 - 1.047) times that of the placebo group

  • Prentice-Williams-Peterson: total time: the risk of having any new tumor recurrence in the treatment group is 0.716 (0.487 - 1.053) times that of the placebo group

  • Prentice-Williams-Peterson: gap time: the risk of having a new tumor recurrence after a previous event in the treatment group is 0.764 (0.508 - 1.148) times that of the placebo group

  • Wei-Lin-Weissfeld: the risk of having any type of event in the treatment group is 0.560 (0.309 - 1.015) times that of the placebo group

Note: The improved Andersen-Gill model (LWYY model or proportional means/rates model) is preferred over the original Andersen-Gill model.

Event-specific estimates

For the Prentice-Williams-Peterson and Wei-Lin-Weissfeld models we can incorporate both overall (\(\beta\)) and event-specific (\(\beta_j\)) effects for each covariate. To arrive at pooled model parameters these models assume that \(\beta_1 = \beta_2 = ... = \beta_k = \beta\). Until now, we have only considered pooled model parameters, but given the underlying stratification of these two models in particular, it may be valuable to look into the event-specific estimates as well (Amorim & Cai 2015).

To output event-specific estimates for the treatment effect (rx), we simply specify rx:strata(enum) in the formula.

Prentice-Williams-Peterson model

Total time model

PWPtt_stratified <- coxph(Surv(start, stop, event) ~ rx:strata(enum) + number + size,
                          ties = "breslow", cluster = id, data = bladder2)
summary(PWPtt_stratified)
Call:
coxph(formula = Surv(start, stop, event) ~ rx:strata(enum) + 
    number + size, data = bladder2, ties = "breslow", cluster = id)

  n= 178, number of events= 112 

                           coef exp(coef)  se(coef) robust se      z Pr(>|z|)  
number                 0.111071  1.117475  0.054397  0.050063  2.219   0.0265 *
size                  -0.006674  0.993349  0.073173  0.061409 -0.109   0.9135  
rx:strata(enum)enum=1 -0.409893  0.663721  0.304293  0.286737 -1.430   0.1529  
rx:strata(enum)enum=2 -0.416339  0.659457  0.398266  0.423632 -0.983   0.3257  
rx:strata(enum)enum=3 -0.142970  0.866780  0.593818  0.405303 -0.353   0.7243  
rx:strata(enum)enum=4  0.105362  1.111112  0.679095  0.469706  0.224   0.8225  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                      exp(coef) exp(-coef) lower .95 upper .95
number                   1.1175     0.8949    1.0130     1.233
size                     0.9933     1.0067    0.8807     1.120
rx:strata(enum)enum=1    0.6637     1.5067    0.3784     1.164
rx:strata(enum)enum=2    0.6595     1.5164    0.2875     1.513
rx:strata(enum)enum=3    0.8668     1.1537    0.3917     1.918
rx:strata(enum)enum=4    1.1111     0.9000    0.4425     2.790

Concordance= 0.609  (se = 0.033 )
Likelihood ratio test= 6.73  on 6 df,   p=0.3
Wald test            = 8.22  on 6 df,   p=0.2
Score (logrank) test = 7.02  on 6 df,   p=0.3,   Robust = 9.37  p=0.2

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).

Gap time model

PWPgt_stratified <- coxph(Surv(gtime, event) ~ rx:strata(enum) + number + size,

ties = “breslow”, cluster = id, data = bladder2)

summary(PWPgt_stratified)

PWPgt_stratified <- coxph(Surv(gtime, event) ~ rx:strata(enum) + number + size,
                          ties = "breslow", cluster = id, data = bladder2)
summary(PWPgt_stratified)
Call:
coxph(formula = Surv(gtime, event) ~ rx:strata(enum) + number + 
    size, data = bladder2, ties = "breslow", cluster = id)

  n= 178, number of events= 112 

                          coef exp(coef) se(coef) robust se      z Pr(>|z|)   
number                 0.15193   1.16408  0.05259   0.04864  3.124  0.00179 **
size                   0.01319   1.01328  0.07021   0.06297  0.209  0.83406   
rx:strata(enum)enum=1 -0.43665   0.64620  0.30521   0.28376 -1.539  0.12385   
rx:strata(enum)enum=2 -0.30182   0.73947  0.39752   0.38907 -0.776  0.43789   
rx:strata(enum)enum=3  0.01485   1.01497  0.47722   0.49843  0.030  0.97622   
rx:strata(enum)enum=4  0.06019   1.06204  0.58289   0.53982  0.112  0.91122   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                      exp(coef) exp(-coef) lower .95 upper .95
number                   1.1641     0.8590    1.0582     1.281
size                     1.0133     0.9869    0.8956     1.146
rx:strata(enum)enum=1    0.6462     1.5475    0.3705     1.127
rx:strata(enum)enum=2    0.7395     1.3523    0.3449     1.585
rx:strata(enum)enum=3    1.0150     0.9853    0.3821     2.696
rx:strata(enum)enum=4    1.0620     0.9416    0.3687     3.059

Concordance= 0.603  (se = 0.032 )
Likelihood ratio test= 9.73  on 6 df,   p=0.1
Wald test            = 14.45  on 6 df,   p=0.02
Score (logrank) test = 10.49  on 6 df,   p=0.1,   Robust = 11.41  p=0.08

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).

Wei-Lin-Weissfeld model

WLW_stratified <- coxph(Surv(stop, event) ~ rx:strata(enum) + number + size,
                        ties = "breslow", cluster = id, data = bladder)
summary(WLW_stratified)
Call:
coxph(formula = Surv(stop, event) ~ rx:strata(enum) + number + 
    size, data = bladder, ties = "breslow", cluster = id)

  n= 340, number of events= 112 

                          coef exp(coef) se(coef) robust se      z Pr(>|z|)   
number                 0.20747   1.23056  0.04685   0.06573  3.156   0.0016 **
size                  -0.05187   0.94945  0.06966   0.09300 -0.558   0.5770   
rx:strata(enum)enum=1 -0.47890   0.61947  0.30583   0.28313 -1.691   0.0908 . 
rx:strata(enum)enum=2 -0.64914   0.52249  0.39217   0.36826 -1.763   0.0779 . 
rx:strata(enum)enum=3 -0.71783   0.48781  0.45971   0.42148 -1.703   0.0885 . 
rx:strata(enum)enum=4 -0.56175   0.57021  0.56143   0.49592 -1.133   0.2573   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                      exp(coef) exp(-coef) lower .95 upper .95
number                   1.2306     0.8126    1.0818     1.400
size                     0.9494     1.0532    0.7912     1.139
rx:strata(enum)enum=1    0.6195     1.6143    0.3556     1.079
rx:strata(enum)enum=2    0.5225     1.9139    0.2539     1.075
rx:strata(enum)enum=3    0.4878     2.0500    0.2135     1.114
rx:strata(enum)enum=4    0.5702     1.7537    0.2157     1.507

Concordance= 0.661  (se = 0.036 )
Likelihood ratio test= 24.95  on 6 df,   p=3e-04
Wald test            = 16.54  on 6 df,   p=0.01
Score (logrank) test = 28.19  on 6 df,   p=9e-05,   Robust = 11.92  p=0.06

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).

References

Amor 2023. Eat, Sleep, R, Repeat.

Amorim & Cai 2015. Modelling recurrent events: a tutorial for analysis in epidemiology. International Journal of Epidemiology. 2015 Feb;44(1):324-33.

Andersen & Gill 1982. Cox’s Regression Model for Counting Processes: A Large Sample Study. The Annals of Statistics. 10(4):1100–1120.

bladder data

Lee et al. 2025. Comparison of total event analysis and first event analysis in relation to heterogeneity in cardiovascular trials. BMC Medical Research Methodology. 2025 Jun 9;25(1):159.

Lin, Wei, Yang & Ying 2000. Semiparametric regression for the mean and rate functions of recurrent events. Journal of the Royal Statistical Society: Series B. 62(4):711–730.

Lu & Shen 2018. Application of Survival Analysis in Multiple Events Using SAS. PharmaSUG 2018.

Ozga et al. 2018. A systematic comparison of recurrent event models for application to composite endpoints. BMC Medical Research Methodology. 2018 Jan 4;18(1):2.

Prentice, Williams & Peterson 1981. On the Regression Analysis of Multivariate Failure Time Data. Biometrika. 68(2):373–379.

survival package

Wei, Lin & Weissfeld 1989. Regression Analysis of Multivariate Incomplete Failure Time Data by Modeling Marginal Distributions. Journal of the American Statistical Association. 84(408):1065–1073.

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.2 (2024-10-31)
 os       macOS Sequoia 15.6.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/London
 date     2025-08-29
 pandoc   3.4 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 ! package     * version date (UTC) lib source
 P BiocManager   1.30.25 2024-08-28 [?] CRAN (R 4.4.1)
 P cli           3.6.3   2024-06-21 [?] CRAN (R 4.4.0)
 P digest        0.6.37  2024-08-19 [?] CRAN (R 4.4.1)
 P dplyr       * 1.1.4   2023-11-17 [?] CRAN (R 4.4.0)
 P evaluate      1.0.0   2024-09-17 [?] CRAN (R 4.4.1)
 P fansi         1.0.6   2023-12-08 [?] CRAN (R 4.4.0)
 P fastmap       1.2.0   2024-05-15 [?] CRAN (R 4.4.0)
 P generics      0.1.3   2022-07-05 [?] CRAN (R 4.4.0)
 P glue          1.8.0   2024-09-30 [?] CRAN (R 4.4.1)
 P gt          * 0.11.1  2024-10-04 [?] CRAN (R 4.4.1)
 P htmltools     0.5.8.1 2024-04-04 [?] CRAN (R 4.4.0)
 P htmlwidgets   1.6.4   2023-12-06 [?] CRAN (R 4.4.0)
 P jsonlite      1.8.9   2024-09-20 [?] CRAN (R 4.4.1)
   knitr         1.50    2025-03-16 [1] RSPM (R 4.4.0)
   lattice       0.22-6  2024-03-20 [2] CRAN (R 4.4.2)
 P lifecycle     1.0.4   2023-11-07 [?] CRAN (R 4.4.0)
 P magrittr      2.0.3   2022-03-30 [?] CRAN (R 4.4.0)
   Matrix        1.7-1   2024-10-18 [2] CRAN (R 4.4.2)
 P pillar        1.9.0   2023-03-22 [?] CRAN (R 4.4.0)
 P pkgconfig     2.0.3   2019-09-22 [?] CRAN (R 4.4.0)
 P R6            2.5.1   2021-08-19 [?] CRAN (R 4.4.0)
   renv          1.0.10  2024-10-05 [1] CRAN (R 4.4.1)
 P rlang         1.1.4   2024-06-04 [?] CRAN (R 4.4.0)
 P rmarkdown     2.28    2024-08-17 [?] CRAN (R 4.4.0)
 P rstudioapi    0.16.0  2024-03-24 [?] CRAN (R 4.4.0)
 P sass          0.4.9   2024-03-15 [?] CRAN (R 4.4.0)
 P sessioninfo   1.2.2   2021-12-06 [?] CRAN (R 4.4.0)
 P survival    * 3.7-0   2024-06-05 [?] CRAN (R 4.4.0)
 P tibble        3.2.1   2023-03-20 [?] CRAN (R 4.4.0)
 P tidyselect    1.2.1   2024-03-11 [?] CRAN (R 4.4.0)
 P utf8          1.2.4   2023-10-22 [?] CRAN (R 4.4.0)
 P vctrs         0.6.5   2023-12-01 [?] CRAN (R 4.4.0)
 P withr         3.0.1   2024-07-31 [?] CRAN (R 4.4.0)
   xfun          0.52    2025-04-02 [1] RSPM (R 4.4.0)
 P xml2          1.3.6   2023-12-04 [?] CRAN (R 4.4.0)
 P yaml          2.3.10  2024-07-26 [?] CRAN (R 4.4.0)

 [1] /Users/christinafillmore/Documents/GitHub/CAMIS/renv/library/macos/R-4.4/aarch64-apple-darwin20
 [2] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library

 P ── Loaded and on-disk path mismatch.

──────────────────────────────────────────────────────────────────────────────