library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(gt)
Traditionally, survival analysis focuses on the time to a single first event. While there are many applications for such time-to-event analysis in clinical trials, this approach falls short when events of interest can occur multiple times within the same subject. Recurrent event models extend the traditional Cox proportional hazards framework to account for multiple events per subject (Ozga et al. 2018, Amorim & Cai 2015).
In this tutorial, we will demonstrate how to implement different recurrent event models in R, specifically the Andersen-Gill, proportional means/rates, Prentice-Williams-Peterson, and Wei-Lin-Weissfeld models, using the well-known survival
package. The R code follows the layout of Amor 2023, with additional insights taken from Lu et al. 2018.
Recurrent event models can roughly be divided in three categories: counting process models, conditional models and marginal models. In the section below, we will explore the difference between each of these approaches. In addition, important aspects of data structure will be discussed by means of two fictional subjects, one with 4 events and 0 censored observations (events at time 6, 9, 56 and 88), and another with 2 events and 1 censored observation (events at time 42, 57, and censored at time 91).
Define the following:
\[*\]
\(\lambda_i(t)\): hazard function for the \(i\)th subject at time \(t\)
\(\lambda_{ij}(t)\): hazard function for the \(j\)th event of the \(i\)th subject at time \(t\)
\(\lambda_0(t)\): common baseline hazard for all events
\(\lambda_{0j}(t)\): event-specific baseline hazard for the \(j\)th event at time \(t\)
\(\beta\): common parameter vector
\(\beta_j\): event-specific parameter vector for the \(j\)th event
\(X_{ij}\): covariate vector for the \(j\)th event of the \(i\)th subject
\[ \lambda_i(t) = \lambda_0(t) \exp \left( \beta X_{ij}(t) \right) \ * \]
Counting process approach: treats each subject as a multiple events counting process
Common baseline hazard \(\lambda_0(t)\)
Common regression coefficients \(\beta\)
Unrestricted risk set: a subject contributes to the risk set for an event as long as the subject is under observation, i.e. it can be at risk for a subsequent event even though the previous event did not yet occur
Order of events is not important
An essential assumption of the Andersen-Gill model is that of independent events within subjects. This, however, is often not realistic in clinical trial data. For example, let’s say that we are modelling myocardial infarction (MI). If a patient has already experienced one MI, their risk of subsequent events may increase due to underlying cardiovascular damage or presence of other risk factors. Thus, the more events a patient has, the more likely they are to experience future events, indicating dependence rather than independence. To accurately model this within-subject correlation, extensions like time-varying covariates, a robust sandwich covariance estimator or frailty terms may be needed. In this tutorial, we will discuss the sandwich correction.
Lin-Wei-Yang-Ying (LWYY) model or proportional means/rates model (Lin, Wei, Yang & Ying 2000)
Lin, Wei, Yang, and Ying introduced an improved version of the Andersen-Gill model in 2000 (often referred to as proportional means/rates model), featuring a robust sandwich estimator that explicitly accounts for individual subject clusters. These robust standard errors yield wider confidence intervals and provide asymptotically valid inference even when the independence assumption does not hold (Lee et al. 2025). The original and improved Andersen-Gill model often appear interchangeable in the literature, and while they produce identical estimates, their robust standard errors can differ substantially, which may impact the conclusions drawn from statistical inference.
For both versions of the Andersen-Gill model, the data must be structured as follows:
Subject | Time interval | Event | Stratum |
---|---|---|---|
1 | (0, 6] | 1 | 1 |
1 | (6, 9] | 1 | 1 |
1 | (9, 56] | 1 | 1 |
1 | (56, 88] | 1 | 1 |
2 | (0, 42] | 1 | 1 |
2 | (42, 87] | 1 | 1 |
2 | (87, 91] | 0 | 1 |
This can be visually represented:
In both versions of the Andersen-Gill model, each new time interval starts where the previous one ends.
Conditional approach: incorporates conditional strata to account for ordering/dependence of events
Stratified baseline hazard \(\lambda_{0j}(t)\)
Stratified regression coefficients \(\beta_j\): can be pooled (\(\beta\)) or kept as event-specific (\(\beta_j\)) in the output
Restricted risk set: contributions to the risk set for a subsequent event are restricted to only consider subjects that already experienced the previous event
Order of events is important
The Prentice-Williams-Peterson model can incorporate both overall and event-specific effects \(\beta_j\) for each covariate. An often made assumption is to set \(\beta_1 = \beta_2 = ... = \beta_k = \beta\) to estimate a common parameter \(\beta\).
Depending on the outcome of interest, Prentice, Williams and Peterson suggested two distinct models:
\[ \lambda_{ij}(t) = \lambda_{0j}(t) \exp \left( \beta_j X_{ij}(t) \right) \ * \]
The total time variant of the Prentice-Williams-Peterson model uses the same time intervals as the counting process approach (Andersen-Gill model), which is useful for modelling the full time course (\(t\)) of the recurrent event process, i.e. the hazard of any recurrence.
For the total time model, the data must be structured as follows:
Subject | Time interval | Event | Stratum |
---|---|---|---|
1 | (0, 6] | 1 | 1 |
1 | (6, 9] | 1 | 2 |
1 | (9, 56] | 1 | 3 |
1 | (56, 88] | 1 | 4 |
2 | (0, 42] | 1 | 1 |
2 | (42, 87] | 1 | 2 |
2 | (87, 91] | 0 | 3 |
This can be visually represented:
Again, in the total time model, each new time intervals starts where the previous one ends.
\[ \lambda_{ij}(t) = \lambda_{0j}(t - t_{j-1}) \exp \left( \beta_j X_{ij}(t) \right) \ * \]
The gap time variant of the Prentice-Williams-Peterson model uses time intervals that start at zero and end at the length of time until the next event, which is useful for modelling the time between each of the recurring events (\(t - t_{j-1}\)), i.e. the hazard of recurrence after the previous event.
For the gap time model, the data must be structured as follows:
Subject | Time interval | Event | Stratum |
---|---|---|---|
1 | (0, 6] | 1 | 1 |
1 | (0, 3] | 1 | 2 |
1 | (0, 47] | 1 | 3 |
1 | (0, 32] | 1 | 4 |
2 | (0, 42] | 1 | 1 |
2 | (0, 45] | 1 | 2 |
2 | (0, 3] | 0 | 3 |
This can be visually represented:
In the gap time model, each time interval starts at zero and has a length equal to the gap time between two neighboring events.
\[ \lambda_{ij}(t) = \lambda_{0j}(t) \exp \left( \beta_j X_{ij}(t) \right) \ * \]
Marginal approach: treats each (recurrent) event as having a separate, marginal process
Stratified baseline hazard \(\lambda_{0j}(t)\)
Stratified regression coefficients \(\beta_j\): can be pooled (\(\beta\)) or kept as event-specific (\(\beta_j\)) in the output
Semi-restricted risk set: all subjects contribute follow-up times to all potential events, i.e. each subject is at risk for all potential events, regardless of how many events that subject actually experiences
Order of events is not important
Although the Wei-Lin-Weissfeld model has it roots in competing risks analysis, it conveniently lends itself to model recurrent events as well. Like the Andersen-Gill model, the Wei-Lin-Weissfeld model also assumes independence of events, which is often not feasible in practice. In addition, it is assumed there is no specific order among the events or that the events are different types of events, and not necessarily recurrent events.
Like the Prentice-Williams-Peterson models, the Wei-Lin-Weissfeld model can incorporate both overall and event-specific effects \(\beta_j\) for each covariate. An often made assumption is to set \(\beta_1 = \beta_2 = ... = \beta_k = \beta\) to estimate a common parameter \(\beta\). Another approach is to combine event-specific effects \(\beta_j\) to get an estimator of the average effect, as described in Wei, Lin & Weissfeld 1989 (this is not discussed further here).
For Wei-Lin-Weissfeld models, the data must be structured as follows:
Subject | Time interval | Event | Stratum |
---|---|---|---|
1 | (0, 6] | 1 | 1 |
1 | (0, 9] | 1 | 2 |
1 | (0, 56] | 1 | 3 |
1 | (0, 88] | 1 | 4 |
2 | (0, 42] | 1 | 1 |
2 | (0, 87] | 1 | 2 |
2 | (0, 91] | 0 | 3 |
2 | (0, 91] | 0 | 4 |
This can be visually represented:
In the Wei-Lin-Weissfeld model, each time intervals starts at zero and ends at its respective event time.
In summary, the selection of the model to use would depend on the type of events, the importance of the order of the events and the time intervals to be analyzed. We made an effort to summarize the similarities and differences between the models in the table below.
AG | PWPtt | PWPgt | WLW | |
---|---|---|---|---|
Approach | counting process | conditional | conditional | marginal |
Baseline hazard | common | stratified | stratified | stratified |
Regression coefficients | common | stratified possible | stratified possible | stratified possible |
Risk set | unrestricted | restricted | restricted | semi-restricted |
Time interval | total time | total time | gap time | total time |
Order of events | not important | important | important | not important |
Hazard ratio (HR) | risk of any recurrence | risk of any recurrence | risk of recurrence after previous event | risk of event of any type, not necessarily recurrent event |
Note that, because the ordering of events is not important in the Andersen-Gill and Wei-Lin-Weissfeld model, these models come with the assumption of independence of events. In contrast, the Prentice-Williams-Peterson models overcome the need for this assumption by capturing the dependence structure between recurrent events in conditional strata. Consequently, events are assumed to be conditionally independent in the Prentice-Williams-Peterson models.
A nice visual representation of the stratification and time interval structure of each model is given below. The correct data structure is pivotal when modelling recurrent events and depends on the methodology you want to use, as illustrated in the figure.
survival
packageFor this tutorial we will use the bladder
data from the survival
package, which captures recurrences of bladder cancer from a clinical trial for an oncolytic called thiotepa. The bladder
data is regularly used by many statisticians to demonstrate methodology for recurrent event modelling. Somewhat confusingly, there are three versions of this data available:
bladder1
: original data from the study on all subjects (294 records)
bladder2
: data in Andersen-Gill format on subset of subjects with nonzero follow-up time (178 records)
bladder
: data in Wei-Lin-Weissfeld format on subset of subjects with nonzero follow-up time (340 records)
For this tutorial, we will use bladder2
to illustrate Andersen-Gill and Prentice-Williams-Peterson models, and bladder
to illustrate the Wei-Lin-Weissfeld model.
The variables included in both datasets are:
id: patient id
rx: treatment group (1 = placebo, 2 = thiotepa)
number: initial number of tumors (8 = 8 or more)
size: size in cm of largest initial tumor
start: start of time interval; this variable is not present in bladder
stop: (recurrent) event or censoring time
event: event indicator (1 = event, 0 = censored)
enum: order of recurrence
Importantly, both datasets collect the data in a counting process structure. This means that there is one record for each subject and time interval, where a time interval is defined as the time to its respective event (event = 1), or the time to follow-up if the event did not occur (event = 0).
Let’s look more closely at the bladder2
and bladder
data:
id | rx | number | size | start | stop | event | enum |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | 3 | 0 | 1 | 0 | 1 |
2 | 1 | 2 | 1 | 0 | 4 | 0 | 1 |
3 | 1 | 1 | 1 | 0 | 7 | 0 | 1 |
4 | 1 | 5 | 1 | 0 | 10 | 0 | 1 |
5 | 1 | 4 | 1 | 0 | 6 | 1 | 1 |
5 | 1 | 4 | 1 | 6 | 10 | 0 | 2 |
In bladder2
, in the Andersen-Gill format, each subject has a variable amount of records, depending on the amount of events that subject experienced.
id | rx | number | size | stop | event | enum |
---|---|---|---|---|---|---|
1 | 1 | 1 | 3 | 1 | 0 | 1 |
1 | 1 | 1 | 3 | 1 | 0 | 2 |
1 | 1 | 1 | 3 | 1 | 0 | 3 |
1 | 1 | 1 | 3 | 1 | 0 | 4 |
2 | 1 | 2 | 1 | 4 | 0 | 1 |
2 | 1 | 2 | 1 | 4 | 0 | 2 |
2 | 1 | 2 | 1 | 4 | 0 | 3 |
2 | 1 | 2 | 1 | 4 | 0 | 4 |
3 | 1 | 1 | 1 | 7 | 0 | 1 |
3 | 1 | 1 | 1 | 7 | 0 | 2 |
3 | 1 | 1 | 1 | 7 | 0 | 3 |
3 | 1 | 1 | 1 | 7 | 0 | 4 |
4 | 1 | 5 | 1 | 10 | 0 | 1 |
4 | 1 | 5 | 1 | 10 | 0 | 2 |
4 | 1 | 5 | 1 | 10 | 0 | 3 |
4 | 1 | 5 | 1 | 10 | 0 | 4 |
5 | 1 | 4 | 1 | 6 | 1 | 1 |
5 | 1 | 4 | 1 | 10 | 0 | 2 |
5 | 1 | 4 | 1 | 10 | 0 | 3 |
5 | 1 | 4 | 1 | 10 | 0 | 4 |
In bladder
, in the Wei-Lin-Weissfeld format, each subject has four records, regardless of how many events that subject actually experienced. In addition, there is no start
variable, as all time intervals start at zero.
In the survival
package, any survival analysis based on the Cox proportional hazard model can be conducted using the coxph()
function. Hence, conveniently, when modelling time-to-event data with recurrent events, the same function can be used. The caveat here is that an adequate data structure is required, which must be in correspondence with the model you want to use.
In this section of the tutorial, we will explain how the arguments of the coxph()
function and data structure must be defined to fit every type of recurrent event model correctly.
For the improved version of the Andersen-Gill model you must specify:
formula = Surv(start, stop, event) ~ 'predictors'
cluster = id
And the data structure must be:
Id | Time interval | Start | Stop | Event |
---|---|---|---|---|
1 | (0, 1] | 0 | 1 | 0 |
2 | (0, 4] | 0 | 4 | 0 |
3 | (0, 7] | 0 | 7 | 0 |
4 | (0, 10] | 0 | 10 | 0 |
5 | (0, 6] | 0 | 6 | 1 |
5 | (6, 10] | 6 | 10 | 0 |
We will use the bladder2
data for this.
AG <- coxph(Surv(start, stop, event) ~ as.factor(rx) + number + size,
ties = "breslow",
cluster = id,
data = bladder2)
summary(AG)
Call:
coxph(formula = Surv(start, stop, event) ~ as.factor(rx) + number +
size, data = bladder2, ties = "breslow", cluster = id)
n= 178, number of events= 112
coef exp(coef) se(coef) robust se z Pr(>|z|)
as.factor(rx)2 -0.45979 0.63142 0.19996 0.25801 -1.782 0.07474 .
number 0.17164 1.18726 0.04733 0.06131 2.799 0.00512 **
size -0.04256 0.95833 0.06903 0.07555 -0.563 0.57317
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
as.factor(rx)2 0.6314 1.5837 0.3808 1.047
number 1.1873 0.8423 1.0528 1.339
size 0.9583 1.0435 0.8264 1.111
Concordance= 0.634 (se = 0.032 )
Likelihood ratio test= 16.77 on 3 df, p=8e-04
Wald test = 11.76 on 3 df, p=0.008
Score (logrank) test = 18.57 on 3 df, p=3e-04, Robust = 11.44 p=0.01
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
By defining the cluster
argument, coxph()
will automatically set robust = TRUE
, and compute a robust sandwich covariance. The summary function will then display both the non-robust (se(coef)
) and robust (robust se
) standard error estimates. Under the hood, the robust standard errors will consider all id clusters separately and ultimately sum up the score residuals for each distinct cluster.
To our knowledge, the original Andersen-Gill model of 1989 can only be fitted in R by adding an artificial clustering variable with unique entries to the bladder2
data, which we call id2
. This artificial clustering variable will ignore any clustering that is actually present in the data.
Except for cluster = id2
, the rest of the code remains the same.
AG_original <- coxph(Surv(start, stop, event) ~ as.factor(rx) + number + size,
ties = "breslow",
cluster = id2,
data = bladder2)
summary(AG_original)
Call:
coxph(formula = Surv(start, stop, event) ~ as.factor(rx) + number +
size, data = bladder2, ties = "breslow", cluster = id2)
n= 178, number of events= 112
coef exp(coef) se(coef) robust se z Pr(>|z|)
as.factor(rx)2 -0.45979 0.63142 0.19996 0.22902 -2.008 0.04468 *
number 0.17164 1.18726 0.04733 0.06469 2.653 0.00797 **
size -0.04256 0.95833 0.06903 0.07325 -0.581 0.56119
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
as.factor(rx)2 0.6314 1.5837 0.4031 0.9891
number 1.1873 0.8423 1.0459 1.3477
size 0.9583 1.0435 0.8302 1.1063
Concordance= 0.634 (se = 0.032 )
Likelihood ratio test= 16.77 on 3 df, p=8e-04
Wald test = 9.36 on 3 df, p=0.02
Score (logrank) test = 18.57 on 3 df, p=3e-04, Robust = 14.27 p=0.003
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
Although the original Andersen-Gill model does not consider separate id clusters, it still computes robust standard errors using the sandwich estimator, as robust = TRUE
. The resulting robust standard errors (robust se
) differ from those provided for the improved Andersen-Gill model, while the estimated coefficients (coef
) and non-robust standard errors (se(coef)
) remain perfectly unchanged.
For the Prentice-Williams-Peterson total time model you must specify:
formula = Surv(start, stop, event) ~ 'predictors' + strata(enum)
cluster = id
And the data structure must be:
Id | Time interval | Start | Stop | Event | Enum |
---|---|---|---|---|---|
1 | (0, 1] | 0 | 1 | 0 | 1 |
2 | (0, 4] | 0 | 4 | 0 | 1 |
3 | (0, 7] | 0 | 7 | 0 | 1 |
4 | (0, 10] | 0 | 10 | 0 | 1 |
5 | (0, 6] | 0 | 6 | 1 | 1 |
5 | (6, 10] | 6 | 10 | 0 | 2 |
We will use the bladder2
data for this.
PWPtt <- coxph(Surv(start, stop, event) ~ as.factor(rx) + number + size + strata(enum),
ties = "breslow",
cluster = id,
data = bladder2)
summary(PWPtt)
Call:
coxph(formula = Surv(start, stop, event) ~ as.factor(rx) + number +
size + strata(enum), data = bladder2, ties = "breslow", cluster = id)
n= 178, number of events= 112
coef exp(coef) se(coef) robust se z Pr(>|z|)
as.factor(rx)2 -0.334295 0.715842 0.216087 0.197064 -1.696 0.0898 .
number 0.115653 1.122606 0.053681 0.049913 2.317 0.0205 *
size -0.008051 0.991982 0.072725 0.060124 -0.134 0.8935
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
as.factor(rx)2 0.7158 1.3970 0.4865 1.053
number 1.1226 0.8908 1.0180 1.238
size 0.9920 1.0081 0.8817 1.116
Concordance= 0.615 (se = 0.032 )
Likelihood ratio test= 6.11 on 3 df, p=0.1
Wald test = 7.19 on 3 df, p=0.07
Score (logrank) test = 6.45 on 3 df, p=0.09, Robust = 8.76 p=0.03
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
The conditional strata of the Prentice-Williams-Peterson model are set by strata(enum)
in the formula, where enum
captures the ordering of recurrent events.
For the Prentice-Williams-Peterson gap time model you must specify:
formula = Surv(gtime, event) ~ 'predictors' + strata(enum)
cluster = id
And the data structure must be:
Id | Time interval | Start | Stop | Event | Enum |
---|---|---|---|---|---|
1 | (0, 1] | 0 | 1 | 0 | 1 |
2 | (0, 4] | 0 | 4 | 0 | 1 |
3 | (0, 7] | 0 | 7 | 0 | 1 |
4 | (0, 10] | 0 | 10 | 0 | 1 |
5 | (0, 6] | 0 | 6 | 1 | 1 |
5 | (0, 4] | 0 | 4 | 0 | 2 |
This data structure can be achieved in bladder2
by adding a gtime
variable.
id | rx | number | size | start | stop | event | enum | id2 | gtime |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 3 | 0 | 1 | 0 | 1 | 1 | 1 |
2 | 1 | 2 | 1 | 0 | 4 | 0 | 1 | 2 | 4 |
3 | 1 | 1 | 1 | 0 | 7 | 0 | 1 | 3 | 7 |
4 | 1 | 5 | 1 | 0 | 10 | 0 | 1 | 4 | 10 |
5 | 1 | 4 | 1 | 0 | 6 | 1 | 1 | 5 | 6 |
5 | 1 | 4 | 1 | 6 | 10 | 0 | 2 | 6 | 4 |
We artificially set start = 0 for each gap time interval by including gtime
instead of start, stop
in the Surv()
object.
PWPgt <- coxph(Surv(gtime, event) ~ as.factor(rx) + number + size + strata(enum),
ties = "breslow",
cluster = id,
data = bladder2)
summary(PWPgt)
Call:
coxph(formula = Surv(gtime, event) ~ as.factor(rx) + number +
size + strata(enum), data = bladder2, ties = "breslow", cluster = id)
n= 178, number of events= 112
coef exp(coef) se(coef) robust se z Pr(>|z|)
as.factor(rx)2 -0.26952 0.76375 0.20766 0.20808 -1.295 0.19522
number 0.15353 1.16595 0.05211 0.04889 3.140 0.00169 **
size 0.00684 1.00686 0.07001 0.06222 0.110 0.91246
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
as.factor(rx)2 0.7637 1.3093 0.5080 1.148
number 1.1659 0.8577 1.0594 1.283
size 1.0069 0.9932 0.8913 1.137
Concordance= 0.596 (se = 0.032 )
Likelihood ratio test= 8.76 on 3 df, p=0.03
Wald test = 12.14 on 3 df, p=0.007
Score (logrank) test = 9.6 on 3 df, p=0.02, Robust = 10.21 p=0.02
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
For the Wei-Lin-Weissfeld model you must specify:
formula = Surv(stop, event) ~ 'predictors' + strata(enum)
cluster = id
And the data structure must be:
Id | Time interval | Start | Stop | Event | Enum |
---|---|---|---|---|---|
1 | (0, 1] | 0 | 1 | 0 | 1 |
1 | (0, 1] | 0 | 1 | 0 | 2 |
1 | (0, 1] | 0 | 1 | 0 | 3 |
1 | (0, 1] | 0 | 1 | 0 | 4 |
2 | (0, 4] | 0 | 4 | 0 | 1 |
2 | (0, 4] | 0 | 4 | 0 | 2 |
2 | (0, 4] | 0 | 4 | 0 | 3 |
2 | (0, 4] | 0 | 4 | 0 | 4 |
3 | (0, 7] | 0 | 7 | 0 | 1 |
3 | (0, 7] | 0 | 7 | 0 | 2 |
3 | (0, 7] | 0 | 7 | 0 | 3 |
3 | (0, 7] | 0 | 7 | 0 | 4 |
4 | (0, 10] | 0 | 10 | 0 | 1 |
4 | (0, 10] | 0 | 10 | 0 | 2 |
4 | (0, 10] | 0 | 10 | 0 | 3 |
4 | (0, 10] | 0 | 10 | 0 | 4 |
5 | (0, 6] | 0 | 6 | 1 | 1 |
5 | (0, 10] | 0 | 10 | 0 | 2 |
5 | (0, 10] | 0 | 10 | 0 | 3 |
5 | (0, 10] | 0 | 10 | 0 | 4 |
We will use the bladder
data for this.
WLW <- coxph(Surv(stop, event) ~ as.factor(rx) + number + size + strata(enum),
ties = "breslow",
cluster = id,
data = bladder)
summary(WLW)
Call:
coxph(formula = Surv(stop, event) ~ as.factor(rx) + number +
size + strata(enum), data = bladder, ties = "breslow", cluster = id)
n= 340, number of events= 112
coef exp(coef) se(coef) robust se z Pr(>|z|)
as.factor(rx)2 -0.57986 0.55998 0.20118 0.30344 -1.911 0.0560 .
number 0.20849 1.23182 0.04691 0.06567 3.175 0.0015 **
size -0.05094 0.95034 0.06967 0.09304 -0.548 0.5840
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
as.factor(rx)2 0.5600 1.7858 0.3089 1.015
number 1.2318 0.8118 1.0830 1.401
size 0.9503 1.0523 0.7919 1.140
Concordance= 0.663 (se = 0.036 )
Likelihood ratio test= 24.71 on 3 df, p=2e-05
Wald test = 15.56 on 3 df, p=0.001
Score (logrank) test = 27.89 on 3 df, p=4e-06, Robust = 11.75 p=0.008
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
Importantly, the strata of the Wei-Lin-Weissfeld model as set by strata(enum)
are substantially different from the conditional strata of the Prentice-Williams-Peterson model. The enum
variable is now no longer assumed to be an ordinal variable.
Note: For all recurrent event models, another way of defining the subject clusters is by using cluster(id)
in the formula, rather than setting cluster = id
. This results in the same outcomes, as shown below.
AG_v1 <- coxph(Surv(start, stop, event) ~ as.factor(rx) + number + size,
ties = "breslow",
cluster = id,
data = bladder2)
summary(AG_v1)
Call:
coxph(formula = Surv(start, stop, event) ~ as.factor(rx) + number +
size, data = bladder2, ties = "breslow", cluster = id)
n= 178, number of events= 112
coef exp(coef) se(coef) robust se z Pr(>|z|)
as.factor(rx)2 -0.45979 0.63142 0.19996 0.25801 -1.782 0.07474 .
number 0.17164 1.18726 0.04733 0.06131 2.799 0.00512 **
size -0.04256 0.95833 0.06903 0.07555 -0.563 0.57317
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
as.factor(rx)2 0.6314 1.5837 0.3808 1.047
number 1.1873 0.8423 1.0528 1.339
size 0.9583 1.0435 0.8264 1.111
Concordance= 0.634 (se = 0.032 )
Likelihood ratio test= 16.77 on 3 df, p=8e-04
Wald test = 11.76 on 3 df, p=0.008
Score (logrank) test = 18.57 on 3 df, p=3e-04, Robust = 11.44 p=0.01
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
AG_v2 <- coxph(Surv(start, stop, event) ~ as.factor(rx) + number + size + cluster(id),
ties = "breslow",
data = bladder2)
summary(AG_v2)
Call:
coxph(formula = Surv(start, stop, event) ~ as.factor(rx) + number +
size, data = bladder2, ties = "breslow", cluster = id)
n= 178, number of events= 112
coef exp(coef) se(coef) robust se z Pr(>|z|)
as.factor(rx)2 -0.45979 0.63142 0.19996 0.25801 -1.782 0.07474 .
number 0.17164 1.18726 0.04733 0.06131 2.799 0.00512 **
size -0.04256 0.95833 0.06903 0.07555 -0.563 0.57317
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
as.factor(rx)2 0.6314 1.5837 0.3808 1.047
number 1.1873 0.8423 1.0528 1.339
size 0.9583 1.0435 0.8264 1.111
Concordance= 0.634 (se = 0.032 )
Likelihood ratio test= 16.77 on 3 df, p=8e-04
Wald test = 11.76 on 3 df, p=0.008
Score (logrank) test = 18.57 on 3 df, p=3e-04, Robust = 11.44 p=0.01
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
Note: R uses ties = "efron"
by default, while SAS uses ties = breslow
by default. In this tutorial, we forced R to use ties = "breslow"
to match SAS for all recurrent event models. For more information, be sure to check the CAMIS webpage on the comparison of Cox proportional hazards models in R and SAS.
In terms of interpretation, hazard ratios (\(\exp(\beta_j)\)) are often used when making inferences based on Cox proportional hazards models. Now, as you may remember from the overview presented earlier, it is important to recognize that each of the recurrent event models comes with a slightly different interpretation of the hazard ratio, as defined by the assumptions around the model.
AG | PWPtt | PWPgt | WLW | |
---|---|---|---|---|
Hazard ratio (HR) | risk of any recurrence | risk of any recurrence | risk of recurrence after previous event | risk of event of any type, not necessarily recurrent event |
This means that, for the bladder
data, we can draw slightly different conclusions on the hazard ratio of the group treated with thiotepa (rx = 2) versus the placebo group (rx = 1).
Model | HR: rx2 vs rx1 | 95% CI | P-value |
---|---|---|---|
AG | 0.631 | 0.381 to 1.047 | 0.0747 |
Original AG | 0.631 | 0.403 to 0.989 | 0.0447 |
PWPtt | 0.716 | 0.487 to 1.053 | 0.0898 |
PWPgt | 0.764 | 0.508 to 1.148 | 0.1952 |
WLW | 0.560 | 0.309 to 1.015 | 0.0560 |
These conclusions are:
Andersen-Gill: the risk of having any new tumor recurrence in the treatment group is 0.631 (0.381 - 1.047) times that of the placebo group
Prentice-Williams-Peterson: total time: the risk of having any new tumor recurrence in the treatment group is 0.716 (0.487 - 1.053) times that of the placebo group
Prentice-Williams-Peterson: gap time: the risk of having a new tumor recurrence after a previous event in the treatment group is 0.764 (0.508 - 1.148) times that of the placebo group
Wei-Lin-Weissfeld: the risk of having any type of event in the treatment group is 0.560 (0.309 - 1.015) times that of the placebo group
Note: The improved Andersen-Gill model (LWYY model or proportional means/rates model) is preferred over the original Andersen-Gill model.
For the Prentice-Williams-Peterson and Wei-Lin-Weissfeld models we can incorporate both overall (\(\beta\)) and event-specific (\(\beta_j\)) effects for each covariate. To arrive at pooled model parameters these models assume that \(\beta_1 = \beta_2 = ... = \beta_k = \beta\). Until now, we have only considered pooled model parameters, but given the underlying stratification of these two models in particular, it may be valuable to look into the event-specific estimates as well (Amorim & Cai 2015).
To output event-specific estimates for the treatment effect (rx), we simply specify rx:strata(enum)
in the formula.
Total time model
PWPtt_stratified <- coxph(Surv(start, stop, event) ~ rx:strata(enum) + number + size,
ties = "breslow", cluster = id, data = bladder2)
summary(PWPtt_stratified)
Call:
coxph(formula = Surv(start, stop, event) ~ rx:strata(enum) +
number + size, data = bladder2, ties = "breslow", cluster = id)
n= 178, number of events= 112
coef exp(coef) se(coef) robust se z Pr(>|z|)
number 0.111071 1.117475 0.054397 0.050063 2.219 0.0265 *
size -0.006674 0.993349 0.073173 0.061409 -0.109 0.9135
rx:strata(enum)enum=1 -0.409893 0.663721 0.304293 0.286737 -1.430 0.1529
rx:strata(enum)enum=2 -0.416339 0.659457 0.398266 0.423632 -0.983 0.3257
rx:strata(enum)enum=3 -0.142970 0.866780 0.593818 0.405303 -0.353 0.7243
rx:strata(enum)enum=4 0.105362 1.111112 0.679095 0.469706 0.224 0.8225
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
number 1.1175 0.8949 1.0130 1.233
size 0.9933 1.0067 0.8807 1.120
rx:strata(enum)enum=1 0.6637 1.5067 0.3784 1.164
rx:strata(enum)enum=2 0.6595 1.5164 0.2875 1.513
rx:strata(enum)enum=3 0.8668 1.1537 0.3917 1.918
rx:strata(enum)enum=4 1.1111 0.9000 0.4425 2.790
Concordance= 0.609 (se = 0.033 )
Likelihood ratio test= 6.73 on 6 df, p=0.3
Wald test = 8.22 on 6 df, p=0.2
Score (logrank) test = 7.02 on 6 df, p=0.3, Robust = 9.37 p=0.2
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
Gap time model
PWPgt_stratified <- coxph(Surv(gtime, event) ~ rx:strata(enum) + number + size,
ties = “breslow”, cluster = id, data = bladder2)
summary(PWPgt_stratified)
PWPgt_stratified <- coxph(Surv(gtime, event) ~ rx:strata(enum) + number + size,
ties = "breslow", cluster = id, data = bladder2)
summary(PWPgt_stratified)
Call:
coxph(formula = Surv(gtime, event) ~ rx:strata(enum) + number +
size, data = bladder2, ties = "breslow", cluster = id)
n= 178, number of events= 112
coef exp(coef) se(coef) robust se z Pr(>|z|)
number 0.15193 1.16408 0.05259 0.04864 3.124 0.00179 **
size 0.01319 1.01328 0.07021 0.06297 0.209 0.83406
rx:strata(enum)enum=1 -0.43665 0.64620 0.30521 0.28376 -1.539 0.12385
rx:strata(enum)enum=2 -0.30182 0.73947 0.39752 0.38907 -0.776 0.43789
rx:strata(enum)enum=3 0.01485 1.01497 0.47722 0.49843 0.030 0.97622
rx:strata(enum)enum=4 0.06019 1.06204 0.58289 0.53982 0.112 0.91122
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
number 1.1641 0.8590 1.0582 1.281
size 1.0133 0.9869 0.8956 1.146
rx:strata(enum)enum=1 0.6462 1.5475 0.3705 1.127
rx:strata(enum)enum=2 0.7395 1.3523 0.3449 1.585
rx:strata(enum)enum=3 1.0150 0.9853 0.3821 2.696
rx:strata(enum)enum=4 1.0620 0.9416 0.3687 3.059
Concordance= 0.603 (se = 0.032 )
Likelihood ratio test= 9.73 on 6 df, p=0.1
Wald test = 14.45 on 6 df, p=0.02
Score (logrank) test = 10.49 on 6 df, p=0.1, Robust = 11.41 p=0.08
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
WLW_stratified <- coxph(Surv(stop, event) ~ rx:strata(enum) + number + size,
ties = "breslow", cluster = id, data = bladder)
summary(WLW_stratified)
Call:
coxph(formula = Surv(stop, event) ~ rx:strata(enum) + number +
size, data = bladder, ties = "breslow", cluster = id)
n= 340, number of events= 112
coef exp(coef) se(coef) robust se z Pr(>|z|)
number 0.20747 1.23056 0.04685 0.06573 3.156 0.0016 **
size -0.05187 0.94945 0.06966 0.09300 -0.558 0.5770
rx:strata(enum)enum=1 -0.47890 0.61947 0.30583 0.28313 -1.691 0.0908 .
rx:strata(enum)enum=2 -0.64914 0.52249 0.39217 0.36826 -1.763 0.0779 .
rx:strata(enum)enum=3 -0.71783 0.48781 0.45971 0.42148 -1.703 0.0885 .
rx:strata(enum)enum=4 -0.56175 0.57021 0.56143 0.49592 -1.133 0.2573
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
number 1.2306 0.8126 1.0818 1.400
size 0.9494 1.0532 0.7912 1.139
rx:strata(enum)enum=1 0.6195 1.6143 0.3556 1.079
rx:strata(enum)enum=2 0.5225 1.9139 0.2539 1.075
rx:strata(enum)enum=3 0.4878 2.0500 0.2135 1.114
rx:strata(enum)enum=4 0.5702 1.7537 0.2157 1.507
Concordance= 0.661 (se = 0.036 )
Likelihood ratio test= 24.95 on 6 df, p=3e-04
Wald test = 16.54 on 6 df, p=0.01
Score (logrank) test = 28.19 on 6 df, p=9e-05, Robust = 11.92 p=0.06
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
Amor 2023. Eat, Sleep, R, Repeat.
Amorim & Cai 2015. Modelling recurrent events: a tutorial for analysis in epidemiology. International Journal of Epidemiology. 2015 Feb;44(1):324-33.
Andersen & Gill 1982. Cox’s Regression Model for Counting Processes: A Large Sample Study. The Annals of Statistics. 10(4):1100–1120.
Lee et al. 2025. Comparison of total event analysis and first event analysis in relation to heterogeneity in cardiovascular trials. BMC Medical Research Methodology. 2025 Jun 9;25(1):159.
Lin, Wei, Yang & Ying 2000. Semiparametric regression for the mean and rate functions of recurrent events. Journal of the Royal Statistical Society: Series B. 62(4):711–730.
Lu & Shen 2018. Application of Survival Analysis in Multiple Events Using SAS. PharmaSUG 2018.
Ozga et al. 2018. A systematic comparison of recurrent event models for application to composite endpoints. BMC Medical Research Methodology. 2018 Jan 4;18(1):2.
Prentice, Williams & Peterson 1981. On the Regression Analysis of Multivariate Failure Time Data. Biometrika. 68(2):373–379.
Wei, Lin & Weissfeld 1989. Regression Analysis of Multivariate Incomplete Failure Time Data by Modeling Marginal Distributions. Journal of the American Statistical Association. 84(408):1065–1073.
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.2 (2024-10-31)
os macOS Sequoia 15.6.1
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/London
date 2025-08-29
pandoc 3.4 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
! package * version date (UTC) lib source
P BiocManager 1.30.25 2024-08-28 [?] CRAN (R 4.4.1)
P cli 3.6.3 2024-06-21 [?] CRAN (R 4.4.0)
P digest 0.6.37 2024-08-19 [?] CRAN (R 4.4.1)
P dplyr * 1.1.4 2023-11-17 [?] CRAN (R 4.4.0)
P evaluate 1.0.0 2024-09-17 [?] CRAN (R 4.4.1)
P fansi 1.0.6 2023-12-08 [?] CRAN (R 4.4.0)
P fastmap 1.2.0 2024-05-15 [?] CRAN (R 4.4.0)
P generics 0.1.3 2022-07-05 [?] CRAN (R 4.4.0)
P glue 1.8.0 2024-09-30 [?] CRAN (R 4.4.1)
P gt * 0.11.1 2024-10-04 [?] CRAN (R 4.4.1)
P htmltools 0.5.8.1 2024-04-04 [?] CRAN (R 4.4.0)
P htmlwidgets 1.6.4 2023-12-06 [?] CRAN (R 4.4.0)
P jsonlite 1.8.9 2024-09-20 [?] CRAN (R 4.4.1)
knitr 1.50 2025-03-16 [1] RSPM (R 4.4.0)
lattice 0.22-6 2024-03-20 [2] CRAN (R 4.4.2)
P lifecycle 1.0.4 2023-11-07 [?] CRAN (R 4.4.0)
P magrittr 2.0.3 2022-03-30 [?] CRAN (R 4.4.0)
Matrix 1.7-1 2024-10-18 [2] CRAN (R 4.4.2)
P pillar 1.9.0 2023-03-22 [?] CRAN (R 4.4.0)
P pkgconfig 2.0.3 2019-09-22 [?] CRAN (R 4.4.0)
P R6 2.5.1 2021-08-19 [?] CRAN (R 4.4.0)
renv 1.0.10 2024-10-05 [1] CRAN (R 4.4.1)
P rlang 1.1.4 2024-06-04 [?] CRAN (R 4.4.0)
P rmarkdown 2.28 2024-08-17 [?] CRAN (R 4.4.0)
P rstudioapi 0.16.0 2024-03-24 [?] CRAN (R 4.4.0)
P sass 0.4.9 2024-03-15 [?] CRAN (R 4.4.0)
P sessioninfo 1.2.2 2021-12-06 [?] CRAN (R 4.4.0)
P survival * 3.7-0 2024-06-05 [?] CRAN (R 4.4.0)
P tibble 3.2.1 2023-03-20 [?] CRAN (R 4.4.0)
P tidyselect 1.2.1 2024-03-11 [?] CRAN (R 4.4.0)
P utf8 1.2.4 2023-10-22 [?] CRAN (R 4.4.0)
P vctrs 0.6.5 2023-12-01 [?] CRAN (R 4.4.0)
P withr 3.0.1 2024-07-31 [?] CRAN (R 4.4.0)
xfun 0.52 2025-04-02 [1] RSPM (R 4.4.0)
P xml2 1.3.6 2023-12-04 [?] CRAN (R 4.4.0)
P yaml 2.3.10 2024-07-26 [?] CRAN (R 4.4.0)
[1] /Users/christinafillmore/Documents/GitHub/CAMIS/renv/library/macos/R-4.4/aarch64-apple-darwin20
[2] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
P ── Loaded and on-disk path mismatch.
──────────────────────────────────────────────────────────────────────────────