Group sequential design in R

Group sequential design: time-to-event endpoint

While a group sequential design (GSD) could be applied for different types of endpoints, here we focus on time-to-event endpoints.

Available R packages

The commonly used R packages for power and sample size calculations utilizing a GSD are: gsDesign (also has a web interface), gsDesign2, and rpact.

Design assumptions

Using a toy example, we will assume that a primary objective of a phase III oncology trial is to compare a new therapy to a control in terms of progression-free survival (PFS) and overall survival (OS). Note that, in this example, we have a family of primary endpoints, i.e., if at least one of the endpoints is successful, the study will be declared a success. A GSD will be utilized for each endpoint. PFS will be tested at one interim analysis (IA) for both efficacy and non-binding futility, while OS will be tested at two IAs for efficacy only. An O’Brien-Fleming spending function will be used for efficacy testing and a Hwang-Shih-Decani spending function with \(\gamma = -10\) will be used for futility.

Further design assumptions are as follows:

# PFS HR = 0.6
hr1_pfs <- 0.6
# Median PFS of 9.4 months in the control arm
med_pfs <- 9.4
# Median follow-up of 10 months for PFS
minfu_pfs <- 10
# Monthly dropout of 0.019 for PFS
do_rate_pfs <- 0.019
# IA timing for PFS is at 75% information fraction
timing_pfs <- c(0.75, 1)
# Power of 95% for PFS
power_pfs <- 0.95

# OS HR = 0.65
hr1_os <- 0.65
# Median OS of 3 years in the control arm
med_os <- 12 * 3
# Median follow-up of 42 months for OS
minfu_os <- 42
# Monthly dropout of 0.001 for OS
do_rate_os <- 0.001
# IA timing for OS is at 60% and 80% information fraction
timing_os <- c(0.6, 0.8, 1)
# Power of 82% for OS
power_os <- 0.82

# Enrollment period of 24 months
enroll_dur <- 24
# 1:1 randomization ratio
rand_ratio <- 1
# alpha level of 1.25% for each endpoint
alphal <- 0.0125

We assume that given the above assumptions, we need to calculate the target number of events for each analysis as well as the total sample size.

Example code

Example using gsDesign

PFS calculations:

library(gsDesign)

pfs_gsDesign <- gsSurv(
  k = length(timing_pfs),
  timing = timing_pfs,
  R = enroll_dur,
  eta = do_rate_pfs,
  minfup = minfu_pfs,
  T = enroll_dur + minfu_pfs,
  lambdaC = log(2) / med_pfs,
  hr = hr1_pfs,
  beta = 1 - power_pfs,
  alpha = alphal,
  sfu = sfLDOF,
  sfl = sfHSD,
  sflpar = -10,
  test.type = 4
)

pfs_gsDesign |> gsBoundSummary()

    Analysis              Value Efficacy Futility
   IA 1: 75%                  Z   2.6584   0.7432
      N: 398        p (1-sided)   0.0039   0.2287
 Events: 176       ~HR at bound   0.6693   0.8938
   Month: 25   P(Cross) if HR=1   0.0039   0.7713
             P(Cross) if HR=0.6   0.7668   0.0041
       Final                  Z   2.2801   2.2801
      N: 398        p (1-sided)   0.0113   0.0113
 Events: 234       ~HR at bound   0.7421   0.7421
   Month: 34   P(Cross) if HR=1   0.0125   0.9875
             P(Cross) if HR=0.6   0.9500   0.0500

OS calculations:

os_gsDesign <- gsSurv(
  k = length(timing_os),
  timing = timing_os,
  R = enroll_dur,
  eta = do_rate_os,
  minfup = minfu_os,
  T = enroll_dur + minfu_os,
  lambdaC = log(2) / med_os,
  hr = hr1_os,
  beta = 1 - power_os,
  alpha = alphal,
  sfu = sfLDOF,
  test.type = 1
)

os_gsDesign |> gsBoundSummary()

    Analysis               Value Efficacy
   IA 1: 60%                   Z   3.0205
      N: 394         p (1-sided)   0.0013
 Events: 131        ~HR at bound   0.5896
   Month: 38    P(Cross) if HR=1   0.0013
             P(Cross) if HR=0.65   0.2899
   IA 2: 80%                   Z   2.5874
      N: 394         p (1-sided)   0.0048
 Events: 175        ~HR at bound   0.6758
   Month: 51    P(Cross) if HR=1   0.0052
             P(Cross) if HR=0.65   0.6082
       Final                   Z   2.2958
      N: 394         p (1-sided)   0.0108
 Events: 218        ~HR at bound   0.7327
   Month: 66    P(Cross) if HR=1   0.0125
             P(Cross) if HR=0.65   0.8200

Example using gsDesign2

PFS calculations:

library(gsDesign2)
library(tibble)

enroll_rate <- tibble(
  stratum = "All",
  duration = enroll_dur,
  rate = 1
)
fail_rate_pfs <- tibble(
  stratum = "All",
  duration = Inf, # Can be set to `Inf` when proportional hazard is assumed
  fail_rate = log(2) / med_pfs,
  hr = hr1_pfs,
  dropout_rate = do_rate_pfs
)

pfs_gsDesign2 <- gs_design_ahr(
  enroll_rate = enroll_rate,
  fail_rate = fail_rate_pfs,
  ratio = rand_ratio,
  beta = 1 - power_pfs,
  alpha = alphal,
  info_frac = timing_pfs,
  analysis_time = enroll_dur + minfu_pfs,
  upper = gs_spending_bound,
  upar = list(
    sf = gsDesign::sfLDOF,
    total_spend = alphal
  ),
  lower = gs_spending_bound,
  lpar = list(
    sf = gsDesign::sfHSD,
    total_spend = 1 - power_pfs,
    param = -10
  ),
  info_scale = "h0_info"
)

pfs_gsDesign2 |>
  summary() |>
  as_gt()

Bound summary for AHR design
AHR approximations of ~HR at bound
Bound	Z	Nominal p¹	~HR at bound²	Cumulative boundary crossing probability
Bound	Z	Nominal p¹	~HR at bound²	Alternate hypothesis	Null hypothesis
Analysis: 1 Time: 25.3 N: 405.8 Events: 179.2 AHR: 0.6 Information fraction: 0.75
Futility	0.74	0.2287	0.8940	0.0041	0.7713
Efficacy	2.66	0.0039	0.6697	0.7668	0.0039
Analysis: 2 Time: 34 N: 405.8 Events: 238.9 AHR: 0.6 Information fraction: 1
Futility	2.28	0.0113	0.7424	0.0500	0.9875
Efficacy	2.28	0.0113	0.7424	0.9500	0.0125
¹ One-sided p-value for experimental vs control treatment. Value < 0.5 favors experimental, > 0.5 favors control.
² Approximate hazard ratio to cross bound.

OS calculations:

fail_rate_os <- tibble(
  stratum = "All",
  duration = Inf, # Can be set to `Inf` when proportional hazard is assumed
  fail_rate = log(2) / med_os,
  hr = hr1_os,
  dropout_rate = do_rate_os
)

os_gsDesign2 <- gs_design_ahr(
  enroll_rate = pfs_gsDesign2$enroll_rate,
  fail_rate = fail_rate_os,
  ratio = rand_ratio,
  beta = 1 - power_os,
  alpha = alphal,
  info_frac = timing_os,
  analysis_time = enroll_dur + minfu_os,
  test_lower = FALSE,
  upper = gs_spending_bound,
  upar = list(
    sf = gsDesign::sfLDOF,
    total_spend = alphal
  ),
  info_scale = "h0_info"
)

os_gsDesign2 |>
  summary() |>
  as_gt()

Bound summary for AHR design
AHR approximations of ~HR at bound
Bound	Z	Nominal p¹	~HR at bound²	Cumulative boundary crossing probability
Bound	Z	Nominal p¹	~HR at bound²	Alternate hypothesis	Null hypothesis
Analysis: 1 Time: 38.4 N: 402.6 Events: 133.7 AHR: 0.65 Information fraction: 0.6
Efficacy	3.02	0.0013	0.5901	0.2899	0.0013
Analysis: 2 Time: 50.6 N: 402.6 Events: 178.2 AHR: 0.65 Information fraction: 0.8
Efficacy	2.59	0.0048	0.6762	0.6082	0.0052
Analysis: 3 Time: 66 N: 402.6 Events: 222.8 AHR: 0.65 Information fraction: 1
Efficacy	2.30	0.0108	0.7330	0.8200	0.0125
¹ One-sided p-value for experimental vs control treatment. Value < 0.5 favors experimental, > 0.5 favors control.
² Approximate hazard ratio to cross bound.

Example using rpact

PFS calculations:

library(rpact)

pfs_rpact_gsd <- getDesignGroupSequential(
  sided = 1,
  alpha = alphal,
  informationRates = timing_pfs,
  typeOfDesign = "asOF",
  beta = 1 - power_pfs,
  typeBetaSpending = "bsHSD",
  gammaB = -10,
  bindingFutility = FALSE
)

pfs_rpact <- getSampleSizeSurvival(
  design = pfs_rpact_gsd,
  accrualTime = enroll_dur,
  followUpTime = minfu_pfs,
  lambda2 = log(2) / med_pfs,
  hazardRatio = hr1_pfs,
  dropoutRate1 = 0.2,
  dropoutRate2 = 0.2,
  dropoutTime = 12
)

kable(summary(pfs_rpact))

Warning in kable.ParameterSet(summary(pfs_rpact)): Manual use of kable() for
rpact result objects is no longer needed, as the formatting and display will be
handled automatically by the rpact package

Sample size calculation for a survival endpoint

Sequential analysis with a maximum of 2 looks (group sequential design), one-sided overall significance level 1.25%, power 95%. The results were calculated for a two-sample logrank test, H0: hazard ratio = 1, H1: hazard ratio = 0.6, control lambda(2) = 0.074, accrual time = 24, accrual intensity = 16.5, follow-up time = 10, dropout rate(1) = 0.2, dropout rate(2) = 0.2, dropout time = 12.

Stage	1	2
Planned information rate	75%	100%
Cumulative alpha spent	0.0039	0.0125
Cumulative beta spent	0.0041	0.0500
Stage levels (one-sided)	0.0039	0.0113
Efficacy boundary (z-value scale)	2.658	2.280
Futility boundary (z-value scale)	0.743
Efficacy boundary (t)	0.670	0.742
Futility boundary (t)	0.894
Cumulative power	0.7668	0.9500
Number of subjects	396.9	396.9
Expected number of subjects under H1		396.9
Cumulative number of events	175.8	234.4
Expected number of events under H1	189.2
Analysis time	25.36	34.00
Expected study duration under H1		27.34
Overall exit probability (under H0)	0.7752
Overall exit probability (under H1)	0.7709
Exit probability for efficacy (under H0)	0.0039
Exit probability for efficacy (under H1)	0.7668
Exit probability for futility (under H0)	0.7713
Exit probability for futility (under H1)	0.0041

Legend:

(t): treatment effect scale

Note: the dropoutRate1, dropoutRate2 arguments in getSampleSizeSurvival() refer to the % of drop-outs by the dropoutTime, while the eta argument in gsDesign::gsSurv() and the dropout_rate value in the fail_rate argument in gsDesign2::gs_design_ahr() refer to the annual drop-out rate parameter under the exponential distribution. In our example, if \(X\) is a drop-out time and \(X \sim \text{Exponential} (\lambda)\), we assume that by month 12 the drop-out rate was 20%, which implies: \(P(X\le12) = 1 - e^{-12\lambda} = 0.2 \Rightarrow \lambda = 0.019\). Due to the above differences, the value \(\lambda = 0.019\) was used in the gsDesign and gsDesign2 example, while 0.2 was used in the rpact example.

OS calculations:

os_rpact_gsd <- getDesignGroupSequential(
  sided = 1,
  alpha = alphal,
  informationRates = timing_os,
  typeOfDesign = "asOF",
  beta = 1 - power_os
)

os_rpact <- getSampleSizeSurvival(
  design = os_rpact_gsd,
  accrualTime = enroll_dur,
  followUpTime = minfu_os,
  lambda2 = log(2) / med_os,
  hazardRatio = hr1_os,
  dropoutRate1 = 1 - exp(-do_rate_os * 12),
  dropoutRate2 = 1 - exp(-do_rate_os * 12),
  dropoutTime = 12
)

kable(summary(os_rpact))

Sample size calculation for a survival endpoint

Sequential analysis with a maximum of 3 looks (group sequential design), one-sided overall significance level 1.25%, power 82%. The results were calculated for a two-sample logrank test, H0: hazard ratio = 1, H1: hazard ratio = 0.65, control lambda(2) = 0.019, accrual time = 24, accrual intensity = 16.5, follow-up time = 42, dropout rate(1) = 0.012, dropout rate(2) = 0.012, dropout time = 12.

Stage	1	2	3
Planned information rate	60%	80%	100%
Cumulative alpha spent	0.0013	0.0052	0.0125
Stage levels (one-sided)	0.0013	0.0048	0.0108
Efficacy boundary (z-value scale)	3.020	2.587	2.296
Efficacy boundary (t)	0.590	0.676	0.733
Cumulative power	0.2899	0.6082	0.8200
Number of subjects	395.1	395.1	395.1
Expected number of subjects under H1			395.1
Cumulative number of events	131.2	174.9	218.6
Expected number of events under H1	179.4
Analysis time	38.44	50.60	66.00
Expected study duration under H1			53.11
Exit probability for efficacy (under H0)	0.0013	0.0040
Exit probability for efficacy (under H1)	0.2899	0.3182

Legend:

(t): treatment effect scale