library(gsDesign)
library(gsDesign2)
library(rpact)
library(tibble)Group sequential design in R
Group sequential design: time-to-event endpoint
While a group sequential design (GSD) could be applied for different types of endpoints, here we focus on time-to-event endpoints.
Available R packages
The commonly used R packages for power and sample size calculations utilizing a GSD are: gsDesign (also has a web interface), gsDesign2, and rpact.
Design assumptions
Using a toy example, we will assume that a primary objective of a phase III oncology trial is to compare a new therapy to a control in terms of progression-free survival (PFS) and overall survival (OS). Note that, in this example, we have a family of primary endpoints, i.e., if at least one of the endpoints is successful, the study will be declared a success. A GSD will be utilized for each endpoint. PFS will be tested at one interim analysis (IA) for both efficacy and non-binding futility, while OS will be tested at two IAs for efficacy only. An O’Brien-Fleming spending function will be used for efficacy testing and a Hwang-Shih-Decani spending function with \(\gamma = -10\) will be used for futility.
Further design assumptions are as follows:
# PFS HR = 0.6
hr1_pfs <- 0.6
# Median PFS of 9.4 months in the control arm
med_pfs <- 9.4
# Median follow-up of 10 months for PFS
minfu_pfs <- 10
# Monthly dropout of 0.019 for PFS
do_rate_pfs <- 0.019
# IA timing for PFS is at 75% information fraction
timing_pfs <- c(0.75, 1)
# Power of 95% for PFS
power_pfs <- 0.95
# OS HR = 0.65
hr1_os <- 0.65
# Median OS of 3 years in the control arm
med_os <- 12 * 3
# Median follow-up of 42 months for OS
minfu_os <- 42
# Monthly dropout of 0.001 for OS
do_rate_os <- 0.001
# IA timing for OS is at 60% and 80% information fraction
timing_os <- c(0.6, 0.8, 1)
# Power of 82% for OS
power_os <- 0.82
# Enrollment period of 24 months
enroll_dur <- 24
# 1:1 randomization ratio
rand_ratio <- 1
# alpha level of 1.25% for each endpoint
alphal <- 0.0125We assume that given the above assumptions, we need to calculate the target number of events for each analysis as well as the total sample size.
Example code
Example using gsDesign
- PFS calculations:
pfs_gsDesign <- gsDesign::gsSurv(
k = length(timing_pfs),
timing = timing_pfs,
R = enroll_dur,
eta = do_rate_pfs,
minfup = minfu_pfs,
T = enroll_dur + minfu_pfs,
lambdaC = log(2) / med_pfs,
hr = hr1_pfs,
beta = 1 - power_pfs,
alpha = alphal,
sfu = sfLDOF,
sfl = sfHSD,
sflpar = -10,
test.type = 4
)
pfs_gsDesign |>
gsDesign::gsBoundSummary() Analysis Value Efficacy Futility
IA 1: 75% Z 2.6584 0.7432
N: 398 p (1-sided) 0.0039 0.2287
Events: 176 ~HR at bound 0.6693 0.8938
Month: 25 P(Cross) if HR=1 0.0039 0.7713
P(Cross) if HR=0.6 0.7668 0.0041
Final Z 2.2801 2.2801
N: 398 p (1-sided) 0.0113 0.0113
Events: 234 ~HR at bound 0.7421 0.7421
Month: 34 P(Cross) if HR=1 0.0125 0.9875
P(Cross) if HR=0.6 0.9500 0.0500
- OS calculations:
os_gsDesign <- gsDesign::gsSurv(
k = length(timing_os),
timing = timing_os,
R = enroll_dur,
eta = do_rate_os,
minfup = minfu_os,
T = enroll_dur + minfu_os,
lambdaC = log(2) / med_os,
hr = hr1_os,
beta = 1 - power_os,
alpha = alphal,
sfu = sfLDOF,
test.type = 1
)
os_gsDesign |>
gsDesign::gsBoundSummary() Analysis Value Efficacy
IA 1: 60% Z 3.0205
N: 394 p (1-sided) 0.0013
Events: 131 ~HR at bound 0.5896
Month: 38 P(Cross) if HR=1 0.0013
P(Cross) if HR=0.65 0.2899
IA 2: 80% Z 2.5874
N: 394 p (1-sided) 0.0048
Events: 175 ~HR at bound 0.6758
Month: 51 P(Cross) if HR=1 0.0052
P(Cross) if HR=0.65 0.6082
Final Z 2.2958
N: 394 p (1-sided) 0.0108
Events: 218 ~HR at bound 0.7327
Month: 66 P(Cross) if HR=1 0.0125
P(Cross) if HR=0.65 0.8200
Example using gsDesign2
- PFS calculations:
enroll_rate <- tibble(
stratum = "All",
duration = enroll_dur,
rate = 1
)
fail_rate_pfs <- tibble(
stratum = "All",
duration = Inf, # Can be set to `Inf` when proportional hazard is assumed
fail_rate = log(2) / med_pfs,
hr = hr1_pfs,
dropout_rate = do_rate_pfs
)
pfs_gsDesign2 <- gsDesign2::gs_design_ahr(
enroll_rate = enroll_rate,
fail_rate = fail_rate_pfs,
ratio = rand_ratio,
beta = 1 - power_pfs,
alpha = alphal,
info_frac = timing_pfs,
analysis_time = enroll_dur + minfu_pfs,
upper = gs_spending_bound,
upar = list(
sf = gsDesign::sfLDOF,
total_spend = alphal
),
lower = gs_spending_bound,
lpar = list(
sf = gsDesign::sfHSD,
total_spend = 1 - power_pfs,
param = -10
),
info_scale = "h0_info"
)
pfs_gsDesign2 |>
summary() |>
as_gt()| Bound summary for AHR design | |||||
| AHR approximations of ~HR at bound | |||||
| Bound | Z | Nominal p1 | ~HR at bound2 |
Cumulative boundary crossing probability
|
|
|---|---|---|---|---|---|
| Alternate hypothesis | Null hypothesis | ||||
| Analysis: 1 Time: 25.3 N: 405.8 Events: 179.2 AHR: 0.6 Information fraction: 0.75 | |||||
| Futility | 0.74 | 0.2287 | 0.8940 | 0.0041 | 0.7713 |
| Efficacy | 2.66 | 0.0039 | 0.6697 | 0.7668 | 0.0039 |
| Analysis: 2 Time: 34 N: 405.8 Events: 238.9 AHR: 0.6 Information fraction: 1 | |||||
| Futility | 2.28 | 0.0113 | 0.7424 | 0.0500 | 0.9875 |
| Efficacy | 2.28 | 0.0113 | 0.7424 | 0.9500 | 0.0125 |
| 1 One-sided p-value for experimental vs control treatment. Value < 0.5 favors experimental, > 0.5 favors control. | |||||
| 2 Approximate hazard ratio to cross bound. | |||||
- OS calculations:
fail_rate_os <- tibble(
stratum = "All",
duration = Inf, # Can be set to `Inf` when proportional hazard is assumed
fail_rate = log(2) / med_os,
hr = hr1_os,
dropout_rate = do_rate_os
)
os_gsDesign2 <- gsDesign2::gs_design_ahr(
enroll_rate = pfs_gsDesign2$enroll_rate,
fail_rate = fail_rate_os,
ratio = rand_ratio,
beta = 1 - power_os,
alpha = alphal,
info_frac = timing_os,
analysis_time = enroll_dur + minfu_os,
test_lower = FALSE,
upper = gs_spending_bound,
upar = list(
sf = gsDesign::sfLDOF,
total_spend = alphal
),
info_scale = "h0_info"
)
os_gsDesign2 |>
summary() |>
as_gt()| Bound summary for AHR design | |||||
| AHR approximations of ~HR at bound | |||||
| Bound | Z | Nominal p1 | ~HR at bound2 |
Cumulative boundary crossing probability
|
|
|---|---|---|---|---|---|
| Alternate hypothesis | Null hypothesis | ||||
| Analysis: 1 Time: 38.4 N: 402.6 Events: 133.7 AHR: 0.65 Information fraction: 0.6 | |||||
| Efficacy | 3.02 | 0.0013 | 0.5901 | 0.2899 | 0.0013 |
| Analysis: 2 Time: 50.6 N: 402.6 Events: 178.2 AHR: 0.65 Information fraction: 0.8 | |||||
| Efficacy | 2.59 | 0.0048 | 0.6762 | 0.6082 | 0.0052 |
| Analysis: 3 Time: 66 N: 402.6 Events: 222.8 AHR: 0.65 Information fraction: 1 | |||||
| Efficacy | 2.30 | 0.0108 | 0.7330 | 0.8200 | 0.0125 |
| 1 One-sided p-value for experimental vs control treatment. Value < 0.5 favors experimental, > 0.5 favors control. | |||||
| 2 Approximate hazard ratio to cross bound. | |||||
Example using rpact
- PFS calculations:
pfs_rpact_gsd <- rpact::getDesignGroupSequential(
sided = 1,
alpha = alphal,
informationRates = timing_pfs,
typeOfDesign = "asOF",
beta = 1 - power_pfs,
typeBetaSpending = "bsHSD",
gammaB = -10,
bindingFutility = FALSE
)
pfs_rpact <- rpact::getSampleSizeSurvival(
design = pfs_rpact_gsd,
accrualTime = enroll_dur,
followUpTime = minfu_pfs,
lambda2 = log(2) / med_pfs,
hazardRatio = hr1_pfs,
dropoutRate1 = 0.2,
dropoutRate2 = 0.2,
dropoutTime = 12
)
kable(summary(pfs_rpact))Warning in kable.ParameterSet(summary(pfs_rpact)): Manual use of kable() for
rpact result objects is no longer needed, as the formatting and display will be
handled automatically by the rpact package
Sample size calculation for a survival endpoint
Sequential analysis with a maximum of 2 looks (group sequential design), one-sided overall significance level 1.25%, power 95%. The results were calculated for a two-sample logrank test, H0: hazard ratio = 1, H1: hazard ratio = 0.6, control lambda(2) = 0.074, accrual time = 24, accrual intensity = 16.5, follow-up time = 10, dropout rate(1) = 0.2, dropout rate(2) = 0.2, dropout time = 12.
| Stage | 1 | 2 |
|---|---|---|
| Planned information rate | 75% | 100% |
| Cumulative alpha spent | 0.0039 | 0.0125 |
| Cumulative beta spent | 0.0041 | 0.0500 |
| Stage levels (one-sided) | 0.0039 | 0.0113 |
| Efficacy boundary (z-value scale) | 2.658 | 2.280 |
| Futility boundary (z-value scale) | 0.743 | |
| Efficacy boundary (t) | 0.670 | 0.742 |
| Futility boundary (t) | 0.894 | |
| Cumulative power | 0.7668 | 0.9500 |
| Number of subjects | 396.9 | 396.9 |
| Expected number of subjects under H1 | 396.9 | |
| Cumulative number of events | 175.8 | 234.4 |
| Expected number of events under H1 | 189.2 | |
| Analysis time | 25.36 | 34.00 |
| Expected study duration under H1 | 27.34 | |
| Overall exit probability (under H0) | 0.7752 | |
| Overall exit probability (under H1) | 0.7709 | |
| Exit probability for efficacy (under H0) | 0.0039 | |
| Exit probability for efficacy (under H1) | 0.7668 | |
| Exit probability for futility (under H0) | 0.7713 | |
| Exit probability for futility (under H1) | 0.0041 |
Legend:
- (t): treatment effect scale
Note: the dropoutRate1, dropoutRate2 arguments in getSampleSizeSurvival() refer to the % of drop-outs by the dropoutTime, while the eta argument in gsDesign::gsSurv() and the dropout_rate value in the fail_rate argument in gsDesign2::gs_design_ahr() refer to the annual drop-out rate parameter under the exponential distribution. In our example, if \(X\) is a drop-out time and \(X \sim \text{Exponential} (\lambda)\), we assume that by month 12 the drop-out rate was 20%, which implies: \(P(X\le12) = 1 - e^{-12\lambda} = 0.2 \Rightarrow \lambda = 0.019\). Due to the above differences, the value \(\lambda = 0.019\) was used in the gsDesign and gsDesign2 example, while 0.2 was used in the rpact example.
- OS calculations:
os_rpact_gsd <- rpact::getDesignGroupSequential(
sided = 1,
alpha = alphal,
informationRates = timing_os,
typeOfDesign = "asOF",
beta = 1 - power_os
)
os_rpact <- rpact::getSampleSizeSurvival(
design = os_rpact_gsd,
accrualTime = enroll_dur,
followUpTime = minfu_os,
lambda2 = log(2) / med_os,
hazardRatio = hr1_os,
dropoutRate1 = 1 - exp(-do_rate_os * 12),
dropoutRate2 = 1 - exp(-do_rate_os * 12),
dropoutTime = 12
)
kable(summary(os_rpact))Sample size calculation for a survival endpoint
Sequential analysis with a maximum of 3 looks (group sequential design), one-sided overall significance level 1.25%, power 82%. The results were calculated for a two-sample logrank test, H0: hazard ratio = 1, H1: hazard ratio = 0.65, control lambda(2) = 0.019, accrual time = 24, accrual intensity = 16.5, follow-up time = 42, dropout rate(1) = 0.012, dropout rate(2) = 0.012, dropout time = 12.
| Stage | 1 | 2 | 3 |
|---|---|---|---|
| Planned information rate | 60% | 80% | 100% |
| Cumulative alpha spent | 0.0013 | 0.0052 | 0.0125 |
| Stage levels (one-sided) | 0.0013 | 0.0048 | 0.0108 |
| Efficacy boundary (z-value scale) | 3.020 | 2.587 | 2.296 |
| Efficacy boundary (t) | 0.590 | 0.676 | 0.733 |
| Cumulative power | 0.2899 | 0.6082 | 0.8200 |
| Number of subjects | 395.1 | 395.1 | 395.1 |
| Expected number of subjects under H1 | 395.1 | ||
| Cumulative number of events | 131.2 | 174.9 | 218.6 |
| Expected number of events under H1 | 179.4 | ||
| Analysis time | 38.44 | 50.60 | 66.00 |
| Expected study duration under H1 | 53.11 | ||
| Exit probability for efficacy (under H0) | 0.0013 | 0.0040 | |
| Exit probability for efficacy (under H1) | 0.2899 | 0.3182 |
Legend:
- (t): treatment effect scale