library(gsDesign)
library(gsDesign2)
library(rpact)
library(tibble)
Group sequential design in R
Group sequential design: time-to-event endpoint
While a group sequential design (GSD) could be applied for different types of endpoints, here we focus on time-to-event endpoints.
Available R packages
The commonly used R packages for power and sample size calculations utilizing a GSD are: gsDesign (also has a web interface), gsDesign2, and rpact.
Design assumptions
Using a toy example, we will assume that a primary objective of a phase III oncology trial is to compare a new therapy to a control in terms of progression-free survival (PFS) and overall survival (OS). Note that, in this example, we have a family of primary endpoints, i.e., if at least one of the endpoints is successful, the study will be declared a success. A GSD will be utilized for each endpoint. PFS will be tested at one interim analysis (IA) for both efficacy and non-binding futility, while OS will be tested at two IAs for efficacy only. An O’Brien-Fleming spending function will be used for efficacy testing and a Hwang-Shih-Decani spending function with \(\gamma = -10\) will be used for futility.
Further design assumptions are as follows:
# PFS HR = 0.6
<- 0.6
hr1_pfs # Median PFS of 9.4 months in the control arm
<- 9.4
med_pfs # Median follow-up of 10 months for PFS
<- 10
minfu_pfs # Monthly dropout of 0.019 for PFS
<- 0.019
do_rate_pfs # IA timing for PFS is at 75% information fraction
<- c(0.75, 1)
timing_pfs # Power of 95% for PFS
<- 0.95
power_pfs
# OS HR = 0.65
<- 0.65
hr1_os # Median OS of 3 years in the control arm
<- 12 * 3
med_os # Median follow-up of 42 months for OS
<- 42
minfu_os # Monthly dropout of 0.001 for OS
<- 0.001
do_rate_os # IA timing for OS is at 60% and 80% information fraction
<- c(0.6, 0.8, 1)
timing_os # Power of 82% for OS
<- 0.82
power_os
# Enrollment period of 24 months
<- 24
enroll_dur # 1:1 randomization ratio
<- 1
rand_ratio # alpha level of 1.25% for each endpoint
<- 0.0125 alphal
We assume that given the above assumptions, we need to calculate the target number of events for each analysis as well as the total sample size.
Example code
Example using gsDesign
- PFS calculations:
<- gsDesign::gsSurv(
pfs_gsDesign k = length(timing_pfs),
timing = timing_pfs,
R = enroll_dur,
eta = do_rate_pfs,
minfup = minfu_pfs,
T = enroll_dur + minfu_pfs,
lambdaC = log(2) / med_pfs,
hr = hr1_pfs,
beta = 1 - power_pfs,
alpha = alphal,
sfu = sfLDOF,
sfl = sfHSD,
sflpar = -10,
test.type = 4
)
|>
pfs_gsDesign ::gsBoundSummary() gsDesign
Analysis Value Efficacy Futility
IA 1: 75% Z 2.6584 0.7432
N: 398 p (1-sided) 0.0039 0.2287
Events: 176 ~HR at bound 0.6693 0.8938
Month: 25 P(Cross) if HR=1 0.0039 0.7713
P(Cross) if HR=0.6 0.7668 0.0041
Final Z 2.2801 2.2801
N: 398 p (1-sided) 0.0113 0.0113
Events: 234 ~HR at bound 0.7421 0.7421
Month: 34 P(Cross) if HR=1 0.0125 0.9875
P(Cross) if HR=0.6 0.9500 0.0500
- OS calculations:
<- gsDesign::gsSurv(
os_gsDesign k = length(timing_os),
timing = timing_os,
R = enroll_dur,
eta = do_rate_os,
minfup = minfu_os,
T = enroll_dur + minfu_os,
lambdaC = log(2) / med_os,
hr = hr1_os,
beta = 1 - power_os,
alpha = alphal,
sfu = sfLDOF,
test.type = 1
)
|>
os_gsDesign ::gsBoundSummary() gsDesign
Analysis Value Efficacy
IA 1: 60% Z 3.0205
N: 394 p (1-sided) 0.0013
Events: 131 ~HR at bound 0.5896
Month: 38 P(Cross) if HR=1 0.0013
P(Cross) if HR=0.65 0.2899
IA 2: 80% Z 2.5874
N: 394 p (1-sided) 0.0048
Events: 175 ~HR at bound 0.6758
Month: 51 P(Cross) if HR=1 0.0052
P(Cross) if HR=0.65 0.6082
Final Z 2.2958
N: 394 p (1-sided) 0.0108
Events: 218 ~HR at bound 0.7327
Month: 66 P(Cross) if HR=1 0.0125
P(Cross) if HR=0.65 0.8200
Example using gsDesign2
- PFS calculations:
<- tibble(
enroll_rate stratum = "All",
duration = enroll_dur,
rate = 1
)<- tibble(
fail_rate_pfs stratum = "All",
duration = Inf, # Can be set to `Inf` when proportional hazard is assumed
fail_rate = log(2) / med_pfs,
hr = hr1_pfs,
dropout_rate = do_rate_pfs
)
<- gsDesign2::gs_design_ahr(
pfs_gsDesign2 enroll_rate = enroll_rate,
fail_rate = fail_rate_pfs,
ratio = rand_ratio,
beta = 1 - power_pfs,
alpha = alphal,
info_frac = timing_pfs,
analysis_time = enroll_dur + minfu_pfs,
upper = gs_spending_bound,
upar = list(
sf = gsDesign::sfLDOF,
total_spend = alphal
),lower = gs_spending_bound,
lpar = list(
sf = gsDesign::sfHSD,
total_spend = 1 - power_pfs,
param = -10
),info_scale = "h0_info"
)
|>
pfs_gsDesign2 summary() |>
as_gt()
Bound summary for AHR design | |||||
AHR approximations of ~HR at bound | |||||
Bound | Z | Nominal p1 | ~HR at bound2 |
Cumulative boundary crossing probability
|
|
---|---|---|---|---|---|
Alternate hypothesis | Null hypothesis | ||||
Analysis: 1 Time: 25.3 N: 405.8 Events: 179.2 AHR: 0.6 Information fraction: 0.75 | |||||
Futility | 0.74 | 0.2287 | 0.8940 | 0.0041 | 0.7713 |
Efficacy | 2.66 | 0.0039 | 0.6697 | 0.7668 | 0.0039 |
Analysis: 2 Time: 34 N: 405.8 Events: 238.9 AHR: 0.6 Information fraction: 1 | |||||
Futility | 2.28 | 0.0113 | 0.7424 | 0.0500 | 0.9875 |
Efficacy | 2.28 | 0.0113 | 0.7424 | 0.9500 | 0.0125 |
1 One-sided p-value for experimental vs control treatment. Value < 0.5 favors experimental, > 0.5 favors control. | |||||
2 Approximate hazard ratio to cross bound. |
- OS calculations:
<- tibble(
fail_rate_os stratum = "All",
duration = Inf, # Can be set to `Inf` when proportional hazard is assumed
fail_rate = log(2) / med_os,
hr = hr1_os,
dropout_rate = do_rate_os
)
<- gsDesign2::gs_design_ahr(
os_gsDesign2 enroll_rate = pfs_gsDesign2$enroll_rate,
fail_rate = fail_rate_os,
ratio = rand_ratio,
beta = 1 - power_os,
alpha = alphal,
info_frac = timing_os,
analysis_time = enroll_dur + minfu_os,
test_lower = FALSE,
upper = gs_spending_bound,
upar = list(
sf = gsDesign::sfLDOF,
total_spend = alphal
),info_scale = "h0_info"
)
|>
os_gsDesign2 summary() |>
as_gt()
Bound summary for AHR design | |||||
AHR approximations of ~HR at bound | |||||
Bound | Z | Nominal p1 | ~HR at bound2 |
Cumulative boundary crossing probability
|
|
---|---|---|---|---|---|
Alternate hypothesis | Null hypothesis | ||||
Analysis: 1 Time: 38.4 N: 402.6 Events: 133.7 AHR: 0.65 Information fraction: 0.6 | |||||
Efficacy | 3.02 | 0.0013 | 0.5901 | 0.2899 | 0.0013 |
Analysis: 2 Time: 50.6 N: 402.6 Events: 178.2 AHR: 0.65 Information fraction: 0.8 | |||||
Efficacy | 2.59 | 0.0048 | 0.6762 | 0.6082 | 0.0052 |
Analysis: 3 Time: 66 N: 402.6 Events: 222.8 AHR: 0.65 Information fraction: 1 | |||||
Efficacy | 2.30 | 0.0108 | 0.7330 | 0.8200 | 0.0125 |
1 One-sided p-value for experimental vs control treatment. Value < 0.5 favors experimental, > 0.5 favors control. | |||||
2 Approximate hazard ratio to cross bound. |
Example using rpact
- PFS calculations:
<- rpact::getDesignGroupSequential(
pfs_rpact_gsd sided = 1,
alpha = alphal,
informationRates = timing_pfs,
typeOfDesign = "asOF",
beta = 1 - power_pfs,
typeBetaSpending = "bsHSD",
gammaB = -10,
bindingFutility = FALSE
)
<- rpact::getSampleSizeSurvival(
pfs_rpact design = pfs_rpact_gsd,
accrualTime = enroll_dur,
followUpTime = minfu_pfs,
lambda2 = log(2) / med_pfs,
hazardRatio = hr1_pfs,
dropoutRate1 = 0.2,
dropoutRate2 = 0.2,
dropoutTime = 12
)
kable(summary(pfs_rpact))
Warning in kable.ParameterSet(summary(pfs_rpact)): Manual use of kable() for
rpact result objects is no longer needed, as the formatting and display will be
handled automatically by the rpact package
Sample size calculation for a survival endpoint
Sequential analysis with a maximum of 2 looks (group sequential design), one-sided overall significance level 1.25%, power 95%. The results were calculated for a two-sample logrank test, H0: hazard ratio = 1, H1: hazard ratio = 0.6, control lambda(2) = 0.074, accrual time = 24, accrual intensity = 16.5, follow-up time = 10, dropout rate(1) = 0.2, dropout rate(2) = 0.2, dropout time = 12.
Stage | 1 | 2 |
---|---|---|
Planned information rate | 75% | 100% |
Cumulative alpha spent | 0.0039 | 0.0125 |
Cumulative beta spent | 0.0041 | 0.0500 |
Stage levels (one-sided) | 0.0039 | 0.0113 |
Efficacy boundary (z-value scale) | 2.658 | 2.280 |
Futility boundary (z-value scale) | 0.743 | |
Efficacy boundary (t) | 0.670 | 0.742 |
Futility boundary (t) | 0.894 | |
Cumulative power | 0.7668 | 0.9500 |
Number of subjects | 396.9 | 396.9 |
Expected number of subjects under H1 | 396.9 | |
Cumulative number of events | 175.8 | 234.4 |
Expected number of events under H1 | 189.2 | |
Analysis time | 25.36 | 34.00 |
Expected study duration under H1 | 27.34 | |
Overall exit probability (under H0) | 0.7752 | |
Overall exit probability (under H1) | 0.7709 | |
Exit probability for efficacy (under H0) | 0.0039 | |
Exit probability for efficacy (under H1) | 0.7668 | |
Exit probability for futility (under H0) | 0.7713 | |
Exit probability for futility (under H1) | 0.0041 |
Legend:
- (t): treatment effect scale
Note: the dropoutRate1
, dropoutRate2
arguments in getSampleSizeSurvival()
refer to the % of drop-outs by the dropoutTime
, while the eta
argument in gsDesign::gsSurv()
and the dropout_rate
value in the fail_rate
argument in gsDesign2::gs_design_ahr()
refer to the annual drop-out rate parameter under the exponential distribution. In our example, if \(X\) is a drop-out time and \(X \sim \text{Exponential} (\lambda)\), we assume that by month 12 the drop-out rate was 20%, which implies: \(P(X\le12) = 1 - e^{-12\lambda} = 0.2 \Rightarrow \lambda = 0.019\). Due to the above differences, the value \(\lambda = 0.019\) was used in the gsDesign and gsDesign2 example, while 0.2 was used in the rpact example.
- OS calculations:
<- rpact::getDesignGroupSequential(
os_rpact_gsd sided = 1,
alpha = alphal,
informationRates = timing_os,
typeOfDesign = "asOF",
beta = 1 - power_os
)
<- rpact::getSampleSizeSurvival(
os_rpact design = os_rpact_gsd,
accrualTime = enroll_dur,
followUpTime = minfu_os,
lambda2 = log(2) / med_os,
hazardRatio = hr1_os,
dropoutRate1 = 1 - exp(-do_rate_os * 12),
dropoutRate2 = 1 - exp(-do_rate_os * 12),
dropoutTime = 12
)
kable(summary(os_rpact))
Sample size calculation for a survival endpoint
Sequential analysis with a maximum of 3 looks (group sequential design), one-sided overall significance level 1.25%, power 82%. The results were calculated for a two-sample logrank test, H0: hazard ratio = 1, H1: hazard ratio = 0.65, control lambda(2) = 0.019, accrual time = 24, accrual intensity = 16.5, follow-up time = 42, dropout rate(1) = 0.012, dropout rate(2) = 0.012, dropout time = 12.
Stage | 1 | 2 | 3 |
---|---|---|---|
Planned information rate | 60% | 80% | 100% |
Cumulative alpha spent | 0.0013 | 0.0052 | 0.0125 |
Stage levels (one-sided) | 0.0013 | 0.0048 | 0.0108 |
Efficacy boundary (z-value scale) | 3.020 | 2.587 | 2.296 |
Efficacy boundary (t) | 0.590 | 0.676 | 0.733 |
Cumulative power | 0.2899 | 0.6082 | 0.8200 |
Number of subjects | 395.1 | 395.1 | 395.1 |
Expected number of subjects under H1 | 395.1 | ||
Cumulative number of events | 131.2 | 174.9 | 218.6 |
Expected number of events under H1 | 179.4 | ||
Analysis time | 38.44 | 50.60 | 66.00 |
Expected study duration under H1 | 53.11 | ||
Exit probability for efficacy (under H0) | 0.0013 | 0.0040 | |
Exit probability for efficacy (under H1) | 0.2899 | 0.3182 |
Legend:
- (t): treatment effect scale