# PFS HR = 0.6
<- 0.6
hr1_pfs # Median PFS of 9.4 months in the control arm
<- 9.4
med_pfs # Median follow-up of 10 months for PFS
<- 10
minfu_pfs # Monthly dropout of 0.019 for PFS
<- 0.019
do_rate_pfs # IA timing for PFS is at 75% information fraction
<- c(0.75, 1)
timing_pfs # Power of 95% for PFS
<- 0.95
power_pfs
# OS HR = 0.65
<- 0.65
hr1_os # Median OS of 3 years in the control arm
<- 12 * 3
med_os # Median follow-up of 42 months for OS
<- 42
minfu_os # Monthly dropout of 0.001 for OS
<- 0.001
do_rate_os # IA timing for OS is at 60% and 80% information fraction
<- c(0.6, 0.8, 1)
timing_os # Power of 82% for OS
<- 0.82
power_os
# Enrollment period of 24 months
<- 24
enroll_dur # 1:1 randomization ratio
<- 1
rand_ratio # alpha level of 1.25% for each endpoint
<- 0.0125 alphal
Group sequential design in R
Group sequential design: time-to-event endpoint
While a group sequential design (GSD) could be applied for different types of endpoints, here we focus on time-to-event endpoints.
Available R packages
The commonly used R packages for power and sample size calculations utilizing a GSD are: gsDesign (also has a web interface), gsDesign2, and rpact.
Design assumptions
Using a toy example, we will assume that a primary objective of a phase III oncology trial is to compare a new therapy to a control in terms of progression-free survival (PFS) and overall survival (OS). Note that, in this example, we have a family of primary endpoints, i.e., if at least one of the endpoints is successful, the study will be declared a success. A GSD will be utilized for each endpoint. PFS will be tested at one interim analysis (IA) for both efficacy and non-binding futility, while OS will be tested at two IAs for efficacy only. An O’Brien-Fleming spending function will be used for efficacy testing and a Hwang-Shih-Decani spending function with \(\gamma = -10\) will be used for futility.
Further design assumptions are as follows:
We assume that given the above assumptions, we need to calculate the target number of events for each analysis as well as the total sample size.
Example code
Example using gsDesign
- PFS calculations:
library(gsDesign)
<- gsSurv(
pfs_gsDesign k = length(timing_pfs),
timing = timing_pfs,
R = enroll_dur,
eta = do_rate_pfs,
minfup = minfu_pfs,
T = enroll_dur + minfu_pfs,
lambdaC = log(2) / med_pfs,
hr = hr1_pfs,
beta = 1 - power_pfs,
alpha = alphal,
sfu = sfLDOF,
sfl = sfHSD,
sflpar = -10,
test.type = 4
)
|> gsBoundSummary() pfs_gsDesign
Analysis Value Efficacy Futility
IA 1: 75% Z 2.6584 0.7432
N: 398 p (1-sided) 0.0039 0.2287
Events: 176 ~HR at bound 0.6693 0.8938
Month: 25 P(Cross) if HR=1 0.0039 0.7713
P(Cross) if HR=0.6 0.7668 0.0041
Final Z 2.2801 2.2801
N: 398 p (1-sided) 0.0113 0.0113
Events: 234 ~HR at bound 0.7421 0.7421
Month: 34 P(Cross) if HR=1 0.0125 0.9875
P(Cross) if HR=0.6 0.9500 0.0500
- OS calculations:
<- gsSurv(
os_gsDesign k = length(timing_os),
timing = timing_os,
R = enroll_dur,
eta = do_rate_os,
minfup = minfu_os,
T = enroll_dur + minfu_os,
lambdaC = log(2) / med_os,
hr = hr1_os,
beta = 1 - power_os,
alpha = alphal,
sfu = sfLDOF,
test.type = 1
)
|> gsBoundSummary() os_gsDesign
Analysis Value Efficacy
IA 1: 60% Z 3.0205
N: 394 p (1-sided) 0.0013
Events: 131 ~HR at bound 0.5896
Month: 38 P(Cross) if HR=1 0.0013
P(Cross) if HR=0.65 0.2899
IA 2: 80% Z 2.5874
N: 394 p (1-sided) 0.0048
Events: 175 ~HR at bound 0.6758
Month: 51 P(Cross) if HR=1 0.0052
P(Cross) if HR=0.65 0.6082
Final Z 2.2958
N: 394 p (1-sided) 0.0108
Events: 218 ~HR at bound 0.7327
Month: 66 P(Cross) if HR=1 0.0125
P(Cross) if HR=0.65 0.8200
Example using gsDesign2
- PFS calculations:
library(gsDesign2)
library(tibble)
<- tibble(
enroll_rate stratum = "All",
duration = enroll_dur,
rate = 1
)<- tibble(
fail_rate_pfs stratum = "All",
duration = Inf, # Can be set to `Inf` when proportional hazard is assumed
fail_rate = log(2) / med_pfs,
hr = hr1_pfs,
dropout_rate = do_rate_pfs
)
<- gs_design_ahr(
pfs_gsDesign2 enroll_rate = enroll_rate,
fail_rate = fail_rate_pfs,
ratio = rand_ratio,
beta = 1 - power_pfs,
alpha = alphal,
info_frac = timing_pfs,
analysis_time = enroll_dur + minfu_pfs,
upper = gs_spending_bound,
upar = list(
sf = gsDesign::sfLDOF,
total_spend = alphal
),lower = gs_spending_bound,
lpar = list(
sf = gsDesign::sfHSD,
total_spend = 1 - power_pfs,
param = -10
),info_scale = "h0_info"
)
|>
pfs_gsDesign2 summary() |>
as_gt()
Bound summary for AHR design | |||||
---|---|---|---|---|---|
AHR approximations of ~HR at bound | |||||
Bound | Z | Nominal p1 | ~HR at bound2 |
Cumulative boundary crossing probability
|
|
Alternate hypothesis | Null hypothesis | ||||
Analysis: 1 Time: 25.3 N: 405.8 Event: 179.2 AHR: 0.6 Information fraction: 0.75 | |||||
Futility | 0.74 | 0.2287 | 0.8940 | 0.0041 | 0.7713 |
Efficacy | 2.66 | 0.0039 | 0.6697 | 0.7668 | 0.0039 |
Analysis: 2 Time: 34 N: 405.8 Event: 238.9 AHR: 0.6 Information fraction: 1 | |||||
Futility | 2.28 | 0.0113 | 0.7424 | 0.0500 | 0.9875 |
Efficacy | 2.28 | 0.0113 | 0.7424 | 0.9500 | 3 0.0125 |
1 One-sided p-value for experimental vs control treatment. Value < 0.5 favors experimental, > 0.5 favors control. | |||||
2 Approximate hazard ratio to cross bound. | |||||
3 Cumulative alpha for final analysis (0.0125) is less than the full alpha (0.025) when the futility bound is non-binding. The smaller value subtracts the probability of crossing a futility bound before crossing an efficacy bound at a later analysis (0.025 - 0.0125 = 0.0125) under the null hypothesis. |
- OS calculations:
<- tibble(
fail_rate_os stratum = "All",
duration = Inf, # Can be set to `Inf` when proportional hazard is assumed
fail_rate = log(2) / med_os,
hr = hr1_os,
dropout_rate = do_rate_os
)
<- gs_design_ahr(
os_gsDesign2 enroll_rate = pfs_gsDesign2$enroll_rate,
fail_rate = fail_rate_os,
ratio = rand_ratio,
beta = 1 - power_os,
alpha = alphal,
info_frac = timing_os,
analysis_time = enroll_dur + minfu_os,
test_lower = FALSE,
upper = gs_spending_bound,
upar = list(
sf = gsDesign::sfLDOF,
total_spend = alphal
),info_scale = "h0_info"
)
|>
os_gsDesign2 summary() |>
as_gt()
Bound summary for AHR design | |||||
---|---|---|---|---|---|
AHR approximations of ~HR at bound | |||||
Bound | Z | Nominal p1 | ~HR at bound2 |
Cumulative boundary crossing probability
|
|
Alternate hypothesis | Null hypothesis | ||||
Analysis: 1 Time: 38.4 N: 402.6 Event: 133.7 AHR: 0.65 Information fraction: 0.6 | |||||
Efficacy | 3.02 | 0.0013 | 0.5901 | 0.2899 | 0.0013 |
Analysis: 2 Time: 50.6 N: 402.6 Event: 178.2 AHR: 0.65 Information fraction: 0.8 | |||||
Efficacy | 2.59 | 0.0048 | 0.6762 | 0.6082 | 0.0052 |
Analysis: 3 Time: 66 N: 402.6 Event: 222.8 AHR: 0.65 Information fraction: 1 | |||||
Efficacy | 2.30 | 0.0108 | 0.7330 | 0.8200 | 3 0.0125 |
1 One-sided p-value for experimental vs control treatment. Value < 0.5 favors experimental, > 0.5 favors control. | |||||
2 Approximate hazard ratio to cross bound. | |||||
3 Cumulative alpha for final analysis (0.0125) is less than the full alpha (0.025) when the futility bound is non-binding. The smaller value subtracts the probability of crossing a futility bound before crossing an efficacy bound at a later analysis (0.025 - 0.0125 = 0.0125) under the null hypothesis. |
Example using rpact
- PFS calculations:
library(rpact)
<- getDesignGroupSequential(
pfs_rpact_gsd sided = 1,
alpha = alphal,
informationRates = timing_pfs,
typeOfDesign = "asOF",
beta = 1 - power_pfs,
typeBetaSpending = "bsHSD",
gammaB = -10,
bindingFutility = FALSE
)
<- getSampleSizeSurvival(
pfs_rpact design = pfs_rpact_gsd,
accrualTime = enroll_dur,
followUpTime = minfu_pfs,
lambda2 = log(2) / med_pfs,
hazardRatio = hr1_pfs,
dropoutRate1 = 0.2,
dropoutRate2 = 0.2,
dropoutTime = 12
)
kable(summary(pfs_rpact))
Warning in kable.ParameterSet(summary(pfs_rpact)): Manual use of kable() for
rpact result objects is no longer needed, as the formatting and display will be
handled automatically by the rpact package
Sample size calculation for a survival endpoint
Sequential analysis with a maximum of 2 looks (group sequential design), one-sided overall significance level 1.25%, power 95%. The results were calculated for a two-sample logrank test, H0: hazard ratio = 1, H1: hazard ratio = 0.6, control lambda(2) = 0.074, accrual time = 24, accrual intensity = 16.5, follow-up time = 10, dropout rate(1) = 0.2, dropout rate(2) = 0.2, dropout time = 12.
Stage | 1 | 2 |
---|---|---|
Planned information rate | 75% | 100% |
Cumulative alpha spent | 0.0039 | 0.0125 |
Cumulative beta spent | 0.0041 | 0.0500 |
Stage levels (one-sided) | 0.0039 | 0.0113 |
Efficacy boundary (z-value scale) | 2.658 | 2.280 |
Futility boundary (z-value scale) | 0.743 | |
Efficacy boundary (t) | 0.670 | 0.742 |
Futility boundary (t) | 0.894 | |
Cumulative power | 0.7668 | 0.9500 |
Number of subjects | 396.9 | 396.9 |
Expected number of subjects under H1 | 396.9 | |
Cumulative number of events | 175.8 | 234.4 |
Expected number of events under H1 | 189.2 | |
Analysis time | 25.36 | 34.00 |
Expected study duration under H1 | 27.34 | |
Overall exit probability (under H0) | 0.7752 | |
Overall exit probability (under H1) | 0.7709 | |
Exit probability for efficacy (under H0) | 0.0039 | |
Exit probability for efficacy (under H1) | 0.7668 | |
Exit probability for futility (under H0) | 0.7713 | |
Exit probability for futility (under H1) | 0.0041 |
Legend:
- (t): treatment effect scale
Note: the dropoutRate1
, dropoutRate2
arguments in getSampleSizeSurvival()
refer to the % of drop-outs by the dropoutTime
, while the eta
argument in gsDesign::gsSurv()
and the dropout_rate
value in the fail_rate
argument in gsDesign2::gs_design_ahr()
refer to the annual drop-out rate parameter under the exponential distribution. In our example, if \(X\) is a drop-out time and \(X \sim \text{Exponential} (\lambda)\), we assume that by month 12 the drop-out rate was 20%, which implies: \(P(X\le12) = 1 - e^{-12\lambda} = 0.2 \Rightarrow \lambda = 0.019\). Due to the above differences, the value \(\lambda = 0.019\) was used in the gsDesign and gsDesign2 example, while 0.2 was used in the rpact example.
- OS calculations:
<- getDesignGroupSequential(
os_rpact_gsd sided = 1,
alpha = alphal,
informationRates = timing_os,
typeOfDesign = "asOF",
beta = 1 - power_os
)
<- getSampleSizeSurvival(
os_rpact design = os_rpact_gsd,
accrualTime = enroll_dur,
followUpTime = minfu_os,
lambda2 = log(2) / med_os,
hazardRatio = hr1_os,
dropoutRate1 = 1 - exp(-do_rate_os * 12),
dropoutRate2 = 1 - exp(-do_rate_os * 12),
dropoutTime = 12
)
kable(summary(os_rpact))
Sample size calculation for a survival endpoint
Sequential analysis with a maximum of 3 looks (group sequential design), one-sided overall significance level 1.25%, power 82%. The results were calculated for a two-sample logrank test, H0: hazard ratio = 1, H1: hazard ratio = 0.65, control lambda(2) = 0.019, accrual time = 24, accrual intensity = 16.5, follow-up time = 42, dropout rate(1) = 0.012, dropout rate(2) = 0.012, dropout time = 12.
Stage | 1 | 2 | 3 |
---|---|---|---|
Planned information rate | 60% | 80% | 100% |
Cumulative alpha spent | 0.0013 | 0.0052 | 0.0125 |
Stage levels (one-sided) | 0.0013 | 0.0048 | 0.0108 |
Efficacy boundary (z-value scale) | 3.020 | 2.587 | 2.296 |
Efficacy boundary (t) | 0.590 | 0.676 | 0.733 |
Cumulative power | 0.2899 | 0.6082 | 0.8200 |
Number of subjects | 395.1 | 395.1 | 395.1 |
Expected number of subjects under H1 | 395.1 | ||
Cumulative number of events | 131.2 | 174.9 | 218.6 |
Expected number of events under H1 | 179.4 | ||
Analysis time | 38.44 | 50.60 | 66.00 |
Expected study duration under H1 | 53.11 | ||
Exit probability for efficacy (under H0) | 0.0013 | 0.0040 | |
Exit probability for efficacy (under H1) | 0.2899 | 0.3182 |
Legend:
- (t): treatment effect scale