Group sequential design in R
Group sequential design: time-to-event endpoint
While a group sequential design (GSD) can be applied to different types of endpoints, here we focus on time-to-event endpoints.
Available R packages
The commonly used R packages for power and sample size calculations utilizing a GSD are gsDesign (which also has a web interface), gsDesign2, and rpact.
Design assumptions
Using a toy example, we will assume that a primary objective of a phase III oncology trial is to compare a new therapy to a control in terms of progression-free survival (PFS) and overall survival (OS). Note that, in this example, we have a family of primary endpoints, i.e., if at least one of the endpoints is successful, the study will be declared a success. A GSD will be utilized for each endpoint. PFS will be tested at one interim analysis (IA) for both efficacy and non-binding futility, while OS will be tested at two IAs for efficacy only. An O'Brien-Fleming-type (Lan-DeMets) spending function will be used for efficacy testing and a Hwang-Shih-DeCani spending function with \(\gamma = -10\) will be used for futility.
Further design assumptions are as follows:
# PFS HR = 0.6
hr1_pfs <- 0.6
# Median PFS of 9.4 months in the control arm
med_pfs <- 9.4
# Minimum follow-up of 10 months for PFS
minfu_pfs <- 10
# Monthly dropout of 0.019 for PFS
do_rate_pfs <- 0.019
# IA timing for PFS is at 75% information fraction
timing_pfs <- c(0.75, 1)
# Power of 95% for PFS
power_pfs <- 0.95

# OS HR = 0.65
hr1_os <- 0.65
# Median OS of 3 years in the control arm
med_os <- 12 * 3
# Minimum follow-up of 42 months for OS
minfu_os <- 42
# Monthly dropout of 0.001 for OS
do_rate_os <- 0.001
# IA timing for OS is at 60% and 80% information fraction
timing_os <- c(0.6, 0.8, 1)
# Power of 82% for OS
power_os <- 0.82

# Enrollment period of 24 months
enroll_dur <- 24
# 1:1 randomization ratio
rand_ratio <- 1
# alpha level of 1.25% for each endpoint
alphal <- 0.0125
Given the above assumptions, we need to calculate the target number of events for each analysis as well as the total sample size.
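Before turning to the packages, it can help to have a rough fixed-design benchmark for the event targets. The sketch below uses base R only and restates the design inputs so that it runs on its own. It assumes exponential event times (hazard = log(2) / median) and uses the Schoenfeld approximation for a 1:1 logrank test, which is a swapped-in approximation and not the calculation the packages perform. The helper name `schoenfeld_events` is hypothetical.

```r
# Design inputs restated from the assumptions above
alpha <- 0.0125
hr_pfs <- 0.6
med_pfs <- 9.4
power_pfs <- 0.95
hr_os <- 0.65
med_os <- 12 * 3
power_os <- 0.82

# Exponential model: monthly hazard in the control arm is log(2) / median
lambda_c_pfs <- log(2) / med_pfs
lambda_c_os <- log(2) / med_os

# Under proportional hazards, the experimental-arm median is median / HR
med_e_pfs <- med_pfs / hr_pfs
med_e_os <- med_os / hr_os

# Schoenfeld approximation to the number of events for a fixed design
# (no interim analyses), one-sided alpha, 1:1 randomization
schoenfeld_events <- function(hr, alpha, power) {
  4 * (qnorm(1 - alpha) + qnorm(power))^2 / log(hr)^2
}

events_fixed_pfs <- schoenfeld_events(hr_pfs, alpha, power_pfs)
events_fixed_os <- schoenfeld_events(hr_os, alpha, power_os)

print(round(c(PFS = events_fixed_pfs, OS = events_fixed_os), 1))
```

These fixed-design counts (roughly 232 events for PFS and 215 for OS) should sit slightly below the final-analysis event targets of a GSD with the same error rates, because interim looks inflate the required information by a small amount.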
Example code
Example using gsDesign
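- PFS calculations:
library(gsDesign)

pfs_gsDesign <- gsSurv(
  k = length(timing_pfs),       # number of analyses (1 IA + final)
  timing = timing_pfs,          # information fractions
  R = enroll_dur,               # enrollment duration (months)
  eta = do_rate_pfs,            # monthly dropout hazard rate
  minfup = minfu_pfs,           # minimum follow-up (months)
  T = enroll_dur + minfu_pfs,   # total study duration (months)
  lambdaC = log(2) / med_pfs,   # control-arm event hazard rate
  hr = hr1_pfs,
  beta = 1 - power_pfs,
  alpha = alphal,
  sfu = sfLDOF,                 # Lan-DeMets O'Brien-Fleming efficacy spending
  sfl = sfHSD,                  # Hwang-Shih-DeCani futility spending
  sflpar = -10,
  test.type = 4                 # non-binding futility bound with beta spending
)

pfs_gsDesign |>
  gsBoundSummary()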
    Analysis               Value Efficacy Futility
   IA 1: 75%                   Z   2.6584   0.7432
      N: 398         p (1-sided)   0.0039   0.2287
 Events: 176        ~HR at bound   0.6693   0.8938
   Month: 25    P(Cross) if HR=1   0.0039   0.7713
              P(Cross) if HR=0.6   0.7668   0.0041
       Final                   Z   2.2801   2.2801
      N: 398         p (1-sided)   0.0113   0.0113
 Events: 234        ~HR at bound   0.7421   0.7421
   Month: 34    P(Cross) if HR=1   0.0125   0.9875
              P(Cross) if HR=0.6   0.9500   0.0500
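The cumulative error spent at each analysis in the summary above follows directly from the spending function formulas. The following base R sketch writes out the Lan-DeMets O'Brien-Fleming-type and Hwang-Shih-DeCani spending functions by hand, so it does not depend on any package. The helper names `sf_ldof` and `sf_hsd` are hypothetical and only illustrate the formulas; the packages' own implementations should be used for actual designs.

```r
# Lan-DeMets O'Brien-Fleming-type spending: cumulative one-sided alpha
# spent at information fraction t
sf_ldof <- function(t, alpha) {
  2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t)))
}

# Hwang-Shih-DeCani spending with parameter gamma (gamma must be non-zero):
# cumulative error spent at information fraction t out of `total`
sf_hsd <- function(t, total, gamma) {
  total * (1 - exp(-gamma * t)) / (1 - exp(-gamma))
}

# PFS: efficacy alpha spending and futility beta spending at 75% and 100%
alpha_spent_pfs <- sf_ldof(c(0.75, 1), alpha = 0.0125)
beta_spent_pfs <- sf_hsd(c(0.75, 1), total = 1 - 0.95, gamma = -10)

# OS: efficacy alpha spending at 60%, 80% and 100%
alpha_spent_os <- sf_ldof(c(0.6, 0.8, 1), alpha = 0.0125)

print(round(alpha_spent_pfs, 4))
print(round(beta_spent_pfs, 4))
print(round(alpha_spent_os, 4))
```

With \(\gamma = -10\) the Hwang-Shih-DeCani function spends very little beta early, which is why the futility bound at the PFS interim analysis is lenient.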
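- OS calculations:
os_gsDesign <- gsSurv(
  k = length(timing_os),        # number of analyses (2 IAs + final)
  timing = timing_os,           # information fractions
  R = enroll_dur,               # enrollment duration (months)
  eta = do_rate_os,             # monthly dropout hazard rate
  minfup = minfu_os,            # minimum follow-up (months)
  T = enroll_dur + minfu_os,    # total study duration (months)
  lambdaC = log(2) / med_os,    # control-arm event hazard rate
  hr = hr1_os,
  beta = 1 - power_os,
  alpha = alphal,
  sfu = sfLDOF,                 # Lan-DeMets O'Brien-Fleming efficacy spending
  test.type = 1                 # one-sided, efficacy bound only
)

os_gsDesign |>
  gsBoundSummary()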
    Analysis                Value Efficacy
   IA 1: 60%                    Z   3.0205
      N: 394          p (1-sided)   0.0013
 Events: 131         ~HR at bound   0.5896
   Month: 38     P(Cross) if HR=1   0.0013
              P(Cross) if HR=0.65   0.2899
   IA 2: 80%                    Z   2.5874
      N: 394          p (1-sided)   0.0048
 Events: 175         ~HR at bound   0.6758
   Month: 51     P(Cross) if HR=1   0.0052
              P(Cross) if HR=0.65   0.6082
       Final                    Z   2.2958
      N: 394          p (1-sided)   0.0108
 Events: 218         ~HR at bound   0.7327
   Month: 66     P(Cross) if HR=1   0.0125
              P(Cross) if HR=0.65   0.8200
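The "p (1-sided)" and "~HR at bound" rows in the OS summary above can be related to the Z bounds with two short formulas. The sketch below is base R only and uses the rounded Z values and event counts copied from that summary. The approximation \(\text{HR} \approx \exp(-2z/\sqrt{d})\) for 1:1 randomization is an assumption made here for illustration, so small differences from the reported values are expected. The helper name `hr_at_bound` is hypothetical.

```r
# Z bounds and (rounded) event counts copied from the OS summary above
z_os <- c(3.0205, 2.5874, 2.2958)
events_os <- c(131, 175, 218)

# Nominal one-sided p-value corresponding to each Z bound
p_nominal_os <- pnorm(z_os, lower.tail = FALSE)

# Approximate hazard ratio needed to cross a Z bound with d events,
# assuming 1:1 randomization (variance of log HR is about 4 / d)
hr_at_bound <- function(z, events) {
  exp(-2 * z / sqrt(events))
}
hr_bound_os <- hr_at_bound(z_os, events_os)

print(round(p_nominal_os, 4))
print(round(hr_bound_os, 4))
```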
Example using gsDesign2
- PFS calculations:
library(gsDesign2)
library(tibble)

# Relative enrollment rate; gs_design_ahr() scales it to achieve the power
enroll_rate <- tibble(
  stratum = "All",
  duration = enroll_dur,
  rate = 1
)

fail_rate_pfs <- tibble(
  stratum = "All",
  duration = Inf, # Can be set to `Inf` when proportional hazard is assumed
  fail_rate = log(2) / med_pfs,
  hr = hr1_pfs,
  dropout_rate = do_rate_pfs
)

pfs_gsDesign2 <- gs_design_ahr(
  enroll_rate = enroll_rate,
  fail_rate = fail_rate_pfs,
  ratio = rand_ratio,
  beta = 1 - power_pfs,
  alpha = alphal,
  info_frac = timing_pfs,
  analysis_time = enroll_dur + minfu_pfs,
  # Efficacy bound: Lan-DeMets O'Brien-Fleming alpha spending
  upper = gs_spending_bound,
  upar = list(
    sf = gsDesign::sfLDOF,
    total_spend = alphal
  ),
  # Futility bound: Hwang-Shih-DeCani beta spending with gamma = -10
  lower = gs_spending_bound,
  lpar = list(
    sf = gsDesign::sfHSD,
    total_spend = 1 - power_pfs,
    param = -10
  ),
  info_scale = "h0_info"
)

pfs_gsDesign2 |>
  summary() |>
  as_gt()
Bound summary for AHR design
AHR approximations of ~HR at bound

| Bound | Z | Nominal p^{1} | ~HR at bound^{2} | Cumulative boundary crossing probability: Alternate hypothesis | Cumulative boundary crossing probability: Null hypothesis |
|---|---|---|---|---|---|
| Analysis: 1 Time: 25.3 N: 405.8 Event: 179.2 AHR: 0.6 Information fraction: 0.75 | | | | | |
| Futility | 0.74 | 0.2287 | 0.8940 | 0.0041 | 0.7713 |
| Efficacy | 2.66 | 0.0039 | 0.6697 | 0.7668 | 0.0039 |
| Analysis: 2 Time: 34 N: 405.8 Event: 238.9 AHR: 0.6 Information fraction: 1 | | | | | |
| Futility | 2.28 | 0.0113 | 0.7424 | 0.0500 | 0.9875 |
| Efficacy | 2.28 | 0.0113 | 0.7424 | 0.9500 | ^{3} 0.0125 |

^{1} One-sided p-value for experimental vs control treatment. Value < 0.5 favors experimental, > 0.5 favors control.
^{2} Approximate hazard ratio to cross bound.
^{3} Cumulative alpha for final analysis (0.0125) is less than the full alpha (0.025) when the futility bound is non-binding. The smaller value subtracts the probability of crossing a futility bound before crossing an efficacy bound at a later analysis (0.025 - 0.0125 = 0.0125) under the null hypothesis.
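As a rough cross-check on the final PFS event count reported above, the expected number of events has a closed form when enrollment is uniform and both event and dropout times are exponential. The sketch below is base R only and takes the total sample size (405.8) from the summary above. The helper name `p_event` is hypothetical, and the formula is an illustration under these stated assumptions, not the routine the package runs internally.

```r
# Probability that a subject has an observed event by calendar time t_cal,
# assuming uniform enrollment over [0, enroll_dur], exponential event hazard
# `lambda` and exponential dropout hazard `eta` (requires t_cal >= enroll_dur)
p_event <- function(lambda, eta, t_cal, enroll_dur) {
  theta <- lambda + eta
  min_fu <- t_cal - enroll_dur
  # Average of exp(-theta * u) over follow-up times u in [min_fu, t_cal]
  avg_no_exit <- (exp(-theta * min_fu) - exp(-theta * t_cal)) /
    (theta * enroll_dur)
  lambda / theta * (1 - avg_no_exit)
}

# PFS inputs restated from the design assumptions
lambda_c <- log(2) / 9.4      # control-arm monthly hazard
lambda_e <- 0.6 * lambda_c    # experimental-arm monthly hazard (HR = 0.6)
eta <- 0.019                  # monthly dropout hazard
n_total <- 405.8              # total N taken from the summary above

# 1:1 randomization: average the two arms' event probabilities at month 34
p_avg <- mean(c(
  p_event(lambda_c, eta, t_cal = 34, enroll_dur = 24),
  p_event(lambda_e, eta, t_cal = 34, enroll_dur = 24)
))
events_final <- n_total * p_avg

print(round(events_final, 1))
```

The result should land close to the 238.9 events reported for the final analysis.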
- OS calculations:
fail_rate_os <- tibble(
  stratum = "All",
  duration = Inf, # Can be set to `Inf` when proportional hazard is assumed
  fail_rate = log(2) / med_os,
  hr = hr1_os,
  dropout_rate = do_rate_os
)

os_gsDesign2 <- gs_design_ahr(
  enroll_rate = pfs_gsDesign2$enroll_rate,
  fail_rate = fail_rate_os,
  ratio = rand_ratio,
  beta = 1 - power_os,
  alpha = alphal,
  info_frac = timing_os,
  analysis_time = enroll_dur + minfu_os,
  test_lower = FALSE, # efficacy testing only, no futility bound
  # Efficacy bound: Lan-DeMets O'Brien-Fleming alpha spending
  upper = gs_spending_bound,
  upar = list(
    sf = gsDesign::sfLDOF,
    total_spend = alphal
  ),
  info_scale = "h0_info"
)

os_gsDesign2 |>
  summary() |>
  as_gt()
Bound summary for AHR design
AHR approximations of ~HR at bound

| Bound | Z | Nominal p^{1} | ~HR at bound^{2} | Cumulative boundary crossing probability: Alternate hypothesis | Cumulative boundary crossing probability: Null hypothesis |
|---|---|---|---|---|---|
| Analysis: 1 Time: 38.4 N: 402.6 Event: 133.7 AHR: 0.65 Information fraction: 0.6 | | | | | |
| Efficacy | 3.02 | 0.0013 | 0.5901 | 0.2899 | 0.0013 |
| Analysis: 2 Time: 50.6 N: 402.6 Event: 178.2 AHR: 0.65 Information fraction: 0.8 | | | | | |
| Efficacy | 2.59 | 0.0048 | 0.6762 | 0.6082 | 0.0052 |
| Analysis: 3 Time: 66 N: 402.6 Event: 222.8 AHR: 0.65 Information fraction: 1 | | | | | |
| Efficacy | 2.30 | 0.0108 | 0.7330 | 0.8200 | ^{3} 0.0125 |

^{1} One-sided p-value for experimental vs control treatment. Value < 0.5 favors experimental, > 0.5 favors control.
^{2} Approximate hazard ratio to cross bound.
^{3} Cumulative alpha for final analysis (0.0125) is less than the full alpha (0.025) when the futility bound is non-binding. The smaller value subtracts the probability of crossing a futility bound before crossing an efficacy bound at a later analysis (0.025 - 0.0125 = 0.0125) under the null hypothesis.
Example using rpact
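- PFS calculations:
library(rpact)
library(knitr) # provides kable()

# Boundaries: O'Brien-Fleming-type alpha spending for efficacy,
# Hwang-Shih-DeCani beta spending (gamma = -10) for non-binding futility
pfs_rpact_gsd <- getDesignGroupSequential(
  sided = 1,
  alpha = alphal,
  informationRates = timing_pfs,
  typeOfDesign = "asOF",
  beta = 1 - power_pfs,
  typeBetaSpending = "bsHSD",
  gammaB = -10,
  bindingFutility = FALSE
)

# Sample size and event targets; dropout is given as the proportion
# of subjects (20%) who have dropped out by month 12
pfs_rpact <- getSampleSizeSurvival(
  design = pfs_rpact_gsd,
  accrualTime = enroll_dur,
  followUpTime = minfu_pfs,
  lambda2 = log(2) / med_pfs,
  hazardRatio = hr1_pfs,
  dropoutRate1 = 0.2,
  dropoutRate2 = 0.2,
  dropoutTime = 12
)

kable(summary(pfs_rpact))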
Warning in kable.ParameterSet(summary(pfs_rpact)): Manual use of kable() for
rpact result objects is no longer needed, as the formatting and display will be
handled automatically by the rpact package
Sample size calculation for a survival endpoint
Sequential analysis with a maximum of 2 looks (group sequential design), one-sided overall significance level 1.25%, power 95%. The results were calculated for a two-sample logrank test, H0: hazard ratio = 1, H1: hazard ratio = 0.6, control lambda(2) = 0.074, accrual time = 24, accrual intensity = 16.5, follow-up time = 10, dropout rate(1) = 0.2, dropout rate(2) = 0.2, dropout time = 12.

| Stage | 1 | 2 |
|---|---|---|
| Planned information rate | 75% | 100% |
| Cumulative alpha spent | 0.0039 | 0.0125 |
| Cumulative beta spent | 0.0041 | 0.0500 |
| Stage levels (one-sided) | 0.0039 | 0.0113 |
| Efficacy boundary (z-value scale) | 2.658 | 2.280 |
| Futility boundary (z-value scale) | 0.743 | |
| Efficacy boundary (t) | 0.670 | 0.742 |
| Futility boundary (t) | 0.894 | |
| Cumulative power | 0.7668 | 0.9500 |
| Number of subjects | 396.9 | 396.9 |
| Expected number of subjects under H1 | 396.9 | |
| Cumulative number of events | 175.8 | 234.4 |
| Expected number of events under H1 | 189.2 | |
| Analysis time | 25.36 | 34.00 |
| Expected study duration under H1 | 27.34 | |
| Overall exit probability (under H0) | 0.7752 | |
| Overall exit probability (under H1) | 0.7709 | |
| Exit probability for efficacy (under H0) | 0.0039 | |
| Exit probability for efficacy (under H1) | 0.7668 | |
| Exit probability for futility (under H0) | 0.7713 | |
| Exit probability for futility (under H1) | 0.0041 | |

Legend:
- (t): treatment effect scale
Note: the dropoutRate1 and dropoutRate2 arguments in getSampleSizeSurvival() refer to the proportion of subjects who have dropped out by dropoutTime, while the eta argument in gsDesign::gsSurv() and the dropout_rate value in the fail_rate argument of gsDesign2::gs_design_ahr() refer to the dropout hazard rate per unit of time (per month in this example) under the exponential distribution. In our example, if \(X\) is the dropout time and \(X \sim \text{Exponential}(\lambda)\), we assume that 20% of subjects have dropped out by month 12, which implies \(P(X \le 12) = 1 - e^{-12\lambda} = 0.2 \Rightarrow \lambda = -\ln(0.8)/12 \approx 0.019\). For this reason, the value \(\lambda = 0.019\) was used in the gsDesign and gsDesign2 PFS examples, while 0.2 was used in the rpact PFS example.
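The conversion between the two dropout parameterizations can be written as a pair of small helpers. This is a base R sketch under the exponential dropout assumption stated in the note; the function names `dropout_prob_to_rate` and `dropout_rate_to_prob` are hypothetical and are not part of gsDesign, gsDesign2 or rpact.

```r
# Cumulative dropout probability by time t -> exponential hazard rate
dropout_prob_to_rate <- function(p, t) {
  -log(1 - p) / t
}

# Exponential hazard rate -> cumulative dropout probability by time t
dropout_rate_to_prob <- function(rate, t) {
  1 - exp(-rate * t)
}

# PFS: 20% dropout by month 12 corresponds to a monthly hazard near 0.019
eta_pfs <- dropout_prob_to_rate(0.2, 12)

# OS: a monthly hazard of 0.001 corresponds to about 1.2% dropout by month 12
p_drop_os <- dropout_rate_to_prob(0.001, 12)

print(round(c(eta_pfs = eta_pfs, p_drop_os = p_drop_os), 4))
```

The second conversion is the same expression that the rpact OS example passes to dropoutRate1 and dropoutRate2.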
- OS calculations:
# Boundaries: O'Brien-Fleming-type alpha spending, efficacy only
os_rpact_gsd <- getDesignGroupSequential(
  sided = 1,
  alpha = alphal,
  informationRates = timing_os,
  typeOfDesign = "asOF",
  beta = 1 - power_os
)

# Convert the monthly dropout hazard into the proportion of
# subjects who have dropped out by month 12, as rpact expects
os_rpact <- getSampleSizeSurvival(
  design = os_rpact_gsd,
  accrualTime = enroll_dur,
  followUpTime = minfu_os,
  lambda2 = log(2) / med_os,
  hazardRatio = hr1_os,
  dropoutRate1 = 1 - exp(-do_rate_os * 12),
  dropoutRate2 = 1 - exp(-do_rate_os * 12),
  dropoutTime = 12
)

kable(summary(os_rpact))
Sample size calculation for a survival endpoint
Sequential analysis with a maximum of 3 looks (group sequential design), one-sided overall significance level 1.25%, power 82%. The results were calculated for a two-sample logrank test, H0: hazard ratio = 1, H1: hazard ratio = 0.65, control lambda(2) = 0.019, accrual time = 24, accrual intensity = 16.5, follow-up time = 42, dropout rate(1) = 0.012, dropout rate(2) = 0.012, dropout time = 12.

| Stage | 1 | 2 | 3 |
|---|---|---|---|
| Planned information rate | 60% | 80% | 100% |
| Cumulative alpha spent | 0.0013 | 0.0052 | 0.0125 |
| Stage levels (one-sided) | 0.0013 | 0.0048 | 0.0108 |
| Efficacy boundary (z-value scale) | 3.020 | 2.587 | 2.296 |
| Efficacy boundary (t) | 0.590 | 0.676 | 0.733 |
| Cumulative power | 0.2899 | 0.6082 | 0.8200 |
| Number of subjects | 395.1 | 395.1 | 395.1 |
| Expected number of subjects under H1 | 395.1 | | |
| Cumulative number of events | 131.2 | 174.9 | 218.6 |
| Expected number of events under H1 | 179.4 | | |
| Analysis time | 38.44 | 50.60 | 66.00 |
| Expected study duration under H1 | 53.11 | | |
| Exit probability for efficacy (under H0) | 0.0013 | 0.0040 | |
| Exit probability for efficacy (under H1) | 0.2899 | 0.3182 | |

Legend:
- (t): treatment effect scale