Sample Size Calculation for Average Bioequivalence

Regulatory Requirements

The clearest requirements are stated in the FDA Guidance for Industry "Statistical Approaches to Establishing Bioequivalence":

Sample sizes for average BE should be obtained using published formulas. Sample sizes for population and individual BE should be based on simulated data. The simulations should be conducted using a default situation allowing the two formulations to vary as much as 5% in average BA with equal variances and certain magnitude of subject-by-formulation interaction. The study should have 80 or 90% power to conclude BE between these two formulations. Sample size also depends on the magnitude of variability and the design of the study. Variance estimates to determine the number of subjects for a specific drug can be obtained from the biomedical literature and/or pilot studies.

An appropriate method is described in Diletti E, Hauschke D, Steinijans VW. Sample Size Determination for Bioequivalence Assessment by Means of Confidence Intervals. Int J Clin Pharmacol Ther Toxicol. 1991;29(1):1–8. It is implemented in the R package PowerTOST, with one clarification: the sample size is found by an iterative search (power is re-evaluated for successive values of n until the target is reached) rather than by a simple closed-form formula.

#renv::install("PowerTOST")
library(PowerTOST)
library(knitr)
library(data.table)
library(purrr)


Attaching package: 'purrr'

The following object is masked from 'package:data.table':

    transpose

Sample size for the standard crossover design (2x2x2) and the 4-period full replicate design (2x2x4)

The sampleN.TOST() function can calculate the sample size for the following designs:

kable(known.designs())

no	design	df	df2	steps	bk	bknif	bkni	name
0	parallel	n-2	n-2	2	4.0	1/1	1.0000000	2 parallel groups
1	2x2	n-2	n-2	2	2.0	1/2	0.5000000	2x2 crossover
1	2x2x2	n-2	n-2	2	2.0	1/2	0.5000000	2x2x2 crossover
2	3x3	2*n-4	n-3	3	2.0	2/9	0.2222222	3x3 crossover
3	3x6x3	2*n-4	n-6	6	2.0	1/18	0.0555556	3x6x3 crossover
4	4x4	3*n-6	n-4	4	2.0	1/8	0.1250000	4x4 crossover
5	2x2x3	2*n-3	n-2	2	1.5	3/8	0.3750000	2x2x3 replicate crossover
6	2x2x4	3*n-4	n-2	2	1.0	1/4	0.2500000	2x2x4 replicate crossover
7	2x4x4	3*n-4	n-4	4	1.0	1/16	0.0625000	2x4x4 replicate crossover
9	2x3x3	2*n-3	n-3	3	1.5	1/6	0.1666667	partial replicate (2x3x3)
10	2x4x2	n-2	n-2	4	8.0	1/2	0.5000000	Balaam’s (2x4x2)
11	2x2x2r	3*n-2	n-2	2	1.0	1/4	0.2500000	Liu’s 2x2x2 repeated x-over
100	paired	n-1	n-1	1	2.0	2/1	2.0000000	paired means

Basic usage: specify targetpower (the minimum power to achieve, e.g. 0.8 or 0.9), theta0 (the assumed T/R ratio when logscale = TRUE, which is the convenient default) and CV (the coefficient of variation, given as a ratio rather than a percentage when logscale = TRUE).

# 2x2x2
sampleN.TOST(
  targetpower = 0.8, 
  theta0 = 0.95, 
  CV = 0.3, 
  design = "2x2x2"
)


+++++++++++ Equivalence test - TOST +++++++++++
            Sample size estimation
-----------------------------------------------
Study design: 2x2 crossover 
log-transformed data (multiplicative model)

alpha = 0.05, target power = 0.8
BE margins = 0.8 ... 1.25 
True ratio = 0.95,  CV = 0.3

Sample size (total)
 n     power
40   0.815845

# 2x2x4
sampleN.TOST(
  targetpower = 0.9, 
  theta0 = 0.98, 
  CV = 0.24, 
  design = "2x2x4"
)


+++++++++++ Equivalence test - TOST +++++++++++
            Sample size estimation
-----------------------------------------------
Study design: 2x2x4 (4 period full replicate) 
log-transformed data (multiplicative model)

alpha = 0.05, target power = 0.9
BE margins = 0.8 ... 1.25 
True ratio = 0.98,  CV = 0.24

Sample size (total)
 n     power
14   0.917492

Note that the total (not per-sequence) sample size is reported.
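To make the iterative nature of the search concrete, here is a minimal sketch that increases the total sample size in steps of two (keeping both sequences balanced) until power.TOST() reaches the target. The starting value and the step size are assumptions chosen for illustration only; this is a brute-force stand-in, not the search that sampleN.TOST() performs internally.

```r
library(PowerTOST)

# Illustrative brute-force search (not the package's internal algorithm):
# raise the total n in steps of 2 until the target power is reached.
target <- 0.8
n <- 4  # assumed starting value, chosen only for illustration
while (power.TOST(CV = 0.3, theta0 = 0.95, n = n, design = "2x2x2") < target) {
  n <- n + 2
}
n
```

For the 2x2x2 example above, this loop should stop at the same total sample size that sampleN.TOST() reports.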

alpha (the one-sided significance level, default 0.05) almost never needs to be changed. theta1 (lower bioequivalence limit) and theta2 (upper bioequivalence limit) can be set for non-standard bioequivalence limits, e.g. for narrow therapeutic index drugs.
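As a sketch of how non-standard limits might be passed, the call below uses narrowed limits of 0.9 and 1/0.9. The limits, theta0, and CV are illustrative assumptions, not recommendations for any particular drug or jurisdiction.

```r
library(PowerTOST)

# Narrowed limits (90.00% to 111.11%), assumed here purely as an example
# of non-standard limits; theta0 and CV are likewise illustrative values.
res <- sampleN.TOST(
  targetpower = 0.8,
  theta0 = 0.975,
  CV = 0.1,
  theta1 = 0.9,
  theta2 = 1 / 0.9,
  design = "2x2x2",
  print = FALSE   # suppress the console report, keep the returned data frame
)
res
```

With print = FALSE the function returns a one-row data frame, so the limits actually used can be checked in its theta1 and theta2 columns.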

Reproduction of Table 1 from FDA Guidance for Industry. Statistical Approaches to Establishing Bioequivalence

Reproducing Table 1 of the guidance is tricky because the table involves one more parameter: the subject-by-formulation interaction variance component, \(\sigma_D^2\).

\[\sigma_D^2=(\sigma_{BT}-\sigma_{BR})^2+2\times(1-\rho)\times\sigma_{BT}\times\sigma_{BR}\]

where \(\sigma_{BT}^2\) and \(\sigma_{BR}^2\) are the between-subject variances for the T and R formulations, respectively, and \(\rho\) is the correlation between the subject-specific means \(\mu_{Tj}\) and \(\mu_{Rj}\). These parameters are rarely reported in publications and cannot be estimated from CI boundaries and sample size. In the absence of such information one can assume \(\sigma_{BT}=\sigma_{BR}\) and \(\rho=1\). Under these reasonable assumptions both terms vanish and \(\sigma_D^2=\sigma_D=0\), so the sampleN.TOST() calculation should be appropriate.
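The effect of these assumptions can be checked numerically. The sketch below simply codes the formula for \(\sigma_D^2\); the function name sigma_D2 and the input values are hypothetical and used only for illustration.

```r
# sigma_D^2 from the formula above; the function name is hypothetical.
sigma_D2 <- function(sigma_BT, sigma_BR, rho) {
  (sigma_BT - sigma_BR)^2 + 2 * (1 - rho) * sigma_BT * sigma_BR
}

# Equal between-subject SDs and perfect correlation: both terms vanish.
sigma_D2(sigma_BT = 0.3, sigma_BR = 0.3, rho = 1)

# Same SDs but imperfect correlation: the interaction variance is positive.
sigma_D2(sigma_BT = 0.3, sigma_BR = 0.3, rho = 0.95)
```

The first call returns zero, while any \(\rho<1\) or unequal between-subject standard deviations give a positive value.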

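# Parameter grid: every combination of CV, target power, and design
targetpower <- c(0.8, 0.9)
theta0 <- 1 - 0.05  # T/R ratio deviating from unity by 5%
CV <- c(0.15, 0.23, 0.3, 0.5)
design <- c("2x2x2", "2x2x4")

# CJ() builds the cross join; column names match the sampleN.TOST() arguments
dt <- CJ(CV, targetpower, design, theta0)

# Call sampleN.TOST() once per row and bind the one-row results together
sample_size <- purrr::pmap(dt, sampleN.TOST, print = FALSE)
kable(rbindlist(sample_size))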

Design	alpha	CV	theta0	theta1	theta2	Sample size	Achieved power	Target power
2x2x2	0.05	0.15	0.95	0.8	1.25	12	0.8305164	0.8
2x2x4	0.05	0.15	0.95	0.8	1.25	6	0.8458307	0.8
2x2x2	0.05	0.15	0.95	0.8	1.25	16	0.9260211	0.9
2x2x4	0.05	0.15	0.95	0.8	1.25	8	0.9328881	0.9
2x2x2	0.05	0.23	0.95	0.8	1.25	24	0.8066535	0.8
2x2x4	0.05	0.23	0.95	0.8	1.25	12	0.8143816	0.8
2x2x2	0.05	0.23	0.95	0.8	1.25	32	0.9044320	0.9
2x2x4	0.05	0.23	0.95	0.8	1.25	16	0.9082552	0.9
2x2x2	0.05	0.30	0.95	0.8	1.25	40	0.8158453	0.8
2x2x4	0.05	0.30	0.95	0.8	1.25	20	0.8202398	0.8
2x2x2	0.05	0.30	0.95	0.8	1.25	52	0.9019652	0.9
2x2x4	0.05	0.30	0.95	0.8	1.25	26	0.9043064	0.9
2x2x2	0.05	0.50	0.95	0.8	1.25	98	0.8032172	0.8
2x2x4	0.05	0.50	0.95	0.8	1.25	50	0.8128063	0.8
2x2x2	0.05	0.50	0.95	0.8	1.25	132	0.9012316	0.9
2x2x4	0.05	0.50	0.95	0.8	1.25	66	0.9021398	0.9

As we can see, the calculated values equal the reference values for the smallest \(\sigma_D=0.01\) when CV=0.15 or CV=0.23. For CV=0.30 and 80% power the sample sizes are also equal, but for the remaining parameter combinations sampleN.TOST() gives smaller sample sizes than the reference table.
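One possible contributor to the remaining differences, offered here as an assumption that has not been verified against the guidance, is the scale of the variability parameter: if the reference table is indexed by within-subject standard deviations on the log scale rather than by CVs, the corresponding CVs are somewhat larger, and the gap widens as variability grows. The helper sd_to_cv below is hypothetical and uses the standard log-normal relationship \(CV=\sqrt{e^{\sigma^2}-1}\).

```r
# Hypothetical helper: convert a log-scale standard deviation to a CV.
sd_to_cv <- function(sigma) sqrt(exp(sigma^2) - 1)

# Same numeric values as the CV grid above, here read as log-scale SDs.
sigma_w <- c(0.15, 0.23, 0.30, 0.50)
data.frame(sigma_w = sigma_w, CV = round(sd_to_cv(sigma_w), 4))
```

Under that reading the difference is negligible at 0.15 but noticeable at 0.50, which would be in line with the pattern of agreement described above.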

Conclusion: we can trust sampleN.TOST(). For CV less than or equal to 0.30 with 80% power, and for CV less than or equal to 0.23 with 90% power, it can be considered validated against the reference table in the FDA guidance.

Estimate CV from CI boundaries and sample size

CV can be calculated from CI boundaries and sample size if only these values are available:

CVfromCI(lower = 0.95, upper = 1.11, n = 38)

[1] 0.2029806
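For intuition, the same back-calculation can be sketched in base R. The sketch assumes a balanced 2x2 crossover, a 90% confidence interval (alpha = 0.05), and an analysis on the log scale; the variable names are illustrative, and this is not the package's source code.

```r
# Assumptions: balanced 2x2 crossover, 90% CI (alpha = 0.05), log-scale analysis.
lower <- 0.95
upper <- 1.11
n     <- 38
alpha <- 0.05

# Half-width of the CI on the log scale
half_width <- (log(upper) - log(lower)) / 2
# Standard error of the T - R difference: half-width divided by the t quantile
se_diff <- half_width / qt(1 - alpha, df = n - 2)
# Residual (within-subject) variance, from se^2 = 2 * mse / n
mse <- se_diff^2 * n / 2
# Back-transform the log-scale variance to a CV
cv_manual <- sqrt(exp(mse) - 1)
cv_manual
```

Under these assumptions the result should agree closely with the CVfromCI() output above (about 0.203).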

Session Info

sessionInfo()

R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] purrr_1.0.2       data.table_1.16.0 knitr_1.50        PowerTOST_1.5-6  

loaded via a namespace (and not attached):
 [1] digest_0.6.37     cubature_2.1.1    fastmap_1.2.0     xfun_0.52        
 [5] magrittr_2.0.3    htmltools_0.5.8.1 rmarkdown_2.28    lifecycle_1.0.4  
 [9] mvtnorm_1.3-1     cli_3.6.3         vctrs_0.6.5       renv_1.0.10      
[13] compiler_4.4.2    tools_4.4.2       evaluate_1.0.0    Rcpp_1.0.13      
[17] yaml_2.3.10       rlang_1.1.4       jsonlite_1.8.9    htmlwidgets_1.6.4