Sample Size Calculation for Average Bioequivalence

Regulatory Requirements

The clearest requirements are stated in the FDA Guidance for Industry, Statistical Approaches to Establishing Bioequivalence:

Sample sizes for average BE should be obtained using published formulas. Sample sizes for population and individual BE should be based on simulated data. The simulations should be conducted using a default situation allowing the two formulations to vary as much as 5% in average BA with equal variances and certain magnitude of subject-by-formulation interaction. The study should have 80 or 90% power to conclude BE between these two formulations. Sample size also depends on the magnitude of variability and the design of the study. Variance estimates to determine the number of subjects for a specific drug can be obtained from the biomedical literature and/or pilot studies.

An appropriate method is described in Diletti E, Hauschke D, Steinijans VW. Sample Size Determination for Bioequivalence Assessment by Means of Confidence Intervals. Int J Clin Pharmacol Ther Toxicol. 1991;29(1):1–8, and implemented in the R package PowerTOST, with one clarification: the sample size is found by an iterative search (power is re-evaluated for successive values of n) rather than by a single closed-form formula.
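The iterative idea can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not PowerTOST's implementation: the helper name power_tost_nct is hypothetical, it covers only a balanced 2x2x2 crossover on the log scale, and it approximates power with two non-central t probabilities instead of using PowerTOST's own power routines.

```r
# Hypothetical helper (not a PowerTOST function): approximate TOST power for a
# balanced 2x2x2 crossover via two non-central t probabilities.
power_tost_nct <- function(n, CV, theta0, theta1 = 0.8, theta2 = 1.25,
                           alpha = 0.05) {
  df    <- n - 2                         # degrees of freedom of the 2x2x2 design
  se    <- sqrt(2 * log(CV^2 + 1) / n)   # SE of the T - R difference (log scale)
  tcrit <- qt(1 - alpha, df)             # one-sided critical value
  ncp1  <- (log(theta0) - log(theta1)) / se
  ncp2  <- (log(theta0) - log(theta2)) / se
  pwr   <- pt(-tcrit, df, ncp = ncp2) - pt(tcrit, df, ncp = ncp1)
  max(pwr, 0)                            # the approximation can dip below zero
}

# Iterative search: increase the total n in steps of 2 (balanced sequences)
# until the target power is reached.
n <- 4
while (power_tost_nct(n, CV = 0.3, theta0 = 0.95) < 0.8) {
  n <- n + 2
}
n
```

For these inputs the loop should stop at the same total sample size that sampleN.TOST() reports for the 2x2x2 example in this document, although an approximation of this kind may differ from the package for other parameter combinations.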

#renv::install("PowerTOST")
library(PowerTOST)
library(knitr)
library(data.table)
library(purrr)

Attaching package: 'purrr'
The following object is masked from 'package:data.table':

    transpose

Sample size for standard crossover design (2x2x2) and 4 period full replicate design (2x2x4)

The sampleN.TOST() function can calculate sample sizes for different designs:

kable(known.designs())
| no  | design   | df    | df2 | steps | bk  | bknif | bkni      | name                        |
|----:|:---------|:------|:----|------:|----:|:------|----------:|:----------------------------|
| 0   | parallel | n-2   | n-2 | 2     | 4.0 | 1/1   | 1.0000000 | 2 parallel groups           |
| 1   | 2x2      | n-2   | n-2 | 2     | 2.0 | 1/2   | 0.5000000 | 2x2 crossover               |
| 1   | 2x2x2    | n-2   | n-2 | 2     | 2.0 | 1/2   | 0.5000000 | 2x2x2 crossover             |
| 2   | 3x3      | 2*n-4 | n-3 | 3     | 2.0 | 2/9   | 0.2222222 | 3x3 crossover               |
| 3   | 3x6x3    | 2*n-4 | n-6 | 6     | 2.0 | 1/18  | 0.0555556 | 3x6x3 crossover             |
| 4   | 4x4      | 3*n-6 | n-4 | 4     | 2.0 | 1/8   | 0.1250000 | 4x4 crossover               |
| 5   | 2x2x3    | 2*n-3 | n-2 | 2     | 1.5 | 3/8   | 0.3750000 | 2x2x3 replicate crossover   |
| 6   | 2x2x4    | 3*n-4 | n-2 | 2     | 1.0 | 1/4   | 0.2500000 | 2x2x4 replicate crossover   |
| 7   | 2x4x4    | 3*n-4 | n-4 | 4     | 1.0 | 1/16  | 0.0625000 | 2x4x4 replicate crossover   |
| 9   | 2x3x3    | 2*n-3 | n-3 | 3     | 1.5 | 1/6   | 0.1666667 | partial replicate (2x3x3)   |
| 10  | 2x4x2    | n-2   | n-2 | 4     | 8.0 | 1/2   | 0.5000000 | Balaam’s (2x4x2)            |
| 11  | 2x2x2r   | 3*n-2 | n-2 | 2     | 1.0 | 1/4   | 0.2500000 | Liu’s 2x2x2 repeated x-over |
| 100 | paired   | n-1   | n-1 | 1     | 2.0 | 2/1   | 2.0000000 | paired means                |

Basic usage: specify targetpower (the minimum power to achieve, e.g. 0.8 or 0.9), theta0 (the assumed T/R ratio when logscale = TRUE, which is the convenient default) and CV (the coefficient of variation, given as a ratio rather than a percentage when logscale = TRUE).

# 2x2x2
sampleN.TOST(
  targetpower = 0.8, 
  theta0 = 0.95, 
  CV = 0.3, 
  design = "2x2x2"
)

+++++++++++ Equivalence test - TOST +++++++++++
            Sample size estimation
-----------------------------------------------
Study design: 2x2 crossover 
log-transformed data (multiplicative model)

alpha = 0.05, target power = 0.8
BE margins = 0.8 ... 1.25 
True ratio = 0.95,  CV = 0.3

Sample size (total)
 n     power
40   0.815845

# 2x2x4
sampleN.TOST(
  targetpower = 0.9, 
  theta0 = 0.98, 
  CV = 0.24, 
  design = "2x2x4"
)

+++++++++++ Equivalence test - TOST +++++++++++
            Sample size estimation
-----------------------------------------------
Study design: 2x2x4 (4 period full replicate) 
log-transformed data (multiplicative model)

alpha = 0.05, target power = 0.9
BE margins = 0.8 ... 1.25 
True ratio = 0.98,  CV = 0.24

Sample size (total)
 n     power
14   0.917492 

Note that the total (not per-sequence) sample size is reported.

alpha (the one-sided significance level, default 0.05) almost never needs to be changed. theta1 (lower bioequivalence limit) and theta2 (upper bioequivalence limit) can be changed for non-standard bioequivalence limits, e.g. for narrow therapeutic index drugs.
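As an illustration of non-standard limits, the call below is a sketch with narrowed limits. The specific values (limits of 0.90 and 1/0.90, CV = 0.10, theta0 = 0.975) are assumptions chosen for the example only and are not taken from the guidance discussed here; check the applicable regulatory requirements before using them.

```r
library(PowerTOST)

# Assumed, illustrative inputs for a narrow therapeutic index drug
res <- sampleN.TOST(
  targetpower = 0.8,
  theta0 = 0.975,     # assumed T/R ratio
  theta1 = 0.90,      # narrowed lower limit (assumption)
  theta2 = 1 / 0.90,  # narrowed upper limit, symmetric on the log scale
  CV = 0.10,          # assumed within-subject CV
  design = "2x2x2",
  print = FALSE       # return the result as a data.frame without console output
)
res[["Sample size"]]  # total sample size
```

With print = FALSE the function returns a one-row data.frame, so the total sample size and achieved power can be extracted programmatically.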

Reproduction of Table 1 from FDA Guidance for Industry. Statistical Approaches to Establishing Bioequivalence

Reproducing Table 1 from the FDA guidance is tricky because the table involves one more parameter: the subject-by-formulation interaction variance component, \(\sigma_D^2\).

\[\sigma_D^2=(\sigma_{BT}-\sigma_{BR})^2+2\times(1-\rho)\times\sigma_{BT}\times\sigma_{BR}\]

where \(\sigma_{BT}^2\) and \(\sigma_{BR}^2\) are the between-subject variances for the T and R formulations, respectively, and \(\rho\) is the correlation between the subject-specific means \(\mu_{Tj}\) and \(\mu_{Rj}\). These parameters are rarely reported in publications and can’t be estimated from CI boundaries and sample size. In the absence of this information one can assume \(\sigma_{BT}=\sigma_{BR}\) as well as \(\rho=1\). Under these reasonable assumptions \(\sigma_D^2=\sigma_D=0\), so the sampleN.TOST() calculation should be correct.
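To make the last step explicit, write \(\sigma_B\) for the common value of \(\sigma_{BT}\) and \(\sigma_{BR}\) (a shorthand introduced here only for this substitution) and set \(\rho=1\):

\[\sigma_D^2=(\sigma_B-\sigma_B)^2+2\times(1-1)\times\sigma_B\times\sigma_B=0+0=0\]

Both terms vanish separately: the first because the between-subject standard deviations are equal, the second because the subject-specific means are perfectly correlated.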

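# Parameter grid used to reproduce Table 1 (5% difference in means)
targetpower <- c(0.8, 0.9)
theta0 <- 1 - 0.05
CV <- c(0.15, 0.23, 0.3, 0.5)
design <- c("2x2x2", "2x2x4")

# All combinations; column names match the sampleN.TOST() argument names
dt <- CJ(CV, targetpower, design, theta0)

# One sampleN.TOST() call per row; print = FALSE suppresses console output
sample_size <- purrr::pmap(dt, sampleN.TOST, print = FALSE)
kable(rbindlist(sample_size))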
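| Design | alpha | CV   | theta0 | theta1 | theta2 | Sample size | Achieved power | Target power |
|:-------|------:|-----:|-------:|-------:|-------:|------------:|---------------:|-------------:|
| 2x2x2  | 0.05  | 0.15 | 0.95   | 0.8    | 1.25   | 12          | 0.8305164      | 0.8          |
| 2x2x4  | 0.05  | 0.15 | 0.95   | 0.8    | 1.25   | 6           | 0.8458307      | 0.8          |
| 2x2x2  | 0.05  | 0.15 | 0.95   | 0.8    | 1.25   | 16          | 0.9260211      | 0.9          |
| 2x2x4  | 0.05  | 0.15 | 0.95   | 0.8    | 1.25   | 8           | 0.9328881      | 0.9          |
| 2x2x2  | 0.05  | 0.23 | 0.95   | 0.8    | 1.25   | 24          | 0.8066535      | 0.8          |
| 2x2x4  | 0.05  | 0.23 | 0.95   | 0.8    | 1.25   | 12          | 0.8143816      | 0.8          |
| 2x2x2  | 0.05  | 0.23 | 0.95   | 0.8    | 1.25   | 32          | 0.9044320      | 0.9          |
| 2x2x4  | 0.05  | 0.23 | 0.95   | 0.8    | 1.25   | 16          | 0.9082552      | 0.9          |
| 2x2x2  | 0.05  | 0.30 | 0.95   | 0.8    | 1.25   | 40          | 0.8158453      | 0.8          |
| 2x2x4  | 0.05  | 0.30 | 0.95   | 0.8    | 1.25   | 20          | 0.8202398      | 0.8          |
| 2x2x2  | 0.05  | 0.30 | 0.95   | 0.8    | 1.25   | 52          | 0.9019652      | 0.9          |
| 2x2x4  | 0.05  | 0.30 | 0.95   | 0.8    | 1.25   | 26          | 0.9043064      | 0.9          |
| 2x2x2  | 0.05  | 0.50 | 0.95   | 0.8    | 1.25   | 98          | 0.8032172      | 0.8          |
| 2x2x4  | 0.05  | 0.50 | 0.95   | 0.8    | 1.25   | 50          | 0.8128063      | 0.8          |
| 2x2x2  | 0.05  | 0.50 | 0.95   | 0.8    | 1.25   | 132         | 0.9012316      | 0.9          |
| 2x2x4  | 0.05  | 0.50 | 0.95   | 0.8    | 1.25   | 66          | 0.9021398      | 0.9          |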

As we can see, the calculated values equal the reference values for the smallest \(\sigma_D=0.01\) when CV=0.15 and CV=0.23. For CV=0.30 with 80% power the sample sizes also agree, but for the other parameter combinations the calculated sample sizes are smaller than the reference ones.

Conclusion: we can trust sampleN.TOST(). For CV less than or equal to 0.30 with 80% power, and for CV less than or equal to 0.23 with 90% power, it can be considered validated against the reference values in the FDA guidance.

Estimate CV from CI boundaries and sample size

CV can be calculated from CI boundaries and sample size if only these values are available:

CVfromCI(lower = 0.95, upper = 1.11, n = 38)
[1] 0.2029806
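For intuition, the back-calculation behind this result can be sketched in base R. The helper name cv_from_ci_2x2 is hypothetical and not part of PowerTOST; the sketch assumes a 90% CI (alpha = 0.05), log-transformed data and a 2x2 crossover with equal sequence sizes.

```r
# Hypothetical helper (not a PowerTOST function): recover the CV from the
# limits of a 90% CI reported for a balanced 2x2 crossover.
cv_from_ci_2x2 <- function(lower, upper, n, alpha = 0.05) {
  df         <- n - 2                          # degrees of freedom
  half_width <- (log(upper) - log(lower)) / 2  # CI half-width on the log scale
  se         <- half_width / qt(1 - alpha, df) # SE of the T - R difference
  mse        <- se^2 * n / 2                   # residual variance (log scale)
  sqrt(exp(mse) - 1)                           # convert log-scale variance to CV
}

cv_from_ci_2x2(lower = 0.95, upper = 1.11, n = 38)
```

Under these assumptions the result should be about 0.203, in line with the CVfromCI() output above. For other designs or unbalanced sequences the design constant differs, so the package function remains the better choice.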

sessionInfo()
R version 4.4.3 (2025-02-28)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] purrr_1.0.2       data.table_1.16.0 knitr_1.48        PowerTOST_1.5-6  

loaded via a namespace (and not attached):
 [1] digest_0.6.37     cubature_2.1.1    fastmap_1.2.0     xfun_0.48        
 [5] magrittr_2.0.3    htmltools_0.5.8.1 rmarkdown_2.28    lifecycle_1.0.4  
 [9] mvtnorm_1.3-1     cli_3.6.3         vctrs_0.6.5       renv_1.0.10      
[13] compiler_4.4.3    tools_4.4.3       evaluate_1.0.0    Rcpp_1.0.13      
[17] yaml_2.3.10       rlang_1.1.4       jsonlite_1.8.9    htmlwidgets_1.6.4