Confidence Intervals for a Proportion in R
Introduction
See the summary page for general introductory information on confidence intervals for proportions, including the principles underlying the most common methods, and a summary of R packages available.
[Note: information about cicalc package will be added to this page soon.]
Data used
The adcibc data stored here was used in this example, creating a binary treatment variable trt taking the values of ACT or PBO and a binary response variable resp taking the values of Yes or No. For this example, a response is defined as a score greater than 4.
For the active treatment (ACT) there are 36 responders (‘Yes’) out of 154 subjects (118 ‘No’), giving an observed proportion of 36/154 = 0.234 (23.4% responders); the placebo (PBO) arm has 12 responders out of 77 subjects.
Packages
The table below indicates which methods can be produced using each package. Methods are grouped into those that aim to achieve the nominal confidence level on average, followed by the ‘exact’ and continuity-adjusted methods that aim to achieve the nominal confidence level as a minimum.

| Method | cardx | PropCIs | ratesci | DescTools | cicalc | Hmisc | RBesT |
|---|---|---|---|---|---|---|---|
| For approximate coverage: | | | | | | | |
| Wald | Y | Y | Y | Y | Y | Y | - |
| Agresti-Coull | Y | Y | Y | Y | Y | - | - |
| Jeffreys | Y | - | Y | Y | Y | - | - |
| Mid-P | - | Y | Y | Y | Y | - | - |
| Wilson | Y | Y | Y | Y | Y | Y | - |
| SCAS | - | - | Y | - | - | - | - |
| For conservative coverage: | | | | | | | |
| Wald-cc | Y | - | Y | Y | Y | - | - |
| Jeffreys-cc | - | - | Y | - | - | - | - |
| Clopper-Pearson | Y | Y | Y | Y | Y | Y | Y |
| Wilson-cc | Y | - | Y | - | Y | - | - |
| SCAS-cc | - | - | Y | - | - | - | - |
| Blaker | - | Y | Y | Y | - | - | - |
Most of these packages require just the number of events (numerator, i.e. the number of successes) and the total number of subjects (denominator) as inputs. The exception is {cardx}, which takes input data as one row per individual.
The {cardx} package includes a separate function for each method. Alternatively, you can use, for example, cardx::ard_categorical_ci(data = act, variables = respn, method = 'wilson'). This invokes the code for the selected method but returns an analysis results dataset (ARD) as the output.
The {PropCIs} package contains separate functions to produce CIs for various methods, including Blaker’s ‘exact’ method and Mid-P which aren’t available in {cardx} but are produced by SAS PROC FREQ. We found results agreed with SAS to the 5th decimal place. For methods given by both {PropCIs} and {cardx}, the results generally aligned to at least the 7th decimal place, but some methods (Wilson & mid-P) are only output with 4 dps.
The {ratesci} package (rateci() function) produces a variety of CIs for a proportion - note, the current development version at https://github.com/petelaud/ratesci has added more methods compared to the CRAN release (v1.0.0). A CRAN update will be released in due course.
The {DescTools} package has a function BinomCI() that produces a wide selection of methods. It accepts a vector of method names, so several intervals can be produced with a single call (as illustrated in the mid-P section below).
The {cicalc} package … [TBC]
In the {Hmisc} package, binconf() produces CIs using either the Clopper-Pearson, Wilson or Wald method. In this example (x=36 and n=154), the results match the cardx package.
The {RBesT} package produces CIs using the Clopper-Pearson method with the BinaryExactCI() function. (Prior to version 1.8-0, the function produced erroneous results for boundary cases where x = 0 or x = N).
Another package producing many of the above intervals is {contingencytables}, but the format of its output objects is not very easy to work with.
Methods for Calculating Confidence Intervals for a Single Proportion Using R
For more technical derivation and characteristics of each of the methods listed below, see the corresponding SAS page.
Most of the methods are demonstrated below using the {cardx} package.
Clopper-Pearson (Exact or binomial CI) Method
The Clopper-Pearson ‘exact’ CI is one of the most popular methods. It is often a preferred ‘safe option’ because it avoids the known deficiencies of the Wald method and guarantees coverage of at least the nominal confidence level, but it can be excessively conservative (the interval is wider than necessary, with true coverage above 95%).
The {cardx} package calculates the Clopper-Pearson interval by calling the stats::binom.test() function.
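For reference, the Clopper-Pearson limits have a closed form in terms of beta distribution quantiles (equivalent to inverting the exact binomial test):

\[ p_L = B\left(\tfrac{\alpha}{2};\ x,\ n - x + 1\right), \qquad p_U = B\left(1 - \tfrac{\alpha}{2};\ x + 1,\ n - x\right) \]

where \(B(q;\ a,\ b)\) denotes the \(q\) quantile of the Beta(\(a\), \(b\)) distribution. With \(x = 36\) and \(n = 154\), qbeta(0.025, 36, 119) and qbeta(0.975, 37, 118) reproduce the (0.169, 0.309) interval shown below.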
cardx::proportion_ci_clopper_pearson(act2, conf.level = 0.95) |>
as_tibble() |>
select(-(statistic:parameter))

# A tibble: 1 × 8
N n conf.level estimate conf.low conf.high method alternative
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 154 36 0.95 0.234 0.169 0.309 Clopper-Pearso… two.sided
Normal Approximation Method (Also known as the Wald Method)
The following code manually calculates a confidence interval for a binomial proportion using the normal approximation. This is replicated exactly by the cardx::proportion_ci_wald() function, which also allows a continuity correction to be applied.
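For reference, the interval being computed is

\[ \hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \qquad \hat{p} = \frac{x}{n}, \]

which for \(x = 36\), \(n = 154\) gives \(\hat{p} = 0.234\), a standard error of about 0.0341, and the interval (0.167, 0.301).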
# sample proportion by trt
summary <- adcibc |>
filter(trt == "ACT") |>
group_by(resp) |>
tally() |>
ungroup() |>
mutate(
total = sum(n),
p = n / total
)
# Calculate standard error and 95% wald confidence intervals for population proportion
waldci <- summary |>
filter(resp == "Yes") |>
mutate(
se = sqrt(p * (1 - p) / total),
lower_ci = (p - qnorm(1 - 0.05 / 2) * se),
upper_ci = (p + qnorm(1 - 0.05 / 2) * se)
)
waldci

# A tibble: 1 × 7
resp n total p se lower_ci upper_ci
<chr> <int> <int> <dbl> <dbl> <dbl> <dbl>
1 Yes 36 154 0.234 0.0341 0.167 0.301
# cardx package Wald method without continuity correction
cardx::proportion_ci_wald(act2, conf.level = 0.95, correct = FALSE) |>
as_tibble()

# A tibble: 1 × 7
N n estimate conf.low conf.high conf.level method
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <glue>
1 154 36 0.234 0.167 0.301 0.95 Wald Confidence Interval w…
Wilson Method (Also known as the (Asymptotic) Score method)
The cardx package calculates the Wilson (score) method by calling the stats::prop.test() function. This method is often used as a compromise between the Clopper-Pearson and Wald methods, since it has been found to be accurate for most parameter values (even those close to 0 and 1) and does not suffer from being over-conservative. However, it over-corrects the asymmetric one-sided coverage of the Wald method: the location of the Wilson interval is shifted too far towards 0.5 [1]. For more technical information see the corresponding SAS page.
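For reference, the Wilson limits have the closed form obtained by inverting the score test:

\[ \frac{\hat{p} + \frac{z^2}{2n} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}}, \qquad z = z_{1-\alpha/2}, \]

which for \(x = 36\), \(n = 154\) gives the (0.174, 0.307) interval shown below. Note the interval is centred on the shrunken estimate \((x + z^2/2)/(n + z^2) \approx 0.240\) rather than on \(\hat{p}\).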
# cardx package Wilson method without continuity correction
cardx::proportion_ci_wilson(act2, conf.level = 0.95, correct = FALSE) |>
as_tibble() |>
select(-(statistic:parameter))

# A tibble: 1 × 8
N n conf.level estimate conf.low conf.high method alternative
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <glue> <chr>
1 154 36 0.95 0.234 0.174 0.307 Wilson Confide… two.sided
A recent improvement to the Wilson score method introduces a skewness correction in order to address the asymmetric one-sided coverage probabilities [2]. The resulting Skewness Corrected Asymptotic Score (SCAS) method can be obtained from the {ratesci} package. Note that in this example the interval is shifted downwards by about 0.0015.
ratesci::rateci(x = 36, n = 154)$scas |>
as_tibble()

# A tibble: 1 × 5
lower est upper x n
<dbl> <dbl> <dbl> <dbl> <dbl>
1 0.172 0.234 0.305 36 154
Agresti-Coull Method
The {cardx} package calculates the Agresti-Coull interval using the equation from the published method by Alan Agresti & Brent Coull, based on adding 2 successes and 2 failures before computing the Wald CI. The CI is truncated when it overshoots the boundary (<0 or >1).
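For reference, the Agresti-Coull interval is a Wald interval computed from adjusted counts:

\[ \tilde{n} = n + z^2, \qquad \tilde{p} = \frac{x + z^2/2}{\tilde{n}}, \qquad \tilde{p} \pm z\sqrt{\frac{\tilde{p}(1-\tilde{p})}{\tilde{n}}}, \]

with \(z = z_{1-\alpha/2}\). At the 95% level \(z^2 \approx 3.84 \approx 4\), hence the ‘add 2 successes and 2 failures’ description; for \(x = 36\), \(n = 154\) this gives the (0.174, 0.307) shown below.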
# cardx package agresti_coull method
cardx::proportion_ci_agresti_coull(act2, conf.level = 0.95) |>
as_tibble()

# A tibble: 1 × 7
N n estimate conf.low conf.high conf.level method
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 154 36 0.234 0.174 0.307 0.95 Agresti-Coull Confidence I…
Jeffreys Method
The Jeffreys ‘equal-tailed’ method is a particular type of Bayesian method, which optimises central location instead of Highest Probability Density (HPD) [3]. For proportions, the beta distribution is generally used for the prior, which consists of two parameters alpha and beta. Setting alpha = beta = 0.5 gives the Jeffreys prior. NOTE: if you want to use any other prior, the ratesci::jeffreysci() function allows a different beta prior to be specified using the ai and bi arguments. Alternatively, if you prefer an HPD approach (not recommended by Brown et al.), you can use binom::binom.bayes().
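For reference, the Jeffreys equal-tailed limits are quantiles of the Beta posterior arising from the Jeffreys Beta(0.5, 0.5) prior:

\[ p_L = B\left(\tfrac{\alpha}{2};\ x + 0.5,\ n - x + 0.5\right), \qquad p_U = B\left(1 - \tfrac{\alpha}{2};\ x + 0.5,\ n - x + 0.5\right), \]

conventionally setting \(p_L = 0\) when \(x = 0\) and \(p_U = 1\) when \(x = n\). Here qbeta(0.025, 36.5, 118.5) and qbeta(0.975, 36.5, 118.5) reproduce the (0.172, 0.305) interval shown below.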
# cardx package jeffreys method
cardx::proportion_ci_jeffreys(act2, conf.level = 0.95) |>
as_tibble()

# A tibble: 1 × 7
N n estimate conf.low conf.high conf.level method
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <glue>
1 154 36 0.234 0.172 0.305 0.95 Jeffreys Interval
Binomial-based mid-P Method
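The mid-P interval modifies the Clopper-Pearson construction by counting only half of the probability of the observed count \(x\) in each tail: with \(X \sim \text{Binomial}(n, p)\), the limits \(p_L\) and \(p_U\) solve

\[ \tfrac{1}{2}\Pr(X = x \mid p_L) + \Pr(X > x \mid p_L) = \tfrac{\alpha}{2}, \qquad \tfrac{1}{2}\Pr(X = x \mid p_U) + \Pr(X < x \mid p_U) = \tfrac{\alpha}{2}. \]

Counting the full probability of \(X = x\) instead recovers the Clopper-Pearson limits, which is why the mid-P interval is narrower.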
The table below (from {ratesci}) includes two versions of the mid-P interval, one derived iteratively from binomial tail proportions and another from quantiles of a beta distribution. These have been claimed to be equivalent [3], but it seems they are not quite identical: the beta version is very close to the Jeffreys interval. The first version closely matches the result (0.17200131, 0.30545883) from SAS PROC FREQ:
ratesci::rateci(x = 36, n = 154, precis = 10)$ciarray[,,1] |>
as_tibble(rownames = "Method")

# A tibble: 0 × 0
The mid-P result from {DescTools} doesn’t quite match with either of the above results, beyond the 4th decimal place, even with increased precision using the tol argument (which according to the documentation only affects the Blaker method). The documentation indicates that the mid-P interval is derived from the F-distribution, which suggests there are actually 3 slightly different versions of ‘mid-P’.
DescTools::BinomCI(x = 36, n = 154,
tol = 1e-10,
method = c("jeffreys", "wilson", "midp", "clopper-pearson")) |>
as_tibble(rownames = "Method")

# A tibble: 4 × 4
Method est lwr.ci upr.ci
<chr> <dbl> <dbl> <dbl>
1 jeffreys 0.234 0.172 0.305
2 wilson 0.234 0.174 0.307
3 midp 0.234 0.172 0.305
4 clopper-pearson 0.234 0.169 0.309
The {PropCIs} version only displays 4 dps, because the calculations are performed on sequential values of p at intervals of 0.0005. Therefore the 4th decimal place shown is only ever 1 or 6, and the {PropCIs} output for this method is only reliable to the 3rd decimal place. (A similar precision issue is also noted for the exactci::binom.exact(..., midp = TRUE) function, probably due to its reliance on uniroot() with the default value of the tol argument.)
PropCIs::midPci(x = 36, n = 154, conf.level = 0.95)
data:
95 percent confidence interval:
0.1721 0.3051
Blaker Method
The Blaker ‘exact’ CI matching SAS output (to at least 7 dps) may be obtained as follows from {ratesci}, or from the dedicated {BlakerCI} package. Output from DescTools::BinomCI and PropCIs::blakerci() is less precise, matching to 4 decimal places.
options(digits=8)
# ratesci package Blaker method
ratesci::rateci(36, 154, cc = TRUE)$blaker |>
as_tibble()

# A tibble: 1 × 5
lower est upper x n
<dbl> <dbl> <dbl> <dbl> <dbl>
1 0.170 0.234 0.307 36 154
Continuity Adjusted Methods
Continuity adjustments (at a magnitude of \(\gamma / N\) where conventionally \(\gamma = 0.5\) but smaller values may be used) can be applied to the Wald, Wilson and SCAS formulae.
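For example, the continuity-adjusted Wald interval simply widens each limit by \(\gamma / N\):

\[ \hat{p} \pm \left( z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{N}} + \frac{\gamma}{N} \right), \]

truncated to \([0, 1]\) where necessary. With \(\gamma = 0.5\) this is the familiar adjustment of \(1/(2N)\).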
Here is an example of the adjusted Wilson interval from {cardx}:
# cardx package Wilson method with continuity correction
cardx::proportion_ci_wilson(act2, conf.level = 0.95, correct = TRUE) |>
as_tibble()

# A tibble: 1 × 11
N n conf.level estimate statistic p.value parameter conf.low
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 154 36 0.95 0.234 42.6 6.70e-11 1 0.171
# ℹ 3 more variables: conf.high <dbl>, method <glue>, alternative <chr>
In principle, the concept can also be applied to the Jeffreys CI, although with \(\gamma=0.5\) this turns out to be identical to the Clopper-Pearson interval.
ratesci::rateci(x = 36, n = 154, cc = 0.5)$ciarray |>
drop() |>
as_tibble(rownames = "Method")

# A tibble: 0 × 0
Consistency with hypothesis tests
The consistency of CIs for a proportion with a hypothesis test depends on which test is specified. Some confidence interval methods are derived by inverting a test, and therefore guarantee consistency. This includes Clopper-Pearson (the exact binomial test from binom.test()), Wilson score (the asymptotic test from prop.test(), which uses the standard error estimated under the null hypothesis), and mid-P (an exact test with mid-P adjustment, e.g. from exactci::binom.exact(..., midp = TRUE)), although given the observations above, it would be important to select the right version of the mid-P interval. For the SCAS method, the CI is consistent with a skewness-corrected version of the asymptotic test (output by the ratesci::scoreci() function).
The output from some of the {cardx} functions (e.g. proportion_ci_wilson() and proportion_ci_clopper_pearson()) includes a p-value, which is not mentioned in the package documentation. It is a test against a null hypothesis of p = 0.5, the default value used by the underlying binom.test() and prop.test() functions. If a test against a different value of p were required, those underlying functions should be used to obtain the p-value, as the {cardx} functions provide no facility to select a different null value.
Stratified Score methods
[To be moved to separate page for stratified analysis?]
{cardx} also contains a function proportion_ci_strat_wilson() which calculates stratified Wilson CIs for unequal proportions [4], with or without continuity adjustment, using a combination of the CIs from each stratum. The default weights (not clearly documented) are presumably those described in the publication, which aim to minimise the weighted sum of the squared interval lengths. The function also allows other weights to be specified (such as weights proportional to stratum size, or adjusted for known population weights). The examples shown use the adcibc2 dataset, stratified by age group.
actstrat <- adcibc2 |>
select(TRTP, AVAL, AGEGR1) |>
filter(TRTP != "Placebo") |>
mutate(
respn = if_else(AVAL > 4, 1, 0),
agegrp = factor(AGEGR1)
) |>
select(respn, agegrp)
actage <- actstrat$agegrp
table(actage, act2) |>
as_tibble()

# A tibble: 6 × 3
actage act2 n
<chr> <chr> <int>
1 <65 0 13
2 >80 0 32
3 65-80 0 73
4 <65 1 4
5 >80 1 11
6 65-80 1 21
table(actage) |>
as_tibble()

# A tibble: 3 × 2
actage n
<chr> <int>
1 <65 17
2 >80 43
3 65-80 94
cardx::proportion_ci_strat_wilson(x = act2, strata = actage) |>
as_tibble()

# A tibble: 3 × 8
N n estimate conf.low conf.high conf.level weights method
<int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <glue>
1 154 36 0.234 0.174 0.307 0.95 0.113 Stratified Wilson …
2 154 36 0.234 0.174 0.307 0.95 0.263 Stratified Wilson …
3 154 36 0.234 0.174 0.307 0.95 0.623 Stratified Wilson …
cardx::proportion_ci_strat_wilson(x = act2, strata = actage, weights = c(1/3, 1/3, 1/3)) |>
as_tibble()

# A tibble: 1 × 7
N n estimate conf.low conf.high conf.level method
<int> <int> <dbl> <dbl> <dbl> <dbl> <glue>
1 154 36 0.234 0.166 0.332 0.95 Stratified Wilson Confiden…
The {ratesci} package can also produce stratified Wilson or SCAS intervals (with or without continuity adjustment), though the underlying methodology is different: this version uses a stratified score function [2] (analogous to the Miettinen-Nurminen formula for stratified risk difference) instead of constructing the interval from a CI calculated separately for each stratum. For stratified analysis, the ratesci::scoreci() function takes inputs in the form of vectors of the numerators and denominators per stratum. By default, weights are proportional to stratum size, but custom weights are also catered for using the wt argument. Arbitrary fixed weights are shown here to allow comparison with the {cardx} version.
ratesci::scoreci(x1 = c(4, 21, 11),
n1 = c(17, 94, 43),
contrast = 'p',
skew = FALSE,
stratified = TRUE)$estimates |>
as_tibble()

# A tibble: 1 × 6
lower est upper level p1hat p1mle
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.174 0.234 0.307 0.95 0.234 0.234
ratesci::scoreci(x1 = c(4, 21, 11),
n1 = c(17, 94, 43),
wt = c(1,1,1),
contrast = 'p',
skew = FALSE,
stratified = TRUE)$estimates |>
as_tibble()

# A tibble: 1 × 6
lower est upper level p1hat p1mle
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.164 0.238 0.332 0.95 0.238 0.238