Confidence Intervals for Proportions in R

Introduction

The methods to use for calculating a confidence interval (CI) for a proportion depend on the type of proportion you have.

  • 1 sample proportion (1 proportion calculated from 1 group of subjects)

  • 2 sample proportions and you want a CI for the difference in the 2 proportions.

    • If the 2 samples come from 2 independent samples (different subjects in each of the 2 groups)

    • If the 2 samples are matched (i.e. the same subject has 2 results, one on each group [paired data]).

The method selected is also dependent on whether your proportion is close to 0 or 1 (or near to the 0.5 midpoint), and your sample size.

For more information about these methods, including which performs better in different scenarios see Five Confidence Intervals for Proportions That You Should Know about1.

Data used

The adcibc data stored here was used in this example, creating a binary treatment variable trt taking the values of ACT or PBO and a binary response variable resp taking the values of Yes or No. For this example, a response is defined as a score greater than 4.

The below shows that for the Actual Treatment, there are 36 responders out of 154 subjects = 0.234 (23.4% responders).

# A tibble: 4 × 3
# Groups:   trt [2]
  trt   resp      n
  <chr> <chr> <int>
1 ACT   No      118
2 ACT   Yes      36
3 PBO   No       65
4 PBO   Yes      12

Packages

The {cardx} package is an extension of the {cards} package, providing additional functions to create Analysis Results Data Objects (ARDs)1. It was developed as part of {NEST} and pharmaverse. This package requires the binary endpoint to be a logical (TRUE/FALSE) vector or a numeric/integer coded as (0, 1) with 1 (TRUE) being the success you want to calculate the confidence interval for.

See here for full description of the {cardx} proportions equations.

If calculating the CI for a difference in proportions, the package requires both the response and the treatment variable to be numeric/integer coded as (0, 1) (or logical vector).

Instead of the code presented below, you can use ard_categorical_ci(data, variables=resp, method ='wilson') for example. This invokes the code below but returns an analysis results dataset (ARD) format as the output. Methods included are waldcc, wald, clopper-pearson, wilson, wilsoncc, strat_wilson, strat_wilsoncc, agresti-coull and jeffreys for one-sample proportions and methods for 2 independent samples, however currently does not have a method for 2 matched proportions.

Code example: proportion_ci_clopper_pearson(<resp_var>,conf.level=0.95) %>% as_tibble()

Example data format needed for {cardx} for a single proportion CI

 [1] 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0

The {PropCIs} package produces CIs for methods such as Blaker’s exact method and Midp which aren’t available in {cardx} but are available in SAS. We found results agreed with SAS to the 5th decimal place. The package also calculates CIs for Clopper-Pearson, Wald, Wilson, Agresti-Coull and these align to results obtained in cardx to at least the 7th decimal place. The {PropsCIs} package requires just the number of events (numerator number of successes) & total number of subjects (denominator) as an input dataset. Given Blaker and Midp are rarely used in practice, and {PropsCIs} isn’t a package commonly downloaded from CRAN, further detail is not provided here.

Code example for Clopper-pearson:
exactci(x=<count of successes> , n=<Total>, conf.level=0.95)

Code example for Mid P method:
midPci(x=<count of successes> , n=<Total>, conf.level=0.95)

Code example for Blaker’s exact method:
blakerci(x=<count of successes> , n=<Total>, conf.level=0.95, tolerance=1e-05)

The {Hmisc} package produces CIs using the Clopper-Pearson method. In this example (x=36 and n=154), the results match the cardx package. Documentation reports that the method uses F distribution to compute exact intervals based on the binomial cdf. However, if the percentage of responders is 100% then the upper limit is set to 1. Similarly if the percentage of responders is 0%, then the lower limit is set to 0. Hence, in extreme cases there may be differences between this package and the standard implementation of Clopper-Pearson method.

Code example for Clopper-pearson:
binconf(x=<count of successes> , n=<Total>,method="exact",alpha=0.05)

The {RBesT} package produces CIs using the Clopper-Pearson method. In this example (x=36 and n=154), the results match the cardx package. However, as described below, there are 2 cases where the results using RBesT package do not match cardx or Hmisc.

  1. x = 0 (0% responders), in which case the lower limit does not match.
  2. x = n (100% responders), in which case the upper limit does not match.

Because of the relationship between the binomial distribution and the beta distribution. This package uses quantiles of the beta distribution to derive exact confidence intervals.

\[ B(\alpha/2;x, n-x+1) < p < B(1-\alpha/2; x+1, n-x)\]

RBesT equations are:
pLow <- qbeta(Low, r + (r == 0), n - r + 1)
pHigh <- qbeta(High, r + 1, n - r + ((n - r) == 0))

If the equations were updated as follows then it would match Hmisc intervals:
pLow <- qbeta(Low, r, n - r + 1)
pHigh <- qbeta(High, r + 1, n - r)

BinaryExactCI(x=<count of successes> , n=<Total>,alpha=0.05)

It is currently unclear why the RBesT script has the logical conditions (r==0) and ((n-r)==0. Therefore, we currently do not recommend using this package and suggest cardx or Hmisc is used instead. For updates about this, see the issue

The {ExactCIdiff} package produces CIs for two dependent proportions (matched pairs) and two independent proportions (unmatched pairs).

Methods for Calculating Confidence Intervals for a single proportion using cardx

For more technical derivation and reasons for use of each of the methods listed below, see the corresponding SAS page.

Let’s start by calculating a Confidence interval for the proportion of successes observed in the Active Treatment group (a single sample).

Clopper-Pearson (Exact or binomial CI) Method

Clopper-Pearson Exact CI is one of the most popular methods, it is often good for small sample sizes when the proportion is not close to the tails (0,1), but it can be too conservative (too wide an interval compared to the interval containing the true population proportion 95% of the time).

The cardx package calculates the Clopper-Pearson score by calling stats::binom.test() function.

proportion_ci_clopper_pearson(act2,conf.level=0.95) %>% 
  as_tibble()
# A tibble: 1 × 10
      N conf.level estimate statistic  p.value parameter conf.low conf.high
  <int>      <dbl>    <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl>
1   154       0.95    0.234        36 2.21e-11       154    0.169     0.309
# ℹ 2 more variables: method <chr>, alternative <chr>

Normal Approximation Method (Also known as the Wald or asymptotic CI Method)

In large random samples from independent trials, the sampling distribution of proportions approximately follows the normal distribution. The expectation of a sample proportion is the corresponding population proportion. Therefore, based on a sample of size \(n\), a \((1-\alpha)\%\) confidence interval for population proportion can be calculated using normal approximation as follows:

\(p\approx \hat p \pm z_\alpha \sqrt{\hat p(1-\hat p)}/{n}\), where \(\hat p\) is the sample proportion, \(z_\alpha\) is the \(1-\alpha/2\) quantile of a standard normal distribution corresponding to level \(\alpha\), and \(\sqrt{\hat p(1-\hat p)}/{n}\) is the standard error.

For more technical information see the corresponding SAS page.

Example code

The following code calculates a confidence interval for a binomial proportion using normal approximation equation manually. This is replicated exactly using the cardx::proportion_ci_wald function which also allows the continuity correction to be applied.

    # sample proportion by trt
summary <- adcibc %>% 
           filter(trt=="ACT") %>% 
           group_by(resp) %>% 
           tally()  %>% 
           ungroup() %>% 
           mutate(total=sum(n)) %>% 
           mutate(p=n/total)

    # Calculate standard error and 95% wald confidence intervals for population proportion
waldci <-summary %>% 
         filter(resp=="Yes") %>% 
         mutate(se=sqrt(p*(1-p)/total)) %>% 
         mutate(lower_ci=(p-qnorm(1-0.05/2)*se)) %>% 
         mutate(upper_ci=(p+qnorm(1-0.05/2)*se)) 
waldci  
# A tibble: 1 × 7
  resp      n total     p     se lower_ci upper_ci
  <chr> <int> <int> <dbl>  <dbl>    <dbl>    <dbl>
1 Yes      36   154 0.234 0.0341    0.167    0.301
#cardx package Wald method without continuity correction
proportion_ci_wald(act2,conf.level=0.95,correct=FALSE) %>% 
  as_tibble()
# A tibble: 1 × 6
      N estimate conf.low conf.high conf.level method                           
  <int>    <dbl>    <dbl>     <dbl>      <dbl> <glue>                           
1   154    0.234    0.167     0.301       0.95 Wald Confidence Interval without…
#cardx package Wald method with continuity correction
proportion_ci_wald(act2,conf.level=0.95,correct=TRUE) %>% 
  as_tibble()
# A tibble: 1 × 6
      N estimate conf.low conf.high conf.level method                           
  <int>    <dbl>    <dbl>     <dbl>      <dbl> <glue>                           
1   154    0.234    0.164     0.304       0.95 Wald Confidence Interval with co…

Wilson Method (Also known as the Score method or the Altman, Newcombe method3 )

The cardx package calculates the Wilson (score) method by calling stats::prop.test() function. This method is often used as a compromise between the Clopper-Pearson and the Wald given it was found to be accurate for most parameter values (even those close to 0 and 1), and it does not suffer from being over-conservative. For more technical information see the corresponding SAS page.

The package also contains a function for proportion_ci_strat_wilson() which calculates the stratified Wilson CIs for unequal proportions as described on page 47 here.

#cardx package Wilson method without continuity correction
proportion_ci_wilson(act2,conf.level=0.95,correct=FALSE) %>% 
  as_tibble()
# A tibble: 1 × 10
      N conf.level estimate statistic  p.value parameter conf.low conf.high
  <int>      <dbl>    <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl>
1   154       0.95    0.234      43.7 3.90e-11         1    0.174     0.307
# ℹ 2 more variables: method <glue>, alternative <chr>
#cardx package Wilson method with continuity correction
proportion_ci_wilson(act2,conf.level=0.95,correct=TRUE) %>% 
  as_tibble()
# A tibble: 1 × 10
      N conf.level estimate statistic  p.value parameter conf.low conf.high
  <int>      <dbl>    <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl>
1   154       0.95    0.234      42.6 6.70e-11         1    0.171     0.310
# ℹ 2 more variables: method <glue>, alternative <chr>

Agresti-Coull Method

The cardx package calculates the Agresti-Coull method using the equation from the published method by Alan Agresti & Brent Coull based on adding 2 successes and 2 failures before computing the wald CI. The CI is truncated, when it overshoots the boundary (<0 or >1).

#cardx package agresti_coull method 
proportion_ci_agresti_coull(act2,conf.level=0.95) %>% 
  as_tibble()
# A tibble: 1 × 6
      N estimate conf.low conf.high conf.level method                           
  <int>    <dbl>    <dbl>     <dbl>      <dbl> <chr>                            
1   154    0.234    0.174     0.307       0.95 Agresti-Coull Confidence Interval

Jeffreys Method

Jeffreys method is a particular type of Bayesian Highest Probability Density (HPD) Method. For proportions, the beta distribution is generally used for the prior, which consists of two parameters alpha and beta. Setting alpha=beta=0.5 is called Jeffrey’s prior. NOTE: if you want to use any other priors, you can use binom.bayes which estimates a credible interval for proportions.

#cardx package jeffreys method 
proportion_ci_jeffreys(act2,conf.level=0.95) %>% 
  as_tibble()
# A tibble: 1 × 6
      N estimate conf.low conf.high conf.level method           
  <int>    <dbl>    <dbl>     <dbl>      <dbl> <glue>           
1   154    0.234    0.172     0.305       0.95 Jeffreys Interval

Methods for Calculating Confidence Intervals for a matched pair proportion using {ExactCIdiff}

For more information about the detailed methods for calculating confidence intervals for a matched pair proportion see here. When you have 2 measurements on the same subject, the 2 sets of measures are not independent and you have matched pair of responses.

To date we have not found an R package which calculates a CI for matched pair proportions using the normal approximation or Wilson methods although they can be done by hand using the equations provided on the SAS page link above.

The {ExactCIdiff} package produces exact CIs for two dependent proportions (matched pairs), claiming to be the first package in R to do this method. However, it should only be used when the sample size is not too large as it can be computationally intensive.
NOTE that the {ExactNumCI} package should not be used for this task. More detail on these two packages can be found here.

Using a cross over study as our example, a 2 x 2 table can be formed as follows:

The proportions of subjects responding on each treatment are:
Placebo
Response= Yes
Placebo
Response = No
Total
Active Response = Yes r s r+s
Active Response = No t u t+u
Total r+t s+u N = r+s+t+u

Active: \(\hat p_1 = (r+s)/n\) and Placebo: \(\hat p_2= (r+t)/n\)

Difference between the proportions for each treatment are: \(D=p1-p2=(s-t)/n\)

Suppose :

Placebo
Response= Yes
Placebo
Response = No
Total
Active Response = Yes r = 20 s = 15 r+s = 35
Active Response = No t = 6 u = 5 t+u = 11
Total r+t = 26 s+u = 20 N = r+s+t+u = 46

Active: \(\hat p_1 = (r+s)/n\) =35/46 =0.761 and Placebo: \(\hat p_2= (r+t)/n\) = 26/46 =0.565

Difference = 0.761-0.565 = 0.196, then PairedCI() function can provide an exact confidence interval as shown below

(-0.00339 to 0.38065)

#ExactCIdiff: PairedCI(s, r+u, t, conf.level = 0.95)
CI<-PairedCI(15, 25, 6, conf.level = 0.95)$ExactCI
CI

Methods for Calculating Confidence Intervals for 2 independent samples proportion using {cardx}

This paper4 describes many methods for the calculation of confidence intervals for 2 independent proportions. The most commonly used are: Wald with continuity correction and the Wilson with continuity correction.

Normal Approximation Method (Also known as the Wald or asymptotic CI Method)

For more technical information see the corresponding SAS page.

Example code

cardx::ard_stats_prop_test function uses stats::prop.test which also allows a continuity correction to be applied.

Although this website here and this one here both reference Newcombe for the CI that this function uses, replication of the results by hand and compared to SAS show that the results below match the Normal Approximation (Wald method).

Both the Treatment variable (ACT,PBO) and the Response variable (Yes,No) have to be numeric (0,1) or Logit (TRUE,FALSE) variables.

The prop.test default with 2 groups, is the null hypothesis that the proportions in each group are the same and a 2-sided CI.

indat1<- adcibc2 %>% 
  select(AVAL,TRTP) %>% 
  mutate(resp=if_else(AVAL>4,"Yes","No")) %>% 
  mutate(respn=if_else(AVAL>4,1,0)) %>% 
  mutate(trt=if_else(TRTP=="Placebo","PBO","ACT"))%>% 
  mutate(trtn=if_else(TRTP=="Placebo",1,0)) %>% 
  select(trt,trtn,resp, respn) 

# cardx package required a vector with 0 and 1s for a single proportion CI
# To get the comparison the correct way around Placebo must be 1, and Active 0

indat<- select(indat1, trtn,respn)
cardx::ard_stats_prop_test(data=indat, by=trtn, variables=respn, conf.level = 0.95, correct=FALSE) 
{cards} data frame: 13 x 9
   group1 variable   context   stat_name stat_label      stat
1    trtn    respn stats_pr…    estimate  Rate Dif…     0.078
2    trtn    respn stats_pr…   estimate1  Group 1 …     0.234
3    trtn    respn stats_pr…   estimate2  Group 2 …     0.156
4    trtn    respn stats_pr…   statistic  X-square…     1.893
5    trtn    respn stats_pr…     p.value    p-value     0.169
6    trtn    respn stats_pr…   parameter  Degrees …         1
7    trtn    respn stats_pr…    conf.low  CI Lower…    -0.027
8    trtn    respn stats_pr…   conf.high  CI Upper…     0.183
9    trtn    respn stats_pr…      method     method 2-sample…
10   trtn    respn stats_pr… alternative  alternat… two.sided
11   trtn    respn stats_pr…           p          p          
12   trtn    respn stats_pr…  conf.level  CI Confi…      0.95
13   trtn    respn stats_pr…     correct  Yates' c…     FALSE
ℹ 3 more variables: fmt_fn, warning, error
cardx::ard_stats_prop_test(data=indat, by=trtn, variables=respn, conf.level = 0.95, correct=TRUE) 
{cards} data frame: 13 x 9
   group1 variable   context   stat_name stat_label      stat
1    trtn    respn stats_pr…    estimate  Rate Dif…     0.078
2    trtn    respn stats_pr…   estimate1  Group 1 …     0.234
3    trtn    respn stats_pr…   estimate2  Group 2 …     0.156
4    trtn    respn stats_pr…   statistic  X-square…      1.45
5    trtn    respn stats_pr…     p.value    p-value     0.229
6    trtn    respn stats_pr…   parameter  Degrees …         1
7    trtn    respn stats_pr…    conf.low  CI Lower…    -0.037
8    trtn    respn stats_pr…   conf.high  CI Upper…     0.193
9    trtn    respn stats_pr…      method     method 2-sample…
10   trtn    respn stats_pr… alternative  alternat… two.sided
11   trtn    respn stats_pr…           p          p          
12   trtn    respn stats_pr…  conf.level  CI Confi…      0.95
13   trtn    respn stats_pr…     correct  Yates' c…      TRUE
ℹ 3 more variables: fmt_fn, warning, error

Wilson Method (Also known as the Score method or the Altman, Newcombe method3 )

For more technical information see the corresponding SAS page.

We have not yet found a package in R which produces these confidence intervals for 2 independent proportions.

References

  1. pharmaverse cardx package
  2. PropCIs package
  3. D. Altman, D. Machin, T. Bryant, M. Gardner (eds). Statistics with Confidence: Confidence Intervals and Statistical Guidelines, 2nd edition. John Wiley and Sons 2000.
  4. https://www.lexjansen.com/wuss/2016/127_Final_Paper_PDF.pdf