Binomial Test

The binomial test is a statistical test used to determine whether the proportion of successes in a binary-outcome experiment is equal to a specific value. It is appropriate when we have a small sample size and want to test the success probability \(p\) against a hypothesized value \(p_0\).

Coin flips dataset.

  • We will use a coin flips dataset generated from a SAS simulation to carry out four binomial tests (exact test, Wald test, mid-p adjusted test and Wilson score test). Using the same dataset in both software packages ensures that the observed proportion is identical; simulating separately in each would give different proportions.

  • We will use the relevant function for each test to investigate whether the proportion of heads is significantly different from 0.5. Therefore:

\(H_0 : p = 0.5\) versus the two-sided alternative \(H_1 : p \neq 0.5\)

# heads
heads_count <- 520
heads_count
[1] 520
# tails
tails_count <- 480
tails_count
[1] 480
# total
total_flips <- 1000
total_flips
[1] 1000

1. Exact Binomial Test.

binom.test(heads_count, total_flips, p = 0.5, conf.level = 0.95)

    Exact binomial test

data:  heads_count and total_flips
number of successes = 520, number of trials = 1000, p-value = 0.2174
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.4885149 0.5513671
sample estimates:
probability of success 
                  0.52 

Results:

The output has a p-value \(> 0.05\) (chosen level of significance). Hence, we fail to reject the null hypothesis: there is no evidence that the proportion of heads differs from 0.5.
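As a rough check on what the exact test reports, the two-sided p-value can be reproduced by summing the null probabilities of every outcome that is no more likely than the observed count. This is a minimal sketch using the counts above; the names `p0`, `null_probs`, `obs_prob` and `p_exact`, and the small relative tolerance for floating-point ties, are illustrative choices rather than part of the original analysis.

```r
# Counts from the coin flips data above
heads_count <- 520
total_flips <- 1000
p0 <- 0.5  # hypothesized probability of heads

# Null probability of every possible number of heads, 0..n
null_probs <- dbinom(0:total_flips, size = total_flips, prob = p0)

# Null probability of the observed count
obs_prob <- dbinom(heads_count, size = total_flips, prob = p0)

# Two-sided exact p-value: total null probability of outcomes that are
# at most as likely as the observed one (tolerance guards against ties)
p_exact <- sum(null_probs[null_probs <= obs_prob * (1 + 1e-7)])

# Compare with binom.test; expected to agree up to floating-point error
all.equal(p_exact, binom.test(heads_count, total_flips, p = p0)$p.value)
```

Because the null distribution is symmetric when \(p_0 = 0.5\), this sum is the same as doubling the upper tail probability of observing 520 or more heads.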

2. Wald (Asymptotic) Test.

library(DescTools)
p <- 0.5
phat <- heads_count / total_flips
phat
[1] 0.52
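# z statistic; the standard error uses the hypothesized p
z <- (phat - p) / sqrt(p * (1 - p) / total_flips)
# two-sided p-value from the standard normal distribution
2 * (1 - pnorm(abs(z)))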
[1] 0.2059032
BinomCI(heads_count, total_flips, method = "wald")
      est    lwr.ci    upr.ci
[1,] 0.52 0.4890351 0.5509649

Results:

The output has a p-value \(> 0.05\) (chosen level of significance). Hence, we fail to reject the null hypothesis: there is no evidence that the proportion of heads differs from 0.5.
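The z statistic in this section evaluates the standard error at the hypothesized value, whereas a Wald statistic in the strict sense evaluates it at the estimate. The sketch below computes both versions side by side and rebuilds the Wald interval by hand; the names `z_score`, `z_wald`, `p_score`, `p_wald` and `wald_ci` are introduced here for illustration only.

```r
# Counts from the coin flips data above
heads_count <- 520
total_flips <- 1000
p0 <- 0.5
phat <- heads_count / total_flips

# Standard error evaluated at the hypothesized p0 (as in the code above)
z_score <- (phat - p0) / sqrt(p0 * (1 - p0) / total_flips)

# Standard error evaluated at the estimate phat (strict Wald form)
z_wald <- (phat - p0) / sqrt(phat * (1 - phat) / total_flips)

# Two-sided p-values from the standard normal distribution
p_score <- 2 * pnorm(-abs(z_score))
p_wald  <- 2 * pnorm(-abs(z_wald))

# Wald 95% interval by hand: phat +/- z * sqrt(phat * (1 - phat) / n)
z_crit <- qnorm(0.975)
wald_ci <- phat + c(-1, 1) * z_crit * sqrt(phat * (1 - phat) / total_flips)

c(score = p_score, wald = p_wald)
wald_ci
```

With 1000 flips and an estimate close to 0.5 the two standard errors are nearly identical, so the two p-values differ only slightly.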

3. Mid-P adjusted Exact Binomial Test.

library(exactci)
Loading required package: ssanv
Loading required package: testthat
binom.exact(heads_count, total_flips, p = 0.5, alternative = "greater", midp = TRUE, tsmethod = "central")

    Exact one-sided binomial test, mid-p version

data:  heads_count and total_flips
number of successes = 520, number of trials = 1000, p-value = 0.1031
alternative hypothesis: true probability of success is greater than 0.5
95 percent confidence interval:
 0.4939862 1.0000000
sample estimates:
probability of success 
                  0.52 
binom.exact(heads_count, total_flips, p = 0.5, alternative = "less", midp = TRUE, tsmethod = "central")

    Exact one-sided binomial test, mid-p version

data:  heads_count and total_flips
number of successes = 520, number of trials = 1000, p-value = 0.8969
alternative hypothesis: true probability of success is less than 0.5
95 percent confidence interval:
 0.0000000 0.5459277
sample estimates:
probability of success 
                  0.52 
binom.exact(heads_count, total_flips, p = 0.5, midp = TRUE, tsmethod = "central")

    Exact two-sided binomial test (central method), mid-p version

data:  heads_count and total_flips
number of successes = 520, number of trials = 1000, p-value = 0.2061
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.4890192 0.5508727
sample estimates:
probability of success 
                  0.52 

Results:

The one-sided and two-sided outputs all have p-values \(> 0.05\) (chosen level of significance). Hence, we fail to reject the null hypothesis: there is no evidence that the proportion of heads differs from 0.5.
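The mid-p adjustment counts only half of the probability of the observed outcome when forming each tail. A minimal base R sketch of that calculation is shown below, assuming the usual definition of the mid-p value and taking the central two-sided p-value as twice the smaller one-sided value; the variable names are illustrative.

```r
# Counts from the coin flips data above
heads_count <- 520
total_flips <- 1000
p0 <- 0.5

# Null probability of exactly the observed count
p_obs <- dbinom(heads_count, total_flips, p0)

# One-sided mid-p values: strict tail plus half of the observed point mass
# P(X > x) + 0.5 * P(X = x)
midp_greater <- pbinom(heads_count, total_flips, p0, lower.tail = FALSE) + 0.5 * p_obs
# P(X < x) + 0.5 * P(X = x)
midp_less <- pbinom(heads_count - 1, total_flips, p0) + 0.5 * p_obs

# Central two-sided p-value: twice the smaller one-sided value, capped at 1
midp_two_sided <- min(1, 2 * min(midp_greater, midp_less))

c(greater = midp_greater, less = midp_less, two_sided = midp_two_sided)
```

The two one-sided mid-p values sum to 1, which is why the "less" result above is the complement of the "greater" result.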

4. Wilson Score Test.

prop.test(heads_count, total_flips, p = 0.5, correct = FALSE)

    1-sample proportions test without continuity correction

data:  heads_count out of total_flips, null probability 0.5
X-squared = 1.6, df = 1, p-value = 0.2059
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.4890177 0.5508292
sample estimates:
   p 
0.52 

Results:

The output has a p-value \(> 0.05\) (chosen level of significance). Hence, we fail to reject the null hypothesis: there is no evidence that the proportion of heads differs from 0.5.
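The confidence interval in this output is the Wilson score interval, which can be rebuilt from its closed-form expression. The sketch below does this by hand and compares the result with the interval returned by prop.test; the names `denom`, `centre`, `half`, `wilson_ci` and `pt_ci` are illustrative.

```r
# Counts from the coin flips data above
heads_count <- 520
total_flips <- 1000
phat <- heads_count / total_flips
z <- qnorm(0.975)  # critical value for a 95% interval

# Wilson interval: the centre is pulled towards 0.5 and the half-width
# includes an extra z^2 / (4 n^2) term under the square root
denom  <- 1 + z^2 / total_flips
centre <- (phat + z^2 / (2 * total_flips)) / denom
half   <- z * sqrt(phat * (1 - phat) / total_flips + z^2 / (4 * total_flips^2)) / denom
wilson_ci <- c(lower = centre - half, upper = centre + half)

# Interval reported by prop.test without continuity correction
pt_ci <- prop.test(heads_count, total_flips, p = 0.5, correct = FALSE)$conf.int

wilson_ci
all.equal(unname(wilson_ci), as.numeric(pt_ci))
```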

Example of Clinical Trial Data.

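We load the lung dataset from the survival package. We want to test whether the proportion of patients with survival status 1 (treated as dead in this example) is significantly different from a hypothesized proportion of 19%. Note that the survival package documentation codes status as 1 = censored and 2 = dead, so the coding should be checked before interpreting the count below as deaths.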

\(H_0 : p = 0.19\) versus the two-sided alternative \(H_1 : p \neq 0.19\)

We will calculate the number of deaths and the total number of patients.

library(survival)  # provides the lung dataset; attach() is not needed because lung$status is used below

# deaths
num_deaths <- sum(lung$status == 1)
num_deaths
[1] 63
# total patients
total_pat <- nrow(lung)
total_pat
[1] 228

1. Exact Binomial Test.

# Exact (Clopper–Pearson)
binom.test(num_deaths, total_pat, p = 0.19, conf.level = 0.95)

    Exact binomial test

data:  num_deaths and total_pat
number of successes = 63, number of trials = 228, p-value = 0.001683
alternative hypothesis: true probability of success is not equal to 0.19
95 percent confidence interval:
 0.2193322 0.3392187
sample estimates:
probability of success 
             0.2763158 

Results:

The output has a p-value \(< 0.05\) (chosen level of significance). Hence, we reject the null hypothesis and conclude that the proportion of deaths is significantly different from 19%.

2. Wald (Asymptotic) Test.

library(DescTools)
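p <- 0.19
phat <- num_deaths / total_pat
# z statistic; the standard error uses the hypothesized p
z <- (phat - p) / sqrt(p * (1 - p) / total_pat)
# two-sided p-value from the standard normal distribution
2 * (1 - pnorm(abs(z)))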
[1] 0.0008927984
BinomCI(num_deaths, total_pat, method = "wald")
           est    lwr.ci    upr.ci
[1,] 0.2763158 0.2182717 0.3343599

Results:

The output has a p-value \(< 0.05\) (chosen level of significance). Hence, we reject the null hypothesis and conclude that the proportion of deaths is significantly different from 19%.

3. Mid-P adjusted Exact Binomial Test.

library(exactci)
binom.exact(num_deaths, total_pat, p = 0.19, midp = TRUE, tsmethod = "central")

    Exact two-sided binomial test (central method), mid-p version

data:  num_deaths and total_pat
number of successes = 63, number of trials = 228, p-value = 0.001528
alternative hypothesis: true probability of success is not equal to 0.19
95 percent confidence interval:
 0.2212055 0.3370776
sample estimates:
probability of success 
             0.2763158 
binom.exact(num_deaths, total_pat, p = 0.19, alternative = "less", midp = TRUE, tsmethod = "central")

    Exact one-sided binomial test, mid-p version

data:  num_deaths and total_pat
number of successes = 63, number of trials = 228, p-value = 0.9992
alternative hypothesis: true probability of success is less than 0.19
95 percent confidence interval:
 0.00000 0.32708
sample estimates:
probability of success 
             0.2763158 
binom.exact(num_deaths, total_pat, p = 0.19, alternative = "greater", midp = TRUE, tsmethod = "central")

    Exact one-sided binomial test, mid-p version

data:  num_deaths and total_pat
number of successes = 63, number of trials = 228, p-value = 0.000764
alternative hypothesis: true probability of success is greater than 0.19
95 percent confidence interval:
 0.2297195 1.0000000
sample estimates:
probability of success 
             0.2763158 

Results:

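The two-sided and right-sided (alternative = "greater") tests have p-values \(< 0.05\) (chosen level of significance). Hence, we reject the null hypothesis and conclude that the proportion of deaths is significantly different from, and specifically greater than, 19%. The left-sided test has a p-value of 0.9992, so there is no evidence that the proportion is less than 19%.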

4. Wilson Score Test.

prop.test(num_deaths, total_pat, p = 0.19, correct = FALSE)

    1-sample proportions test without continuity correction

data:  num_deaths out of total_pat, null probability 0.19
X-squared = 11.038, df = 1, p-value = 0.0008928
alternative hypothesis: true p is not equal to 0.19
95 percent confidence interval:
 0.2223417 0.3377025
sample estimates:
        p 
0.2763158 

Results:

The output has a p-value \(< 0.05\) (chosen level of significance). Hence, we reject the null hypothesis and conclude that the proportion of deaths is significantly different from 19%.

Summary:

Data            Test                        P-value
Coin Flips      Exact Test                  0.2174
Coin Flips      Wald Test                   0.2059
Coin Flips      Mid-p adjusted Exact Test   0.2061
Coin Flips      Wilson Score Test           0.2059
Clinical Trial  Exact Test                  0.0017
Clinical Trial  Wald Test                   0.0009
Clinical Trial  Mid-p adjusted Exact Test   0.0015
Clinical Trial  Wilson Score Test           0.0009

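For both datasets, the p-values reported for the Wald and Wilson score tests match. This agreement is expected here: the z statistic computed in the Wald sections uses the hypothesized proportion in the standard error, so its square equals the X-squared statistic reported by prop.test (1.6 for the coin flips and 11.038 for the clinical trial data). The Wald and Wilson confidence intervals are also similar, which suggests the sample sizes are adequate. The two approaches differ mainly when the sample size is small or the probability of success is close to 0 or 1; in those cases the Wilson score interval has better coverage.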
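The p-values in the summary can be gathered with a small helper that uses only base R. This is a sketch under stated assumptions: `binom_pvalues` is a hypothetical function written for this illustration, the mid-p value is computed by hand as twice the smaller one-sided mid-p tail, and both forms of the asymptotic statistic (standard error at the hypothesized value and at the estimate) are returned so they can be compared.

```r
# Hypothetical helper: two-sided p-values for x successes in n trials against p0
binom_pvalues <- function(x, n, p0) {
  phat  <- x / n
  p_obs <- dbinom(x, n, p0)  # null probability of the observed count

  # One-sided mid-p tails: strict tail plus half of the observed point mass
  midp_upper <- pbinom(x, n, p0, lower.tail = FALSE) + 0.5 * p_obs
  midp_lower <- pbinom(x - 1, n, p0) + 0.5 * p_obs

  c(
    exact = binom.test(x, n, p = p0)$p.value,
    # standard error at the hypothesized value (matches prop.test, correct = FALSE)
    score = 2 * pnorm(-abs((phat - p0) / sqrt(p0 * (1 - p0) / n))),
    # standard error at the estimate (strict Wald form)
    wald  = 2 * pnorm(-abs((phat - p0) / sqrt(phat * (1 - phat) / n))),
    midp  = min(1, 2 * min(midp_upper, midp_lower))
  )
}

# Counts taken from the two examples above
coin  <- binom_pvalues(520, 1000, 0.5)
trial <- binom_pvalues(63, 228, 0.19)
round(rbind(coin, trial), 4)
```

Keeping the calculation in one function makes it easy to rerun the comparison on other counts, for example with a smaller sample where the methods are expected to diverge more.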

More detailed information around CIs for proportions can be found here