Binomial Test

The statistical test used to determine whether the proportion in a binary outcome experiment is equal to a specific value. It is appropriate when we have a small sample size and want to test the success probability \(p\) against a hypothesized value \(p_0\).

Creating a sample dataset

We will generate a dataset where we record the outcomes of 1000 coin flips.
We will use the binom.test function to test if the proportion of heads is significantly different from 0.5.

import numpy as np
from scipy.stats import binomtest

# Set seed for reproducibility
np.random.seed(19)
coin_flips = np.random.choice(['H', 'T'], size=1000, replace=True, p=[0.5, 0.5])

Now, we will count the heads and tails and summarize the data.

# Count heads and tails
heads_count = np.sum(coin_flips == 'H')
tails_count = np.sum(coin_flips == 'T')
total_flips = len(coin_flips)

heads_count, tails_count, total_flips

(np.int64(523), np.int64(477), 1000)

Conducting Binomial Test

# Perform the binomial test
binom_test_result = binomtest(heads_count, total_flips, p=0.5)
binom_test_result

BinomTestResult(k=523, n=1000, alternative='two-sided', statistic=0.523, pvalue=0.15469370647995673)

Results:

The output has a p-value py binom_test_result \(> 0.05\) (chosen level of significance). Hence, we fail to reject the null hypothesis and conclude that the coin is fair.

Example of Clinical Trial Data

We load the lung dataset from survival package. We want to test if the proportion of patients with survival status 1 (dead) is significantly different from a hypothesized proportion (e.g. 50%)

We will calculate number of deaths and total number of patients.

import pandas as pd

# Load the lung cancer dataset from CSV file
lung = pd.read_csv('../data/lung_cancer.csv')

# Calculate the number of deaths and total number of patients
num_deaths = np.sum(lung['status'] == 1)
total_pat = lung.shape[0]

num_deaths, total_pat

(np.int64(63), 228)

Conduct the Binomial Test

We will conduct the Binomial test and hypothesize that the proportion of death should be 19%.

# Perform the binomial test
binom_test_clinical = binomtest(num_deaths, total_pat, p=0.19)
binom_test_clinical

BinomTestResult(k=63, n=228, alternative='two-sided', statistic=0.27631578947368424, pvalue=0.0016828878642599632)

Results:

The output has a p-value py binom_test_clinical \(< 0.05\) (chosen level of significance). Hence, we reject the null hypothesis and conclude that the propotion of death is significantly different from 19%.