import numpy as np
from scipy.stats import binomtest
# Set seed for reproducibility
np.random.seed(19)
coin_flips = np.random.choice(['H', 'T'], size=1000, replace=True, p=[0.5, 0.5])
Binomial Test
The binomial test is an exact statistical test used to determine whether the proportion of successes in a binary-outcome experiment is equal to a specific value. It is especially useful when the sample size is small, because it does not rely on a normal approximation, and it tests the success probability \(p\) against a hypothesized value \(p_0\).
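To make the definition concrete, the sketch below computes a two-sided p-value by hand for the null hypothesis \(H_0: p = p_0\) and compares it with `scipy.stats.binomtest`. The toy numbers (7 successes in 10 trials, \(p_0 = 0.5\)) are hypothetical and chosen only for illustration. The p-value is taken as the total probability of all outcomes that are no more likely than the observed one, which is how SciPy describes its two-sided calculation.

```python
import numpy as np
from scipy.stats import binom, binomtest

# Hypothetical toy data: 7 successes in 10 trials, tested against p0 = 0.5
k, n, p0 = 7, 10, 0.5

# Probability of every possible outcome 0..n under the null hypothesis
pmf = binom.pmf(np.arange(n + 1), n, p0)

# Two-sided p-value: total probability of outcomes no more likely than k
# (the small tolerance guards against floating-point ties)
manual_pvalue = pmf[pmf <= pmf[k] * (1 + 1e-7)].sum()

# Reference value from SciPy
scipy_pvalue = binomtest(k, n, p=p0).pvalue

print(manual_pvalue, scipy_pvalue)
```

Both values should agree at about 0.344, so with these toy counts there would be no evidence against \(p_0 = 0.5\).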
Creating a sample dataset
We will generate a dataset where we record the outcomes of 1000 coin flips.
We will use the binomtest function from scipy.stats to test if the proportion of heads is significantly different from 0.5.
Now, we will count the heads and tails and summarize the data.
# Count heads and tails
heads_count = np.sum(coin_flips == 'H')
tails_count = np.sum(coin_flips == 'T')
total_flips = len(coin_flips)
heads_count, tails_count, total_flips
(np.int64(523), np.int64(477), 1000)
Conducting Binomial Test
# Perform the binomial test
binom_test_result = binomtest(heads_count, total_flips, p=0.5)
binom_test_result
BinomTestResult(k=523, n=1000, alternative='two-sided', statistic=0.523, pvalue=0.15469370647995673)
Results:
The output has a p-value of approximately 0.155, which is \(> 0.05\) (the chosen level of significance). Hence, we fail to reject the null hypothesis: the data provide no evidence that the coin is unfair.
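A confidence interval can complement the p-value here. The sketch below reuses the counts reported above (523 heads in 1000 flips) as hard-coded values and calls the `proportion_ci` method of the result object. The 95% level and the exact (Clopper-Pearson) method are choices made for this illustration, not something the original analysis specifies.

```python
from scipy.stats import binomtest

# Counts taken from the coin-flip output above: 523 heads in 1000 flips
result = binomtest(523, 1000, p=0.5)

# 95% confidence interval for the true proportion of heads
# (method="exact" is the Clopper-Pearson interval)
ci = result.proportion_ci(confidence_level=0.95, method="exact")

print(ci.low, ci.high)
```

If the interval contains 0.5, that is consistent with failing to reject the null hypothesis at the 5% level.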
Example of Clinical Trial Data
We load the lung dataset from the survival package. We want to test if the proportion of patients with survival status 1 (dead) is significantly different from a hypothesized proportion (e.g. 50%).
We will calculate the number of deaths and the total number of patients.
import pandas as pd
# Load the lung cancer dataset from CSV file
lung = pd.read_csv('../data/lung_cancer.csv')
# Calculate the number of deaths and total number of patients
num_deaths = np.sum(lung['status'] == 1)
total_pat = lung.shape[0]
num_deaths, total_pat
(np.int64(63), 228)
Conduct the Binomial Test
We will conduct the binomial test under the null hypothesis that the proportion of deaths is 19%.
# Perform the binomial test
binom_test_clinical = binomtest(num_deaths, total_pat, p=0.19)
binom_test_clinical
BinomTestResult(k=63, n=228, alternative='two-sided', statistic=0.27631578947368424, pvalue=0.0016828878642599632)
Results:
The output has a p-value of approximately 0.0017, which is \(< 0.05\) (the chosen level of significance). Hence, we reject the null hypothesis and conclude that the proportion of deaths is significantly different from 19%.
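The test above is two-sided. If the question were specifically whether the proportion exceeds 19%, a one-sided test could be run through the `alternative` parameter of `binomtest`. The sketch below is illustrative only: it hard-codes the counts reported above (63 of 228), and the choice of a "greater" alternative is an assumption rather than part of the original analysis.

```python
from scipy.stats import binomtest

# Counts taken from the clinical output above: 63 events among 228 patients
two_sided = binomtest(63, 228, p=0.19)

# One-sided alternative: the true proportion is greater than 19%
one_sided = binomtest(63, 228, p=0.19, alternative="greater")

print(two_sided.pvalue, one_sided.pvalue)
```

The direction of a one-sided test should be fixed before looking at the data; otherwise the reported p-value understates the evidence needed.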