ANOVA

Introduction

Analysis of Variance (ANOVA) is a statistical test for differences among group means, most often used when there are more than two groups. It is best suited to data that are approximately normally distributed within each group. By partitioning the total variance into components, ANOVA attributes the variation in a response to specific sources, such as a factor of interest versus random error. It can handle multiple factors and their interactions, which makes it a robust way to study how several variables jointly affect an outcome.
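The partitioning idea can be checked by hand. The sketch below uses three small made-up groups (illustrative values only, not taken from any dataset in this article) and shows that the total sum of squares splits into a between-group part and a within-group part. The ratio of their mean squares is the F statistic.

```python
# Illustrative data only: three hypothetical groups with made-up values.
groups = {
    "a": [4.0, 5.0, 6.0],
    "b": [7.0, 8.0, 9.0],
    "c": [10.0, 11.0, 12.0],
}

values = [v for g in groups.values() for v in g]
grand_mean = sum(values) / len(values)

# Total variation of every observation around the grand mean
ss_total = sum((v - grand_mean) ** 2 for v in values)

# Variation of the group means around the grand mean (weighted by group size)
ss_between = sum(
    len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
)

# Variation of each observation around its own group mean
ss_within = sum(
    (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
)

df_between = len(groups) - 1
df_within = len(values) - len(groups)

# F statistic: between-group mean square over within-group mean square
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"SS total = {ss_total}, SS between = {ss_between}, SS within = {ss_within}")
print(f"F = {f_stat}")
```

For these values the between-group and within-group sums of squares add up exactly to the total, which is the identity ANOVA relies on.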

ANOVA Test in Python

To perform a one-way ANOVA test in Python, we can use the f_oneway() function from the SciPy library (scipy.stats). Similarly, the anova_lm() function from the statsmodels library is frequently used to perform a two-way ANOVA test.
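As a minimal sketch of the one-way case, the example below calls f_oneway() on three hypothetical groups. The group names and values are made up for illustration and are unrelated to the SAS disease data used in the rest of this section.

```python
from scipy.stats import f_oneway

# Illustrative measurements for three hypothetical groups (made-up values)
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
group_c = [3.0, 4.0, 5.0]

# f_oneway takes one sequence per group and returns the F statistic and p-value
result = f_oneway(group_a, group_b, group_c)
f_stat, p_value = result.statistic, result.pvalue

print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```

A small p-value would suggest that at least one group mean differs from the others. The test does not say which one, so a follow-up comparison is usually needed.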

For this test, we’ll load a data frame, df, containing the disease data taken from the SAS documentation. The corresponding data can be found here. In this experiment, we are trying to find the impact of different drug and disease groups on the stem length (the response variable y).

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

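# Read the sample data
df = pd.read_csv("../data/sas_disease.csv")

# Fit a linear model with both main effects and the drug-by-disease interaction
model = ols('y ~ C(drug) + C(disease) + C(drug):C(disease)', data=df).fit()

# Perform two-way ANOVA (Type II sums of squares)
sm.stats.anova_lm(model, typ=2)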
                         sum_sq    df         F    PR(>F)
C(drug)             3063.432863   3.0  9.245096  0.000067
C(disease)           418.833741   2.0  1.895990  0.161720
C(drug):C(disease)   707.266259   6.0  1.067225  0.395846
Residual            5080.816667  46.0       NaN       NaN
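To see how the F column relates to the other columns, the sketch below recomputes each F ratio from the sum_sq and df values in the table above. Each term's mean square (sum_sq / df) is divided by the residual mean square. The numbers are copied from the table; the variable names are only illustrative.

```python
# (sum_sq, df) for each model term, copied from the ANOVA table above
terms = {
    "C(drug)": (3063.432863, 3.0),
    "C(disease)": (418.833741, 2.0),
    "C(drug):C(disease)": (707.266259, 6.0),
}
resid_ss, resid_df = 5080.816667, 46.0

# Residual mean square: the estimate of the error variance
ms_resid = resid_ss / resid_df

# F ratio for each term: its mean square divided by the residual mean square
f_ratios = {term: (ss / df) / ms_resid for term, (ss, df) in terms.items()}

for term, f in f_ratios.items():
    print(f"{term}: F = {f:.4f}")
```

The recomputed ratios match the F column of the table. Assuming the conventional 0.05 significance level, the PR(>F) column indicates a significant drug effect, while the disease effect and the drug-by-disease interaction are not significant.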