Skewness and Kurtosis in Python
Skewness measures the amount of asymmetry in a distribution, while kurtosis describes the “tailedness” of the curve. These measures are frequently used to assess the normality of the data. There are several methods to calculate them; in Python, they can be computed with pandas and with the scipy.stats.skew and scipy.stats.kurtosis functions.
Data Used
```python
import pandas as pd
from scipy.stats import skew, kurtosis

# Create sample data: three teams with five rows each
data = {
    'team': ["A"]*5 + ["B"]*5 + ["C"]*5,
    'points': [10, 17, 17, 18, 15, 10, 14, 13, 29, 25, 12, 30, 34, 12, 11],
    'assists': [2, 5, 6, 3, 0, 2, 5, 4, 0, 2, 1, 1, 3, 4, 7],
}
df = pd.DataFrame(data)
```
Skewness
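Joanes and Gill (1998) discuss three methods for estimating skewness:
- Type 1: This is the typical definition used in many older textbooks
\[g_1 = m_3/m_2^{3/2}\]
- Type 2: Used in SAS and SPSS
\[ G_1 = g_1\sqrt{n(n-1)}/(n-2) \]
- Type 3: Used in MINITAB and BMDP
\[ b_1 = m_3/s^3 = g_1((n-1)/n)^{3/2} \]
Here \(m_r = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^r\) is the \(r\)-th sample central moment and \(s\) is the sample standard deviation (computed with an \(n-1\) denominator).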
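To connect the formulas to the sample moments, here is a minimal NumPy sketch that evaluates all three skewness estimators directly, taking \(m_r\) as the r-th sample central moment (divisor n) and s as the sample standard deviation (divisor n - 1). The helper central_moment is hypothetical, introduced only for this illustration.

```python
import numpy as np

# Same 'points' values as in the sample DataFrame
points = np.array([10, 17, 17, 18, 15, 10, 14, 13, 29, 25, 12, 30, 34, 12, 11], dtype=float)

def central_moment(x, r):
    """Hypothetical helper: r-th sample central moment m_r = mean((x - mean(x))**r)."""
    return np.mean((x - x.mean()) ** r)

n = len(points)
m2 = central_moment(points, 2)
m3 = central_moment(points, 3)
s = points.std(ddof=1)  # sample standard deviation, n - 1 denominator

g1 = m3 / m2 ** 1.5                       # Type 1
G1 = g1 * np.sqrt(n * (n - 1)) / (n - 2)  # Type 2
b1 = m3 / s ** 3                          # Type 3

print(f"{g1:.4f} {G1:.4f} {b1:.4f}")  # → 0.9054 1.0093 0.8164
```

Computing b1 as m3 / s**3 and as g1 * ((n - 1) / n) ** 1.5 gives the same number, since s**2 = n * m2 / (n - 1).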
All three skewness measures are unbiased under normality. The three methods are illustrated in the following code:
```python
# Skewness
n = len(df['points'])
type1_skew = skew(df['points'])                   # scipy default: g1
type2_skew = df['points'].skew()                  # pandas: G1 (adjusted for sample size)
type3_skew = type1_skew * ((n - 1) / n) ** (3/2)  # b1 = g1 * ((n-1)/n)^(3/2)

print(f"Skewness - Type 1: {type1_skew}")
print(f"Skewness - Type 2: {type2_skew}")
print(f"Skewness - Type 3: {type3_skew}")
```
```
Skewness - Type 1: 0.9054442043798532
Skewness - Type 2: 1.0093179298709385
Skewness - Type 3: 0.816426058828937
```
The default for the scipy.stats.skew function is type 1.
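scipy.stats.skew also accepts bias=False, which applies the sample-size correction and returns type 2, the same value that pandas' Series.skew() gives. SciPy has no switch for type 3, so it is obtained by rescaling type 1. A minimal cross-check, assuming reasonably recent SciPy and pandas versions:

```python
import numpy as np
import pandas as pd
from scipy.stats import skew

points = pd.Series([10, 17, 17, 18, 15, 10, 14, 13, 29, 25, 12, 30, 34, 12, 11])
n = len(points)

type1 = skew(points)                     # default bias=True -> g1
type2_scipy = skew(points, bias=False)   # bias-corrected -> G1
type2_pandas = points.skew()             # pandas Series.skew() -> G1
type3 = type1 * ((n - 1) / n) ** 1.5     # rescale g1 to b1

print(np.isclose(type2_scipy, type2_pandas))  # → True
```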
Kurtosis
Joanes and Gill (1998) discuss three methods for estimating kurtosis:
- Type 1: This is the typical definition used in many older textbooks
\[g_2 = m_4/m_2^{2}-3\]
- Type 2: Used in SAS and SPSS
\[G_2 = \left((n+1)g_2+6\right)\frac{n-1}{(n-2)(n-3)}\]
- Type 3: Used in MINITAB and BMDP
\[b_2 = m_4/s^4-3 = (g_2 + 3)(1-1/n)^2-3\]
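As with skewness, the three kurtosis formulas can be evaluated straight from the sample moments. This is a minimal NumPy sketch using the same notation (m_r with divisor n, s with divisor n - 1); central_moment is again a hypothetical helper written only for this example.

```python
import numpy as np

# Same 'points' values as in the sample DataFrame
points = np.array([10, 17, 17, 18, 15, 10, 14, 13, 29, 25, 12, 30, 34, 12, 11], dtype=float)

def central_moment(x, r):
    """Hypothetical helper: r-th sample central moment m_r = mean((x - mean(x))**r)."""
    return np.mean((x - x.mean()) ** r)

n = len(points)
m2 = central_moment(points, 2)
m4 = central_moment(points, 4)
s = points.std(ddof=1)  # sample standard deviation, n - 1 denominator

g2 = m4 / m2 ** 2 - 3                                    # Type 1
G2 = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))  # Type 2
b2 = m4 / s ** 4 - 3                                     # Type 3

print(f"{g2:.4f} {G2:.4f} {b2:.4f}")  # → -0.5833 -0.2992 -0.8948
```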
Only \(G_2\) (corresponding to type 2) is unbiased under normality. The three methods are illustrated in the following code:
```python
# Kurtosis
n = len(df['points'])

# Type 1: scipy default (fisher=True, i.e. excess kurtosis; bias=True) -> g2
type1_kurt = kurtosis(df['points'])
g2 = type1_kurt

# Type 2: G2 = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
type2_kurt = ((n + 1) * g2 + 6) * ((n - 1) / ((n - 2) * (n - 3)))

# Type 3: b2 = (g2 + 3) * (1 - 1/n)^2 - 3
type3_kurt = (g2 + 3) * ((1 - 1/n) ** 2) - 3

print(f"Kurtosis - Type 1: {type1_kurt}")
print(f"Kurtosis - Type 2: {type2_kurt}")
print(f"Kurtosis - Type 3: {type3_kurt}")
```
```
Kurtosis - Type 1: -0.5833410771247833
Kurtosis - Type 2: -0.2991564184355863
Kurtosis - Type 3: -0.8948215605175891
```
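The statement that only \(G_2\) is unbiased under normality can be checked with a small Monte Carlo sketch: draw many normal samples of size 15 (the size of the points column), compute the three estimators for each sample, and average them. Under normality the expected value of \(g_2\) is \(-6/(n+1)\), about -0.375 for n = 15, so the averages for g2 and b2 should sit below zero while G2 should be close to zero. The seed and number of replications are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)    # fixed seed (arbitrary) for reproducibility
n, reps = 15, 100_000             # sample size matches the 'points' column
samples = rng.standard_normal((reps, n))

g2 = kurtosis(samples, axis=1)               # Type 1, one value per sample
G2 = kurtosis(samples, axis=1, bias=False)   # Type 2
b2 = (g2 + 3) * (1 - 1 / n) ** 2 - 3         # Type 3

# The population excess kurtosis of a normal distribution is 0
for name, est in [("g2", g2), ("G2", G2), ("b2", b2)]:
    print(name, round(float(est.mean()), 3))
```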
The default for the scipy.stats.kurtosis function is type 1.
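Passing bias=False to scipy.stats.kurtosis returns type 2, which is also what pandas' Series.kurt() computes; with fisher=False SciPy would instead report Pearson's kurtosis (3 is not subtracted). A minimal cross-check against the hand-coded formula, assuming reasonably recent SciPy and pandas versions:

```python
import numpy as np
import pandas as pd
from scipy.stats import kurtosis

points = pd.Series([10, 17, 17, 18, 15, 10, 14, 13, 29, 25, 12, 30, 34, 12, 11])

type1 = kurtosis(points)                    # defaults fisher=True, bias=True -> g2
type2_scipy = kurtosis(points, bias=False)  # bias-corrected -> G2
type2_pandas = points.kurt()                # pandas Series.kurt() -> G2

print(np.isclose(type2_scipy, type2_pandas))  # → True
```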