Skewness/Kurtosis

Skewness and Kurtosis in Python

Skewness measures the the amount of asymmetry in a distribution, while Kurtosis describes the “tailedness” of the curve. These measures are frequently used to assess the normality of the data. There are several methods to calculate these measures. In Python, the packages pandas, scipy.stats.skew and scipy.stats.kurtosis can be used.

Data Used

import pandas as pd
from scipy.stats import skew, kurtosis

# Create sample data
data = {
    'team': ["A"]*5 + ["B"]*5 + ["C"]*5,
    'points': [10, 17, 17, 18, 15, 10, 14, 13, 29, 25, 12, 30, 34, 12, 11],
    'assists': [2, 5, 6, 3, 0, 2, 5, 4, 0, 2, 1, 1, 3, 4, 7]
}
df = pd.DataFrame(data)

Skewness

Joanes and Gill (1998) discusses three methods for estimating skewness:

  • Type 1: This is the typical definition used in many older textbooks

\[g_1 = m_1/m_2^{3/2}\]

  • Type 2: Used in SAS and SPSS

    \[ G_1 = g_1\sqrt{n(n-1)}/(n-2) \]

  • Type 3: Used in MINITAB and BMDP

    \[ b_1 = m_3/s^3 = g_1((n-1)/n)^{3/2} \]

All three skewness measures are unbiased under normality. The three methods are illustrated in the following code:

# Skewness
type1_skew = skew(df['points'])
type2_skew = df['points'].skew()
type3_skew = skew(df['points']) * ((len(df['points']) - 1) / len(df['points'])) ** (3/2)

print(f"Skewness - Type 1: {type1_skew}")
print(f"Skewness - Type 2: {type2_skew}")
print(f"Skewness - Type 3: {type3_skew}")
Skewness - Type 1: 0.9054442043798532
Skewness - Type 2: 1.0093179298709385
Skewness - Type 3: 0.816426058828937

The default for the scipy.stats.skew function is type 1.

Kurtosis

Joanes and Gill (1998) discuss three methods for estimating kurtosis:

  • Type 1: This is the typical definition used in many older textbooks

\[g_2 = m_4/m_2^{2}-3\]

  • Type 2: Used in SAS and SPSS

    \[G_2 = ((n+1)g_2+6)*\frac{(n-1)}{(n-2)(n-3)}\]

  • Type 3: Used in MINITAB and BMDP

    \[b_2 = m_4/s^4-3 = (g_2 + 3)(1-1/n)^2-3\]

Only \(G_2\) (corresponding to type 2) is unbiased under normality. The three methods are illustrated in the following code:

# Kurtosis
type1_kurt = kurtosis(df['points'])

n = len(df['points'])
g2 = kurtosis(df['points'], fisher=True)  # Fisher's kurtosis

# Calculate the kurtosis type using the formula G2
type2_kurt = ((n + 1) * g2 + 6) * ((n - 1) / ((n - 2) * (n - 3)))

# Calculate the kurtosis type using the formula b2
n = len(df['points'])
g2 = kurtosis(df['points'], fisher=True)  # Fisher's kurtosis

type3_kurt = (g2 + 3) * ((1 - 1/n) ** 2) - 3

print(f"Kurtosis - Type 1: {type1_kurt}")
print(f"Kurtosis - Type 2: {type2_kurt}")
print(f"Kurtosis - Type 3: {type3_kurt}")
Kurtosis - Type 1: -0.5833410771247833
Kurtosis - Type 2: -0.2991564184355863
Kurtosis - Type 3: -0.8948215605175891

The default for the scipy.stats.kurtosis function is type 1.