# Skewness/Kurtosis

**Skewness and Kurtosis in Python**

Skewness measures the amount of asymmetry in a distribution, while kurtosis describes the “tailedness” of the curve. These measures are frequently used to assess the normality of the data. There are several methods to calculate these measures. In Python, the **pandas** package and the **scipy.stats.skew** and **scipy.stats.kurtosis** functions can be used.

## Data Used

```
import pandas as pd
from scipy.stats import skew, kurtosis

# Create sample data
data = {
    'team': ["A"]*5 + ["B"]*5 + ["C"]*5,
    'points': [10, 17, 17, 18, 15, 10, 14, 13, 29, 25, 12, 30, 34, 12, 11],
    'assists': [2, 5, 6, 3, 0, 2, 5, 4, 0, 2, 1, 1, 3, 4, 7]
}
df = pd.DataFrame(data)
```

#### Skewness

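Joanes and Gill (1998) discuss three methods for estimating skewness. Here \(m_r = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^r\) denotes the \(r\)-th sample central moment and \(s\) the sample standard deviation (with divisor \(n-1\)):

- Type 1: This is the typical definition used in many older textbooks

\[g_1 = m_3/m_2^{3/2}\]

- Type 2: Used in SAS and SPSS

\[ G_1 = g_1\sqrt{n(n-1)}/(n-2) \]

- Type 3: Used in MINITAB and BMDP

\[ b_1 = m_3/s^3 = g_1((n-1)/n)^{3/2} \]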

All three skewness measures are unbiased under normality. The three methods are illustrated in the following code:

```
# Skewness
n = len(df['points'])

# Type 1 (g1): default of scipy.stats.skew
type1_skew = skew(df['points'])
# Type 2 (G1): returned by the pandas skew() method
type2_skew = df['points'].skew()
# Type 3 (b1): g1 rescaled by ((n - 1) / n) ** (3/2)
type3_skew = skew(df['points']) * ((n - 1) / n) ** (3/2)

print(f"Skewness - Type 1: {type1_skew}")
print(f"Skewness - Type 2: {type2_skew}")
print(f"Skewness - Type 3: {type3_skew}")
```

```
Skewness - Type 1: 0.9054442043798532
Skewness - Type 2: 1.0093179298709385
Skewness - Type 3: 0.816426058828937
```

The default for the **scipy.stats.skew** function is type 1 (`bias=True`), whereas the pandas `skew()` method returns type 2.
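As a cross-check, the three skewness estimators can also be computed directly from the sample central moments. The following is a minimal sketch, assuming NumPy is installed; the variable names (`dev`, `m2`, `m3`, `g1`, `G1`, `b1`) are illustrative, and the `points` values are re-entered so the block runs on its own. It also relies on `scipy.stats.skew(..., bias=False)` applying the type 2 adjustment.

```python
import numpy as np
import pandas as pd
from scipy.stats import skew

# Same 'points' values as in the sample data (re-entered for a standalone example)
points = np.array([10, 17, 17, 18, 15, 10, 14, 13, 29, 25, 12, 30, 34, 12, 11],
                  dtype=float)
n = points.size

# Sample central moments m_r = mean((x - mean(x)) ** r)
dev = points - points.mean()
m2 = np.mean(dev ** 2)
m3 = np.mean(dev ** 3)

g1 = m3 / m2 ** 1.5                       # Type 1
G1 = g1 * np.sqrt(n * (n - 1)) / (n - 2)  # Type 2
b1 = g1 * ((n - 1) / n) ** 1.5            # Type 3

# Compare the hand-computed values with the library results
assert np.isclose(g1, skew(points))                  # scipy default (bias=True)
assert np.isclose(G1, skew(points, bias=False))      # scipy with bias correction
assert np.isclose(G1, pd.Series(points).skew())      # pandas
assert np.isclose(b1, m3 / points.std(ddof=1) ** 3)  # b1 = m3 / s^3

print(f"g1 = {g1:.4f}, G1 = {G1:.4f}, b1 = {b1:.4f}")
```

If the assertions pass, the moment-based formulas agree with the library calls used in this section.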

#### Kurtosis

Joanes and Gill (1998) discuss three methods for estimating kurtosis:

- Type 1: This is the typical definition used in many older textbooks

\[g_2 = m_4/m_2^{2}-3\]

- Type 2: Used in SAS and SPSS

\[G_2 = ((n+1)g_2+6)\frac{(n-1)}{(n-2)(n-3)}\]

- Type 3: Used in MINITAB and BMDP

\[b_2 = m_4/s^4-3 = (g_2 + 3)(1-1/n)^2-3\]

Only \(G_2\) (corresponding to type 2) is unbiased under normality. The three methods are illustrated in the following code:

```
# Kurtosis
n = len(df['points'])

# Type 1 (g2): default of scipy.stats.kurtosis, Fisher's (excess) kurtosis
g2 = kurtosis(df['points'], fisher=True)
type1_kurt = g2

# Type 2: calculate the kurtosis using the formula for G2
type2_kurt = ((n + 1) * g2 + 6) * ((n - 1) / ((n - 2) * (n - 3)))

# Type 3: calculate the kurtosis using the formula for b2
type3_kurt = (g2 + 3) * ((1 - 1/n) ** 2) - 3

print(f"Kurtosis - Type 1: {type1_kurt}")
print(f"Kurtosis - Type 2: {type2_kurt}")
print(f"Kurtosis - Type 3: {type3_kurt}")
```

```
Kurtosis - Type 1: -0.5833410771247833
Kurtosis - Type 2: -0.2991564184355863
Kurtosis - Type 3: -0.8948215605175891
```

The default for the **scipy.stats.kurtosis** function is type 1 (`fisher=True`, `bias=True`), that is, the excess kurtosis \(g_2\).
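The kurtosis formulas can be checked in the same way from the second and fourth sample central moments. This is a minimal sketch under the same assumptions as before: NumPy is installed, the variable names are illustrative, and the `points` values are re-entered so the block is self-contained. It also compares the type 2 value with `scipy.stats.kurtosis(..., bias=False)` and the pandas `kurt()` method, both of which apply the bias correction.

```python
import numpy as np
import pandas as pd
from scipy.stats import kurtosis

# Same 'points' values as in the sample data (re-entered for a standalone example)
points = np.array([10, 17, 17, 18, 15, 10, 14, 13, 29, 25, 12, 30, 34, 12, 11],
                  dtype=float)
n = points.size

# Sample central moments m_r = mean((x - mean(x)) ** r)
dev = points - points.mean()
m2 = np.mean(dev ** 2)
m4 = np.mean(dev ** 4)

g2 = m4 / m2 ** 2 - 3                                    # Type 1
G2 = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))  # Type 2
b2 = (g2 + 3) * (1 - 1 / n) ** 2 - 3                     # Type 3

# Compare the hand-computed values with the library results
assert np.isclose(g2, kurtosis(points))                  # scipy default
assert np.isclose(G2, kurtosis(points, bias=False))      # scipy with bias correction
assert np.isclose(G2, pd.Series(points).kurt())          # pandas
assert np.isclose(b2, m4 / points.std(ddof=1) ** 4 - 3)  # b2 = m4 / s^4 - 3

print(f"g2 = {g2:.4f}, G2 = {G2:.4f}, b2 = {b2:.4f}")
```

In practice this means type 2 kurtosis is available without a manual formula, either through `bias=False` in SciPy or through pandas.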