```
=lung pearson;
proc corr data
var age mealcal; run;
```

# Correlation Analysis using SAS

**Example: Lung Cancer Data**

*Data source:* *Loprinzi CL. Laurie JA. Wieand HS. Krook JE. Novotny PJ. Kugler JW. Bartel J. Law M. Bateman M. Klatt NE. et al. Prospective evaluation of prognostic variables from patient-completed questionnaires. North Central Cancer Treatment Group. Journal of Clinical Oncology. 12(3):601-7, 1994.*

Survival in patients with advanced lung cancer from the North Central Cancer Treatment Group. Performance scores rate how well the patient can perform usual daily activities.

## Overview

The `CORR`

procedure computes Pearson correlation coefficients, three nonparametric measures of association, and the probabilities associated with these statistics. The correlation statistics include the following:

Pearson product-moment correlation

Spearman rank-order correlation

Kendall’s tau-b coefficient

Hoeffding’s measure of dependence,

Pearson, Spearman, and Kendall partial correlation

This program works on the first three correlation coefficients.

**Missing Values**

PROC CORR excludes observations with missing values in the WEIGHT and FREQ variables. By default, PROC CORR uses ** pairwise deletion** when observations contain missing values. PROC CORR includes all nonmissing pairs of values for each pair of variables in the statistical computations. Therefore, the correlation statistics might be based on different numbers of observations.

If you specify the NOMISS option, PROC CORR uses ** listwise deletion** when a value of the VAR or WITH statement variable is missing. PROC CORR excludes all observations with missing values from the analysis. Therefore, the number of observations for each pair of variables is identical.

The PARTIAL statement always excludes the observations with missing values by automatically invoking the NOMISS option. With the NOMISS option, the data are processed more efficiently because fewer resources are needed. Also, the resulting correlation matrix is nonnegative definite.

In contrast, if the data set contains missing values for the analysis variables and the NOMISS option is not specified, the resulting correlation matrix might not be nonnegative definite. This leads to several statistical difficulties if you use the correlations as input to regression or other statistical procedures.

**Pearson Correlation**

**Spearman Correlation**

```
=lung spearman;
proc corr data
var age mealcal; run;
```

## Kendall’s rank correlation

```
=lung kendall;
proc corr data
var age mealcal; run;
```