Correlation Analysis Using R

The most commonly used correlation analysis methods in clinical trials include:

  • Pearson's product-moment correlation

  • Spearman's rank correlation

  • Kendall's rank correlation

Other association measures are available for count data and contingency tables. These compare the observed frequencies with those expected under the assumption of independence, for example Pearson's chi-squared test of independence.
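As a minimal sketch of that idea, the code below cross-tabulates two categorical variables from the lung data set described in the next section (sex, and ph.ecog split at 2; the cut-off and the dimension name ecog_ge2 are arbitrary choices for illustration) and checks that the expected counts used by chisq.test() equal row total * column total / n.

```r
library(survival)  # provides the lung data set

# Illustrative grouping: ECOG score >= 2 vs < 2
# (table() drops rows where ph.ecog is NA)
tab <- table(sex = lung$sex, ecog_ge2 = lung$ph.ecog >= 2)

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

res <- chisq.test(tab)  # Pearson's chi-squared test (Yates-corrected for a 2x2 table)
res$observed
res$expected
res$p.value
```

The test statistic then measures how far the observed counts deviate from these expected counts.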

Example: Lung Cancer Data

Data source: Loprinzi CL, Laurie JA, Wieand HS, Krook JE, Novotny PJ, Kugler JW, Bartel J, Law M, Bateman M, Klatt NE, et al. Prospective evaluation of prognostic variables from patient-completed questionnaires. North Central Cancer Treatment Group. Journal of Clinical Oncology. 1994;12(3):601-7.

The lung data set contains survival data for patients with advanced lung cancer from the North Central Cancer Treatment Group. Its performance scores rate how well the patient can perform usual daily activities.

library(survival)  # provides the lung data set
library(dplyr)     # provides glimpse()

glimpse(lung)
Rows: 228
Columns: 10
$ inst      <dbl> 3, 3, 3, 5, 1, 12, 7, 11, 1, 7, 6, 16, 11, 21, 12, 1, 22, 16…
$ time      <dbl> 306, 455, 1010, 210, 883, 1022, 310, 361, 218, 166, 170, 654…
$ status    <dbl> 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
$ age       <dbl> 74, 68, 56, 57, 60, 74, 68, 71, 53, 61, 57, 68, 68, 60, 57, …
$ sex       <dbl> 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, …
$ ph.ecog   <dbl> 1, 0, 0, 1, 0, 1, 2, 2, 1, 2, 1, 2, 1, NA, 1, 1, 1, 2, 2, 1,…
$ ph.karno  <dbl> 90, 90, 90, 90, 100, 50, 70, 60, 70, 70, 80, 70, 90, 60, 80,…
$ pat.karno <dbl> 100, 90, 90, 60, 90, 80, 60, 80, 80, 70, 80, 70, 90, 70, 70,…
$ meal.cal  <dbl> 1175, 1225, NA, 1150, NA, 513, 384, 538, 825, 271, 1025, NA,…
$ wt.loss   <dbl> NA, 15, 15, 11, 0, 0, 10, 1, 16, 34, 27, 23, 5, 32, 60, 15, …

Overview

cor() computes the correlation coefficient between numeric variables x and y. The method argument selects which coefficient is computed: "pearson" (the default), "kendall", or "spearman".
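A short sketch of cor() with all three methods on the age and meal.cal columns used in the examples below. Because meal.cal contains missing values, the sketch passes use = "complete.obs"; with the default use = "everything", cor() would return NA. Since cor.test() also works on complete pairs, these estimates should agree with the ones it reports later in this section.

```r
library(survival)  # provides the lung data set

# meal.cal has NAs, so restrict each coefficient to complete (age, meal.cal) pairs
r <- sapply(c("pearson", "spearman", "kendall"), function(m)
  cor(lung$age, lung$meal.cal, method = m, use = "complete.obs"))

round(r, 4)
```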

cor.test() tests for association between paired samples, using one of Pearson’s product-moment correlation coefficient, Kendall’s \(\tau\), or Spearman’s \(\rho\). Besides the correlation coefficient itself, it reports the test statistic, the p-value and, for Pearson’s method, a confidence interval.
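cor.test() returns an object of class "htest", so individual results can be extracted by name instead of being read off the printed output. A minimal sketch using the Pearson test on the lung data; the formula interface at the end is an equivalent way to call the same test.

```r
library(survival)  # provides the lung data set

res <- cor.test(x = lung$age, y = lung$meal.cal, method = "pearson")

res$estimate   # correlation coefficient (named "cor")
res$statistic  # t statistic
res$parameter  # degrees of freedom (n - 2)
res$p.value    # two-sided p-value
res$conf.int   # 95% confidence interval (Pearson only)

# Formula interface; rows with a missing value are dropped by na.omit
res_f <- cor.test(~ age + meal.cal, data = lung, method = "pearson")
```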

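cor.test() drops any (x, y) pair that contains a missing value. cor() can do the same through its use argument ("complete.obs" or "pairwise.complete.obs"), whereas its default, "everything", returns NA whenever a missing value is involved. Deleting incomplete observations in this way assumes that values are missing completely at random (MCAR). See ?cor for the full list of strategies.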
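The deletion strategies in cor() only differ once more than two variables are involved. A minimal sketch on three lung columns, of which meal.cal and wt.loss contain missing values: "complete.obs" keeps only rows that are complete in all three columns (listwise deletion), whereas "pairwise.complete.obs" uses every row that is complete for the particular pair, so each coefficient may rest on a different subset.

```r
library(survival)  # provides the lung data set

vars <- lung[, c("age", "meal.cal", "wt.loss")]

cor(vars)                                 # "everything": NA wherever a column has NAs
cor(vars, use = "complete.obs")           # listwise deletion
cor(vars, use = "pairwise.complete.obs")  # pairwise deletion

# Number of complete pairs behind the age/meal.cal coefficient
sum(complete.cases(lung$age, lung$meal.cal))
```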

Pearson Correlation

cor.test(x = lung$age, y = lung$meal.cal, method = "pearson") 

    Pearson's product-moment correlation

data:  lung$age and lung$meal.cal
t = -3.1824, df = 179, p-value = 0.001722
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3649503 -0.0885415
sample estimates:
       cor 
-0.2314107 
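The printed t statistic and confidence interval follow from the sample coefficient \(r\) and the number of complete pairs \(n\): \(t = r\sqrt{(n-2)/(1-r^2)}\) on \(n-2\) degrees of freedom, and the interval is built on Fisher's z scale as \(\tanh\big(\operatorname{atanh}(r) \pm z_{0.975}/\sqrt{n-3}\big)\), where \(z_{0.975}\) is the 97.5% standard normal quantile. The sketch below recomputes them by hand; if the formulas are right, the numbers should match cor.test().

```r
library(survival)  # provides the lung data set

# Keep complete (age, meal.cal) pairs, as cor.test() does
ok <- complete.cases(lung$age, lung$meal.cal)
x <- lung$age[ok]
y <- lung$meal.cal[ok]
n <- length(x)

r      <- cor(x, y)
t_stat <- r * sqrt((n - 2) / (1 - r^2))     # t on n - 2 df
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)  # two-sided p-value

# 95% CI via Fisher's z transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

res <- cor.test(x, y, method = "pearson")
c(t = t_stat, p.value = p_val, lower = ci[1], upper = ci[2])
```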

Spearman Correlation

cor.test(x = lung$age, y = lung$meal.cal, method = "spearman")
Warning in cor.test.default(x = lung$age, y = lung$meal.cal, method =
"spearman"): Cannot compute exact p-value with ties

    Spearman's rank correlation rho

data:  lung$age and lung$meal.cal
S = 1193189, p-value = 0.005095
alternative hypothesis: true rho is not equal to 0
sample estimates:
       rho 
-0.2073639 

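Note: An exact p-value requires data without ties, because tied values share an average rank. With ties, cor.test() issues the warning above and reports an approximate p-value instead; setting exact = FALSE requests the approximation directly and suppresses the warning.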
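Spearman's \(\rho\) is Pearson's coefficient computed on the ranks (rank() gives tied values their average rank), and the printed S statistic is \(S = (n^3 - n)(1 - \rho)/6\). The sketch below checks both against cor.test(), passing exact = FALSE so that the approximate p-value is requested without the ties warning.

```r
library(survival)  # provides the lung data set

ok <- complete.cases(lung$age, lung$meal.cal)
x <- lung$age[ok]
y <- lung$meal.cal[ok]
n <- length(x)

rho <- cor(rank(x), rank(y))      # Pearson correlation of the ranks
S   <- (n^3 - n) * (1 - rho) / 6  # statistic printed by cor.test()

res <- cor.test(x, y, method = "spearman", exact = FALSE)
c(rho = rho, S = S)
res$p.value
```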

Kendall’s rank correlation

cor.test(x = lung$age, y = lung$meal.cal, method = "kendall")

    Kendall's rank correlation tau

data:  lung$age and lung$meal.cal
z = -2.7919, p-value = 0.00524
alternative hypothesis: true tau is not equal to 0
sample estimates:
       tau 
-0.1443877 
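Kendall's \(\tau\) is based on counting concordant and discordant pairs of observations. With ties, as here, R computes the tie-corrected variant \(\tau_b = (C - D)/\sqrt{(n_0 - n_1)(n_0 - n_2)}\), where \(C\) and \(D\) are the numbers of concordant and discordant pairs, \(n_0 = n(n-1)/2\), and \(n_1\), \(n_2\) are the numbers of pairs tied on x and on y. A brute-force sketch follows; the helper name kendall_tau_b is made up for this illustration, and its O(n^2) memory use is only suitable for a few hundred observations.

```r
library(survival)  # provides the lung data set

# Hypothetical helper: Kendall's tau-b from all pairwise sign comparisons
kendall_tau_b <- function(x, y) {
  sx <- sign(outer(x, x, "-"))  # sign(x_i - x_j) for every pair
  sy <- sign(outer(y, y, "-"))
  up <- upper.tri(sx)           # count each unordered pair once
  s  <- sum(sx[up] * sy[up])    # concordant (+1) minus discordant (-1); ties add 0
  n0 <- sum(up)                 # n(n - 1) / 2 pairs in total
  n1 <- sum(sx[up] == 0)        # pairs tied on x
  n2 <- sum(sy[up] == 0)        # pairs tied on y
  s / sqrt((n0 - n1) * (n0 - n2))
}

ok  <- complete.cases(lung$age, lung$meal.cal)
tau <- kendall_tau_b(lung$age[ok], lung$meal.cal[ok])
tau
```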

Interpretation of correlation coefficients

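A correlation coefficient always lies between \(-1\) and \(1\):

  • \(-1\) indicates a perfect negative association; values close to \(-1\) indicate a strong negative correlation

  • \(0\) means that there is no linear (Pearson) or monotonic (Spearman, Kendall) association between the two variables; the variables may still be related in some other way

  • \(1\) indicates a perfect positive association; values close to \(1\) indicate a strong positive correlation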
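Two small constructed examples (not taken from the lung data) illustrate these points. A relationship can be fully deterministic yet have coefficients of 0 when it is not monotonic, and a monotonic but non-linear relationship gives rank coefficients of 1 while Pearson's r stays below 1.

```r
# Symmetric, non-monotonic relationship: y is fully determined by x
x <- -5:5
y <- x^2
cor(x, y)                       # 0: no linear association
cor(x, y, method = "spearman")  # also 0: no monotonic trend

# Monotonic but non-linear relationship
u <- 1:10
v <- exp(u)
cor(u, v)                       # below 1: the points do not lie on a line
cor(u, v, method = "spearman")  # 1: perfectly monotonic
cor(u, v, method = "kendall")   # 1: every pair is concordant
```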