Estimating and Testing Cause Specific Hazard Ratio Using R

Objective

In this document we present how to estimate and test cause specific hazard ratio for the probability of experiencing a certain event at a given time in a competing risks model in R. We focus on the basic model where each subject experiences only one out of k possible events as depicted in the figure below.

As this document aims to provide syntax for estimating and testing cause-specific hazard ratios using Cox’s PH model for competing risks, we assume that readers have working knowledge of a competing risks framework. The Reference below list a few literature for a quick refresher on this topic.

The syntax given here produce results match that produced by the default settings of SAS PROC PHREG (see the companion SAS document). This is usually necessary if validating results from the two software is the objective.

R Package

We use the survival package in this document.

Data used

The bone marrow transplant (BTM) dataset as presented by Guo & So (2018) is used. The dataset has the following variables:

  • Group has three levels, indicating three disease groups.

  • T is the disease-free survival time in days. A derived variable TYears = T/365.25 is used in the analysis.

  • Status has value 0 if T is censored; 1 if T is time to relapse; 2 if T is time to death.

  • WaitTime is the waiting time to transplant in days.

  • For illustration, a categorical variable waitCat is created from waitTime as waitCat = TRUE if waitTime > 200, and FALSE otherwise.

require(dplyr)
bmt <- haven::read_sas(file.path("../data/bmt.sas7bdat")) %>%
  mutate(
    Group = factor(Group, 
                   levels = c(1, 2, 3), 
                   labels = c('ALL', 'AML-Low Risk', 'AML-High Risk')
    ),
    Status = factor(Status,
                    levels = c(0, 1, 2),
                    labels = c('Censored', 'Relapse', 'Death')),
    TYears = T / 365.25,
    waitCat = (WaitTime > 200),
    ID = row_number()
  )

Estimating and testing the cause specific hazard ratio

Syntax-wise there are two ways to generate the estimates and related outputs using survival::coxph(). They produce essentially the same results except that the global null hypotheses are different.

Syntax 1: All competing events in one go

require(survival)
csh.1 <- coxph(Surv(TYears, Status) ~ Group + strata(waitCat), 
               data = bmt, 
               id = ID, 
               ties = 'breslow', ## default is 'efron'
               robust = FALSE    ## default is TRUE
               )
summary(csh.1)
Call:
coxph(formula = Surv(TYears, Status) ~ Group + strata(waitCat), 
    data = bmt, ties = "breslow", robust = FALSE, id = ID)

  n= 137, number of events= 83 

                          coef exp(coef) se(coef)      z Pr(>|z|)  
GroupAML-Low Risk_1:2  -0.9453    0.3886   0.4508 -2.097   0.0360 *
GroupAML-High Risk_1:2  0.6162    1.8519   0.3636  1.695   0.0901 .
GroupAML-Low Risk_1:3  -0.3987    0.6712   0.3946 -1.010   0.3123  
GroupAML-High Risk_1:3  0.1273    1.1358   0.4031  0.316   0.7521  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                       exp(coef) exp(-coef) lower .95 upper .95
GroupAML-Low Risk_1:2     0.3886     2.5736    0.1606     0.940
GroupAML-High Risk_1:2    1.8519     0.5400    0.9081     3.777
GroupAML-Low Risk_1:3     0.6712     1.4899    0.3097     1.455
GroupAML-High Risk_1:3    1.1358     0.8804    0.5154     2.503

Concordance= 0.619  (se = 0.031 )
Likelihood ratio test= 17.36  on 4 df,   p=0.002
Wald test            = 16.13  on 4 df,   p=0.003
Score (logrank) test = 17.83  on 4 df,   p=0.001

In the output, rows with suffix 1:2 are for Status = 2, or Relapse; and 1:3 are for Status = 3, or Death. As usual, censoring must be the lowest level in Status, which in this example is coded 0.

Since both events (Relapse and Death) are model together, the global tests have 4 degrees of freedom, with the null hypothesis that there is no difference among different levels of Group in either Relapse or Death.

Syntax 2: One event at a time

csh.2 <- coxph(Surv(TYears, Status == 'Relapse') ~ Group + strata(waitCat), 
               data = bmt, 
               id = ID, 
               ties = 'breslow', ## default is 'efron'
               robust = FALSE    ## default is TRUE
               )
summary(csh.2)
Call:
coxph(formula = Surv(TYears, Status == "Relapse") ~ Group + strata(waitCat), 
    data = bmt, ties = "breslow", robust = FALSE, id = ID)

  n= 137, number of events= 42 

                      coef exp(coef) se(coef)      z Pr(>|z|)  
GroupAML-Low Risk  -0.9453    0.3886   0.4508 -2.097   0.0360 *
GroupAML-High Risk  0.6162    1.8519   0.3636  1.695   0.0901 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                   exp(coef) exp(-coef) lower .95 upper .95
GroupAML-Low Risk     0.3886      2.574    0.1606     0.940
GroupAML-High Risk    1.8519      0.540    0.9081     3.777

Concordance= 0.672  (se = 0.039 )
Likelihood ratio test= 15.37  on 2 df,   p=5e-04
Wald test            = 14.16  on 2 df,   p=8e-04
Score (logrank) test = 15.83  on 2 df,   p=4e-04

The results are identical to those labeled with 1:2 in the earlier outputs under Syntax 1 above. However, since only Relapse is modeled, the global tests have only 2 degrees of freedom with the null hypothesis that there is no difference among different levels of Group for Relapse.

Summary

  • In survival::coxph() the default method for handling ties is ties = 'efron'. To match results with SAS, this needs to be changed to ties =breslow`.

  • For multi-state models such as a competing risk analysis, survival::coxph() by default estimate the standard errors of parameter estimates with a robust sandwich estimator. To match default results with SAS, this needs to be set to robust = FALSE.

  • Due to differences in internal numerical estimation methods of R and SAS, results only match up to the 4th decimal places. However, overall consistency can be established between the two for estimating and testing cause-specific hazard ratio using Cox’s PH model.

 #| echo: false
 si <- sessioninfo::session_info(
 c("cmprsk", "tidycmprsk", "survival" ) #Vector of packages used 
 , dependencies = FALSE)
 si
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.2 (2024-10-31)
 os       Ubuntu 24.04.1 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  C.UTF-8
 ctype    C.UTF-8
 tz       UTC
 date     2025-01-20
 pandoc   3.4 @ /opt/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package    * version date (UTC) lib source
 cmprsk       2.2-12  2024-05-19 [1] RSPM (R 4.4.0)
 survival   * 3.7-0   2024-06-05 [2] CRAN (R 4.4.2)
 tidycmprsk   1.1.0   2024-08-17 [1] RSPM (R 4.4.0)

 [1] /home/runner/work/CAMIS/CAMIS/renv/library/linux-ubuntu-noble/R-4.4/x86_64-pc-linux-gnu
 [2] /opt/R/4.4.2/lib/R/library

──────────────────────────────────────────────────────────────────────────────

Reference

Guo C and So Y. (2018). “Cause-specific analysis of competing risks using the PHREG procedure.” In Proceedings of the SAS Global Forum 2018 Conference. Cary, NC: SAS Institute Inc. https://support.sas.com/resources/papers/proceedings18/2159-2018.pdf.

Pintilie M. (2006). Competing Risks: A Practical Perspective. Wiley. http://dx.doi.org/10.1002/9780470870709

Therneau T, Crowson C, and Atkinson E. (2024). “Multi-state models and competing risks.” https://cran.r-project.org/web/packages/survival/vignettes/compete.pdf