data dat_used;
input ID$ ARM$ Y CENS;
cards;
001 A 8.0 1
002 A 8.0 1
003 A 8.0 1
004 A 8.0 1
005 A 8.9 0
006 A 9.5 0
007 A 9.9 0
008 A 10.3 0
009 A 11.0 0
010 A 11.2 0
011 B 8.0 1
012 B 9.2 0
013 B 9.9 0
014 B 10.0 0
015 B 10.6 0
016 B 10.6 0
017 B 11.3 0
018 B 11.8 0
019 B 12.9 0
020 B 13.0 0
;
run;
Tobit regression
Tobit model
Censoring occurs when the value of the dependent variable is only partially known. For example, in virology, a sample result could fall below the lower limit of detection (e.g., 100 copies/mL). In that case we only know that the result is <100 copies/mL, but we do not know the exact value.
Let \(y^{*}\) be the true underlying latent variable, and \(y\) the observed variable. We discuss here censoring on the left:
\[ y = \begin{cases} y^{*}, & y^{*} > \tau \\ \tau, & y^{*} \leq \tau \end{cases} \]
We consider tobit regression with a censored normal distribution. The model equation is
\[ y_{i}^{*} = X_{i}\beta + \epsilon_{i} \]
with \(\epsilon_{i} \sim N(0,\sigma^2)\). But we only observe \(y = \max(\tau, y^{*})\). The tobit model uses maximum likelihood estimation (for details see, for example, Breen, 1996). It is important to note that \(\beta\) estimates the effect of \(x\) on the latent variable \(y^{*}\), and not on the observed value \(y\).
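For reference, the log-likelihood maximized under this left-censored normal model has the standard form below. Here \(\phi\) and \(\Phi\) denote the standard normal density and distribution function (notation introduced for this sketch, not used elsewhere in the text). Uncensored observations contribute a density term, while censored observations contribute the probability that \(y^{*}\) falls at or below \(\tau\):
\[ \ell(\beta,\sigma) = \sum_{i:\, y_{i} > \tau} \log\left[ \frac{1}{\sigma}\,\phi\left( \frac{y_{i} - X_{i}\beta}{\sigma} \right) \right] + \sum_{i:\, y_{i} = \tau} \log \Phi\left( \frac{\tau - X_{i}\beta}{\sigma} \right) \]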
Data used
We assume two equally sized groups (n=10 in each group). The data are censored on the left at a value of \(\tau=8.0\). In group A 4/10 records are censored, and in group B 1/10.
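A quick way to confirm these counts is a cross-tabulation of study group by censoring flag. This is a sketch added for illustration, not part of the original analysis. It assumes the dat_used data set has been created and that CENS=1 marks a censored record.

```sas
/* Cross-tabulate study group by censoring flag (assumed coding: 1 = censored) */
proc freq data=dat_used;
  tables ARM*CENS / norow nocol nopercent;
run;
```

The table should show 4 censored records in group A and 1 in group B.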
Example Code using SAS
The analysis will be based on a Tobit analysis of variance with \(Y\), rounded to 1 decimal place, as dependent variable and study group as a fixed covariate. A normally distributed error term will be used. Values will be left censored at the value 8.0.
First a data manipulation step needs to be performed in which the censored values are set to missing for a new variable called lower.
data dat_used;
  set dat_used;
  /* censored records (Y at the limit 8.0) get a missing lower bound */
  if Y <= 8.0 then lower=.;
  else lower=Y;
run;
The data are sorted to make sure the intercept will correspond to the mean of ARM A.
proc sort data=dat_used;
  by descending ARM;
run;
The LIFEREG procedure is used for tobit regression. The following model syntax is used:
MODEL (lower,upper)= effects / options ;
Here, if the lower value is missing, then the upper value is used as a left-censored value.
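For orientation, the comments below summarize how PROC LIFEREG interprets the (lower, upper) pair; only the first two cases occur in this example. The PROC PRINT step is an optional check, added here as an illustration and not part of the original analysis. It assumes the variable lower has already been derived.

```sas
/* How PROC LIFEREG reads MODEL (lower, upper):                  */
/*   lower missing, upper present  -> left censored at upper     */
/*   lower = upper (both present)  -> uncensored (exact value)   */
/*   lower present, upper missing  -> right censored at lower    */
/*   lower < upper (both present)  -> interval censored          */

/* Optional check: list the censored records; lower should be missing */
proc print data=dat_used noobs;
  where CENS = 1;
  var ID ARM Y CENS lower;
run;
```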
proc lifereg data=dat_used order=data;
  class ARM;
  model (lower, Y) = ARM / d=normal;
  lsmeans ARM / cl alpha=0.05;
  estimate 'Contrast B-A' ARM 1 -1 / alpha=0.05;
run;
The output includes the fit statistics, the type 3 analysis of effects and the parameter estimates. It provides an estimate of the difference between groups A and B (B-A), namely 1.8225 (se=0.8061). The presented p-value is a two-sided p-value based on the Z-test. The scale parameter is an estimate for \(\sigma\).
The output of the ESTIMATE statement reports the p-value and confidence interval for the contrast B-A. The p-value is the same as above.
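As a rough cross-check, the z statistic, two-sided p-value and 95% confidence limits can be recomputed by hand from the estimate and standard error quoted above. This is a minimal sketch that assumes a normal-approximation (Wald) test and interval. The variable names are illustrative and do not come from the LIFEREG output.

```sas
/* Hand calculation from the reported estimate and standard error */
data _null_;
  est = 1.8225;                           /* reported difference B-A       */
  se  = 0.8061;                           /* reported standard error       */
  z   = est / se;                         /* Wald z statistic              */
  p   = 2 * (1 - probnorm(abs(z)));       /* two-sided p-value             */
  lcl = est - probit(0.975) * se;         /* lower 95% confidence limit    */
  ucl = est + probit(0.975) * se;         /* upper 95% confidence limit    */
  put z= 8.4 p= 8.4 lcl= 8.4 ucl= 8.4;    /* write the results to the log  */
run;
```

The z statistic is roughly 2.26. Small discrepancies from the procedure output can arise because the inputs here are rounded.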
References
Breen, R. (1996). Regression models. SAGE Publications, Inc. https://doi.org/10.4135/9781412985611
Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26(1), 24-36. https://doi.org/10.2307/1907382