=dat (obs=10);
proc print data
var PATIENT GENDER THERAPY RELDAYS VISIT BASVAL HAMDTL17 CHANGE; run;
<Reference-Based Multiple Imputation (joint modelling): Continuous Data>
Reference-based multiple imputation (rbmi)
Methodology introduction
Reference-based multiple imputation methods have become popular for handling missing data, as well as for conducting sensitivity analyses, in randomized clinical trials. In the context of a repeatedly measured continuous endpoint assuming a multivariate normal model, Carpenter et al. (2013) proposed a framework to extend the usual MAR-based MI approach by postulating assumptions about the joint distribution of pre- and post-deviation data. Under this framework, one makes qualitative assumptions about how individuals’ missing outcomes relate to those observed in relevant groups in the trial, based on plausible clinical scenarios. Statistical analysis then proceeds using the method of multiple imputation (Rubin 1976, Rubin 1987).
In general, multiple imputation of a repeatedly measured continuous outcome can be done via 2 computational routes (Roger 2022):
Stepwise: split problem into separate imputations of data at each visit
requires monotone missingness, such as missingness due to withdrawal
conditions on the imputed values at previous visit
Bayesian linear regression problem is much simpler with monotone missing, as one can sample directly using conjugate priors
One-step approach (joint modelling): Fit a Bayesian full multivariate normal repeated measures model using MCMC and then draw a sample.
Here, we illustrate reference-based multiple imputation of a continuous outcome measured repeatedly via the so-called one-step approach.
Five Macros
The five macros
(Roger 2022), available at LSHTM DIA Missing Data, fit a Bayesian Normal RM model and then impute post withdrawal data under a series of possible post-withdrawal profiles including J2R, CIR and CR as described by Carpenter et al. (2013). It then analyses the data using a univariate ANOVA at each visit and summarizes across imputations using Rubin’s rules.
The following standard and reference-based multiple imputation approaches will be illustrated here:
* MAR (Missing At Random)
* CIR (Copy Increment from Reference)
* J2R (Jump to Reference)
* CR (Copy Reference)
Data used
A publicly available example dataset from an antidepressant clinical trial of an active drug versus placebo is used. Overall, data of 172 patients is available with 88 patients receiving placebo and 84 receiving active drug. The same data is used for the R part.
The relevant endpoint is the Hamilton 17-item depression rating scale (HAMD17) which was assessed at baseline and at weeks 1, 2, 4, and 6 (visits 4-7). Study drug discontinuation occurred in 24% (20/84) of subjects from the active drug and 26% (23/88) of subjects from placebo. All data after study drug discontinuation are missing.
The number of patients per visit and arm are:
=dat;
proc freq data*THERAPY / norow nocol nopercent nocum;
table VISIT run;
The mean change from baseline of the endpoint (Hamilton 17-item depression rating scale, HAMD17) per visit per treatment group using only the complete cases are:
=dat n mean nonobs;
proc means data
class VISIT THERAPY;
var CHANGE; run;
The missingness pattern is show below. The incomplete data is primarily monotone in nature. 128 patients have complete data for all visits (all 1’s at each visit). 20, 10 and 13 patients have 1, 2 or 3 monotone missing data, respectively. Further, there is a single additional intermittent missing observation (patient 3618).
=dat out=HAMD_wide(drop=_NAME_) prefix=CHG;
proc transpose data
by PATIENT THERAPY BASVAL;
id VISIT;
var CHANGE;
run;
=HAMD_wide nimpute=0 displaypattern=NOMEANS;
proc mi data
var CHG4 CHG5 CHG6 CHG7; run;
Complete case analysis
A complete case analysis is performed using mixed model for repeated measures (MMRM) with covariates: treatment [THERAPY], gender [GENDER], visit [VISIT] as factors; baseline score [BASVAL] as continuous; and visit-by-treatment [THERAPY * VISIT] interaction, and visit-by-baseline [BASVAL * VISIT] interaction. An unstructured covariance matrix is used. The MIXED procedure is used.
=dat method=reml;
proc mixed dataTHERAPY(ref="PLACEBO") VISIT(ref="4") PATIENT GENDER(ref="F");
class = THERAPY GENDER VISIT BASVAL THERAPY*VISIT BASVAL*VISIT /s ddfm=satterthwaite;
model CHANGE / type=UN subject=PATIENT r;
repeated VISIT *VISIT / diff=control("PLACEBO" "7") cl;
lsmeans THERAPY run;
The parameter estimates of the fixed effects are:
The estimated unstructured covariance matrix parameters are:
The treatment difference at visit 7 is of interest, and is estimated to be -2.829 (se=1.117) with 95% CI of [-5.033 to -0.624] (p=0.0122).
Five macros: MAR approach
As described above, the so-called five macros
will be used for the SAS implementation. The five macros
are available at LSHTM DIA Missing Data under Reference-based MI via Multivariate Normal RM (the “five macros and MIWithD”). For the details, see the use guide available in the download of the five macros
.
Applying the five macros
for reference-based multiple imputation entails the sequential run of the following:
Part1A declares the parameter estimation model and checks consistency with the dataset. It builds a master dataset which holds details of the current job (run of the macros in sequence). It also builds indexes for the classification variables, which may be either numeric or character.
Part1B fits the parameter estimation model using the MCMC procedure and draws a pseudo-independent sample from the joint posterior distribution for the linear predictor parameters and the covariance parameters.
Part2A calculates the predicted mean under MAR, and under MNAR for each subject based on their withdrawal pattern once for each draw of the linear predictor parameter estimates. The choice of MNAR is controlled by the method used, which may vary from subject to subject.
Part2B imputes the intermediate missing values using MAR and the trailing missing values using MNAR, by deriving the conditional distribution for the missing values conditional on the observed values and covariates, using the appropriate sampled covariance parameter estimates.
Part3 carries out a univariate ANOVA analysis at selected time points usually based on the same covariates as the parameter estimation model. It then combines the least-squares means and their differences using the MIANALYZE procedure to provide final results. It is in this macro which handles the Delta methods.
Most of the computation time is spent in the Part1B macro where the MCMC procedure is used to generate a sample from the posterior distribution of the full set of model parameters. These are stored away and can be used repeatedly by calling the later macros over and over again using different imputation methods.
To perform reference-based multiple imputation using MAR approach to following code is used
part1A(jobname = HAMD,
%Data=dat,
Subject=PATIENT,
RESPONSE = CHANGE,
Time = VISIT,
Treat = THERAPY,
Covbytime = BASVAL,
Catcov = GENDER);
part1B(jobname = HAMD,
%Ndraws = 500,
thin = 10,
seed = 12345);
part2A(jobname = HAMD_MAR,
%inname = HAMD,
method = MAR);
part2B(jobname = HAMD_MAR,
%seed = 12345);
part3(Jobname = HAMD_MAR,
%anref = PLACEBO,
Label = MAR);
To print the results of the contrast at week 7
=HAMD_MAR_OUT;
proc print data= "7";
where VISIT
var VISIT THERAPY _THERAPY Diff SE_Diff df Probt LCL_Diff UCL_Diff; run;
Five macros: MNAR CR approach
To perform reference-based multiple imputation using Copy Reference (CR) approach the following changes are needed in part2A of the 5 macros
part2A(jobname = HAMD_CR,
%inname = HAMD,
method = CR,
ref = PLACEBO);
The results for M=500 imputations are
Five macros: MNAR J2R approach
To perform reference-based multiple imputation using Jump to Reference (J2R) approach the following changes are needed in part2A of the 5 macros
part2A(jobname = HAMD_J2R,
%inname = HAMD,
method = J2R,
ref = PLACEBO);
The results for M=500 imputations are
Five macros: MNAR CIR approach
To perform reference-based multiple imputation using Copy Increments in Reference (CIR) approach the following changes are needed in part2A of the 5 macros
part2A(jobname = HAMD_CIR,
%inname = HAMD,
method = CIR,
ref = PLACEBO);
The results for M=500 imputations are
Summary of results
In the table we present the results of the different imputation strategies (and with varying number, M, of multiple imputation draws). Note that some results can be slightly different from the results above due to a possible different seed. The table show the contrast at Visit 7 between DRUG and PLACEBO [DRUG - PLACEBO]:
Method | Estimate | SE | 95% CI | p-value |
---|---|---|---|---|
Complete Case | -2.829 | 1.117 | -5.035 to -0.623 | 0.0123 |
MI - MAR (M=500) | -2.810 | 1.122 | -5.029 to -0.592 | 0.0134 |
MI - MAR (M=2000) | -2.816 | 1.128 | -5.047 to -0.586 | 0.0137 |
MI - MAR (M=5000) | -2.825 | 1.123 | -5.045 to -0.605 | 0.0130 |
MI - MNAR CR (M=500) | -2.384 | 1.120 | -4.598 to -0.170 | 0.0350 |
MI - MNAR CR (M=2000) | -2.390 | 1.118 | -4.599 to -0.180 | 0.0342 |
MI - MNAR CR (M=5000) | -2.400 | 1.115 | -4.604 to -0.196 | 0.0330 |
MI - MNAR J2R (M=500) | -2.122 | 1.141 | -4.377 to 0.133 | 0.0650 |
MI - MNAR J2R (M=2000) | -2.135 | 1.140 | -4.388 to 0.117 | 0.0630 |
MI - MNAR J2R (M=5000) | -2.144 | 1.136 | -4.389 to 0.101 | 0.0611 |
MI - MNAR CIR (M=500) | -2.461 | 1.120 | -4.674 to -0.248 | 0.0296 |
MI - MNAR CIR (M=2000) | -2.469 | 1.118 | -4.679 to -0.260 | 0.0287 |
MI - MNAR CIR (M=5000) | -2.481 | 1.115 | -4.684 to -0.278 | 0.0276 |
Discussion
A note on computational time. The total running time (including data loading, setting up data sets, MCMC run, imputing data and analysis MI data) for M=500 was about 23 seconds on a personal laptop. It increased to about 44 seconds for M=2000. Computational time was similar across difference imputation strategies.
With a small number of Ndraws
in part1B a warning could pop-up “There is still significant autocorrelation after 5 lags, and the effective sample size for the parameter might not be estimated accurately.”. Increasing the number of Ndraws
will mostly solve this warning. For example, for this data example, this message is received when setting Ndraws
below 100.
Appendix 1: mmrm as analysis model
Part 3 of the 5 macros carries out a univariate ANOVA analysis at selected time points usually based on the same covariates as the parameter estimation model. It then combines the least-squares means and their differences using Rubin’s formula in a calculation similar to that in the MIANALYZE procedure to provide final results.
Since, all imputed datasets are readily available (after part2B), another possibility is to analyse each imputed dataset using the analysis model of your choice, and combining the results using PROC MIANALYZE
. For example, suppose an MMRM should be fit on each imputed dataset:
data HAMD_CR_DATAFULL;
set HAMD_CR_DATAFULL;= draw;
_Imputation_
run;
=HAMD_CR_DATAFULL ;
proc mixed data
by _Imputation_;THERAPY(ref="PLACEBO") VISIT(ref="4") PATIENT GENDER(ref="F");
class = THERAPY GENDER VISIT BASVAL THERAPY*VISIT BASVAL*VISIT /s ddfm=satterthwaite;
model CHANGE / type=UN subject=PATIENT r;
repeated VISIT *VISIT / diff=control("PLACEBO" "7") cl;
lsmeans THERAPY=diff01;
ods output Diffs
run;
=diff01(where=(VISIT="7"));
proc mianalyze parms
class THERAPY VISIT;*VISIT;
modeleffects THERAPY=res01;
ods output ParameterEstimates run;
The results for M=500 imputations are
Reference
Carpenter JR, Roger JH & Kenward MG (2013). Analysis of Longitudinal Trials with Protocol Deviation: A Framework for Relevant, Accessible Assumptions, and Inference via MI. Journal of Biopharmaceutical Statistics 23: 1352-1371.
Five macros: Drug Information Association (DIA) Missing Data Working Group (2012). Reference-based MI via Multivariate Normal RM (the “five macros and MIWithD”). London School of Hygiene and Tropical Medicine DIA Missing Data.
PROC MIANALYZE, SAS Institute Inc. (2017). SAS/STAT® 14.3 User’s Guide. Cary, NC: SAS Institute Inc.
Roger J (2022, Dec 8). Other statistical software for continuous longitudinal endpoints: SAS macros for multiple imputation. Addressing intercurrent events: Treatment policy and hypothetical strategies. Joint EFSPI and BBS virtual event.
Rubin DB (1976). Inference and Missing Data. Biometrika 63: 581–592.
Rubin DB (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons.