data dummy;
  length USUBJID $4;
  do i=1 to 10;
    sex1n=int(ranuni(0)*2);               /* binary covariate: 0 or 1 */
    USUBJID=strip(put(1000+i,best.));
    do j=1 to 5;
      AVISITN=j;
      AVAL=round(1+ranuni(0),0.01);       /* random value between 1 and 2 */
      if ranuni(0) <0.1 then aval=.;      /* about 10% chance of being missing */
      output;
    end;
  end;
  drop i j;
run;
proc sort data=dummy; by usubjid sex1n; run;
proc transpose data=dummy out=dummyt(drop=_name_) prefix=v;
  by USUBJID sex1n;
  id avisitn;
  var aval;
run;
proc print data=dummyt(obs=5);
run;
Multiple Imputation: Linear Regression in SAS
Input dataset preparation before multiple imputation
- Prepare a subset of the analysis dummy dataset, with details as below:
  - USUBJID (length 4): Subject ID.
  - SEX1N: Sex. A random integer, either 0 or 1, representing a binary variable (perhaps gender).
  - AVISITN: Visit number (1 to 5 for each subject).
  - AVAL: A random value between 1 and 2, with a random 10% chance of being missing.
PROC MI requires a horizontal data set with one record per subject. More often than not, the data we impute will come from a vertical ADaM BDS data set, so we first need to transpose AVAL with AVISITN as the ID variable (assuming AVISITN = 1 to 5), creating the transposed variables v1-v5.
Check missing data patterns
The pattern can be checked using the following code. A missing data pattern can be classified as “Monotone” or “Arbitrary”:
“Monotone” : The missingness of data follows a specific order such that if a certain variable is missing for a particular observation, all subsequent variables are also missing for that observation. If a dataset has columns X1, X2, …, Xk, a monotone missing pattern appears when: if Xj is missing, then Xj+1, Xj+2, …, Xk are also missing.
“Arbitrary” : The missingness of data does not follow any specific order or predictable sequence. Data can be missing at any point without a discernible pattern.
ods select MissPattern;
proc mi data=dummyt nimpute=0;
  var v1 - v5;
run;
As the figure below shows, the missingness does not follow any specific order, so the missing data pattern is arbitrary (non-monotone).
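The monotone definition given earlier can also be checked subject by subject. The following is a minimal sketch, assuming the transposed data set dummyt with v1-v5 in visit order; the names pattern_check, vis, monotone and seen_missing are hypothetical and used only for illustration.

```sas
/* Sketch: flag each subject whose missingness across v1-v5 is monotone. */
/* Assumes dummyt holds v1-v5 in visit order; all new names are illustrative. */
data pattern_check;
  set dummyt;
  array vis{5} v1-v5;
  monotone = 1;
  seen_missing = 0;
  do k = 1 to 5;
    if missing(vis{k}) then seen_missing = 1;
    /* an observed value after a missing one breaks monotonicity */
    else if seen_missing then monotone = 0;
  end;
  drop k seen_missing;
run;

/* The data set has a monotone pattern (in this variable order) */
/* only if every subject has monotone = 1.                      */
proc freq data=pattern_check;
  tables monotone;
run;
```

This complements the MissPattern table rather than replacing it: the table summarises the patterns, while the flag identifies which subjects break the monotone order.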
FCS Regression for non-monotone missing pattern
proc mi data=dummyt out=outdata nimpute=10 seed=123;
  class sex1n;
  var sex1n v1 - v5;
  fcs reg (v1-v5 /details);
run;
- The VAR statement above lists the variables to be analyzed. It should match the statistical models for efficacy analysis per the SAP, which may include TRTPN, any necessary grouping variable (e.g. AGEGR1/AGEGR1N), and all outcome variables coming from repeated assessments.
- NIMPUTE: the number of imputations.
- SEED: the seed used to start the random number generator.
- Note that, depending on the PROC MI algorithm, when there are several factors (for example SEX1N and RACE1N) their ordering may affect the imputed values: different orderings of these factors can generate different imputed values (this may happen, for example, with a monotone missing pattern). The ordering of subjects in the dataset may also affect the imputed values.
- The CLASS statement specifies the classification variables in the VAR statement.
- FCS is the imputation method specified here; if no method is specified, MCMC is the default method.
- REG is the specified model (in this example, linear regression).
- The DETAILS option displays the regression coefficients in the regression model used in each imputation.
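Because the ordering of subjects and the seed can both influence the imputed values, it may help to fix the sort order before imputing and then inspect the result. The sketch below assumes the dummyt and outdata data sets from the example above; PROC MI stacks the imputed copies in the OUT= data set and indexes them with the automatic variable _Imputation_.

```sas
/* Sketch: fix the subject order, impute, then inspect each imputed copy. */
proc sort data=dummyt;
  by usubjid;
run;

proc mi data=dummyt out=outdata nimpute=10 seed=123;
  class sex1n;
  var sex1n v1 - v5;
  fcs reg (v1-v5);
run;

/* After imputation NMISS should be 0 for v1-v5 in every imputed copy. */
proc means data=outdata n nmiss mean std;
  class _Imputation_;
  var v1-v5;
run;
```

Rerunning this with the same sort order and seed is a simple way to check that the imputed values are reproducible.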
Monotone Regression for monotone missing pattern
Let’s update the SAS code above to generate a dummy dataset with a monotone missing pattern:
data dummy;
  length USUBJID $4;
  do i=1 to 10;
    sex1n=int(ranuni(0)*2);
    USUBJID = strip(put(1000+i, best.));
    miss_start = ceil(ranuni(0) * 5); /* Randomly decide the start point for missing data (1 to 5) */
    do j=1 to 5;
      AVISITN = j;
      if j >= miss_start then AVAL = .; /* If the visit number is greater than or equal to miss_start, make AVAL missing */
      else AVAL = round(1 + ranuni(0), 0.01);
      output;
    end;
  end;
  drop i miss_start j;
run;
proc sort data=dummy; by usubjid sex1n; run;
proc transpose data=dummy out=dummyt(drop=_name_) prefix=v;
  by USUBJID sex1n;
  id avisitn;
  var aval;
run;
proc print data=dummyt(obs=5);
run;
ods select MissPattern;
proc mi data=dummyt nimpute=0;
  var v1 - v5;
run;
In this case we will use the MONOTONE statement instead of FCS for the imputation. Example code is shown below:
proc mi data=dummyt out=outdata nimpute=10 seed=123;
  class sex1n;
  var sex1n v1 - v5;
  monotone reg (v1-v5 /details);
run;