Generalized Estimating Equations (GEE) methods in SAS
INTRODUCTION
Generalized Estimating Equations (GEE) methods extend the Generalized Linear Model (GLM) framework, using a link function to relate the predictors to a transformation of the mean of the outcome variable. For dichotomous response variables, the link function is typically the logit (probit is an alternative, and complementary log-log may be preferable for rare events). For outcomes with more than two categories, a cumulative link function is used for ordinal variables and the generalized logit for nominal variables.
GEE are marginal models and therefore estimate population-averaged effects. Within-subject correlation is handled by specifying a working correlation structure (as in MMRM). Estimates are obtained by quasi-likelihood, solving the estimating equations iteratively.
EXAMPLE DATA
A SAS dataset from a clinical trial comparing two treatments for a respiratory disorder, available in “Gee Model for Binary Data” in the SAS/STAT Sample Program Library [1], is used to create these examples.
To uniquely identify subjects, a new variable USUBJID was created by concatenating SITE and ID. Variables TREATMENT and VISIT were renamed to TRTP and AVISITN.
Additionally, two variables were created from randomly generated values to simulate outcomes with more than two categories. One is an ordinal variable with values 1, 2, and 3; the other is a nominal variable with categories ‘Liver’, ‘Lung’, and ‘Bone’. The resulting dataset is saved here: /data/resp.xlsx. The following SAS code was used:
proc format;
  value respmulti
    1='Liver'
    2='Lung'
    3='Bone';
run;
data resp;
  set resp1;
  call streaminit(1234);
  respord = rand("integer", 1, 3); *Ordinal;
  respnom = put(respord, respmulti.); *Nominal;
run;
BINARY OUTCOME
CODE
This example shows the PROC GEE syntax using the example data. The probability of the event (specified as event='1') is modelled with treatment, visit and the treatment by visit interaction as fixed effects. The independence working correlation structure, which is the default, is specified for the within-subject correlation.
The binomial distribution and the link function (here, the logit) are specified in the model statement as / dist=bin link=logit. The subject identifier and the correlation structure are defined as repeated subject=<SUBJECT> / corr=<CORRELATION MATRIX TYPE>.
The U.S. Food and Drug Administration (FDA) advises sponsors to consider using a robust standard error method, such as the Huber-White “sandwich” standard error, particularly when the model does not include treatment by covariate interactions [2]. This robust “sandwich” SE is computed by default in PROC GEE. The nominal SE (also called the model-based SE) can be obtained by adding the modelse option to the repeated statement. This option is commented out in the code below, but is included to indicate its availability.
Predicted probabilities and Odds Ratios (OR) can be obtained in SAS using the lsmeans statement:
The ilink option provides back-transformed predicted probabilities.
The diff option, combined with either exp or oddsratio, computes ORs.
The cl option computes confidence intervals.
proc gee data=resp;
  class trtp(ref="P") avisitn(ref='1') usubjid;
  model outcome(event='1') = trtp avisitn trtp*avisitn / dist=bin link=logit;
  lsmeans trtp*avisitn / cl exp ilink oddsratio diff;
  repeated subject=usubjid / corr=ind /*modelse*/;
run;
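As a brief sketch of what the lsmeans options return under the logit link: the notation below is introduced here for illustration only, with \hat{\eta} denoting a least-squares mean on the logit scale for one treatment by visit cell, and A and P denoting the two treatment levels at the same visit.

```latex
\[
\hat{p} = \frac{\exp(\hat{\eta})}{1 + \exp(\hat{\eta})},
\qquad
\widehat{\mathrm{OR}}_{A \text{ vs } P} = \exp\left(\hat{\eta}_{A} - \hat{\eta}_{P}\right)
\]
```

The first expression is the inverse-link (ilink) back-transformation to a probability; the second is the exponentiated difference of two least-squares means. Confidence limits are back-transformed in the same way from the limits on the logit scale.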
Similar syntax can be used in PROC GENMOD. While the syntax is equivalent, very slight differences in the results may occur, typically beyond the tenth decimal place.
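As a minimal sketch, the same binary model might be written in PROC GENMOD as follows. It assumes the same dataset and variable names as the PROC GEE example; the statements shown are standard PROC GENMOD statements, but the output has not been reproduced here.

```sas
/* Sketch: GEE for the binary outcome fitted with PROC GENMOD.          */
/* Assumes dataset RESP with OUTCOME, TRTP, AVISITN and USUBJID.        */
proc genmod data=resp;
  class trtp(ref="P") avisitn(ref='1') usubjid;
  model outcome(event='1') = trtp avisitn trtp*avisitn / dist=bin link=logit;
  /* The REPEATED statement requests GEE estimation */
  repeated subject=usubjid / corr=ind /*modelse*/;
  lsmeans trtp*avisitn / cl exp ilink oddsratio diff;
run;
```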
RESULTS
Results were extracted into SAS datasets using the ODS OUTPUT statement and then reduced to key rows for a clearer presentation. For instance, the output below shows only the probabilities for the treatment by visit interaction, and the OR comparing the active treatment versus placebo.
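The extraction step could look roughly like the sketch below. The ODS table names (GEEEmpPEst, LSMeans, Diffs) and the column names in the differences table are assumptions that should be checked against the log produced by ods trace on; the output dataset names est, lsm, dif and dif_byvisit are hypothetical.

```sas
/* Optional: write the ODS table names produced by the procedure to the log */
ods trace on;

proc gee data=resp;
  class trtp(ref="P") avisitn(ref='1') usubjid;
  model outcome(event='1') = trtp avisitn trtp*avisitn / dist=bin link=logit;
  lsmeans trtp*avisitn / cl exp ilink oddsratio diff;
  repeated subject=usubjid / corr=ind;
  /* Assumed table names; est, lsm and dif are hypothetical dataset names */
  ods output GEEEmpPEst=est LSMeans=lsm Diffs=dif;
run;

ods trace off;

/* Keep only treatment comparisons made within the same visit.          */
/* Assumes the differences table carries AVISITN and _AVISITN columns.  */
data dif_byvisit;
  set dif;
  where avisitn = _avisitn;
run;
```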
Estimated Parameters:

Probability of event:

ODDS RATIO (OR):

OUTCOME WITH MORE THAN 2 CATEGORIES
Syntax similar to that for binary variables can be used by specifying a multinomial distribution and selecting the appropriate link function. Models with cumulative link functions apply to ordinal data, and generalized logit models are fitted to nominal data [3]. Note that the generalized logit link is available in PROC GEE, but not in PROC GENMOD.
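For orientation, the two link functions can be written as below for a response with K categories. The notation is introduced here for illustration only (subject i, visit j, covariate vector x_{ij}); the ordering of the categories and the choice of reference category K depend on the options used in the procedure.

```latex
\[
\text{Cumulative logit (ordinal):}\quad
\log\frac{P(Y_{ij} \le k)}{1 - P(Y_{ij} \le k)} = \alpha_k + x_{ij}^{\top}\beta,
\qquad k = 1, \dots, K-1
\]
\[
\text{Generalized logit (nominal):}\quad
\log\frac{P(Y_{ij} = k)}{P(Y_{ij} = K)} = \alpha_k + x_{ij}^{\top}\beta_k,
\qquad k = 1, \dots, K-1
\]
```

With three categories, the cumulative logit model therefore has two intercepts and a single set of covariate effects, whereas the generalized logit model has a separate intercept and set of covariate effects for each non-reference category.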
For multinomial responses, SAS limits the working correlation structure to independence, so other correlation matrix types are not supported.
The estimated parameters for each model are detailed below. ORs can be obtained using the LSMEANS statement, following the same approach used for binary outcomes.
CODE
Ordinal variable:
proc gee data=resp;
  class trtp(ref="A") avisitn(ref='1') usubjid;
  model respord = trtp avisitn trtp*avisitn / dist=multinomial link=cumlogit;
  lsmeans trtp*avisitn / cl exp ilink oddsratio diff;
  repeated subject=usubjid / corr=ind;
run;
Nominal variable:
proc gee data=resp;
  class trtp(ref="A") avisitn(ref='1') usubjid;
  model respnom(event='Liver') = trtp avisitn trtp*avisitn / dist=multinomial link=glogit;
  lsmeans trtp*avisitn / cl exp ilink oddsratio diff;
  repeated subject=usubjid / corr=ind;
run;
RESULTS
Ordinal variable:

Nominal variable:
