proc means data=htwt;
run;
Descriptive Statistics for HTWT Data Set

The MEANS Procedure

Variable    Label       N           Mean        Std Dev        Minimum        Maximum
-------------------------------------------------------------------------------------
AGE         AGE       237     16.4430380      1.8425767     13.9000000     25.0000000
HEIGHT      HEIGHT    237     61.3645570      3.9454019     50.5000000     72.0000000
WEIGHT      WEIGHT    237    101.3080169     19.4406980     50.5000000    171.5000000
-------------------------------------------------------------------------------------
Linear Regression
To demonstrate the use of linear regression, we examine a dataset that illustrates the relationship between height and weight in a group of 237 teen-aged boys and girls. The dataset is available at (../data/htwt.csv) and is imported into SAS using the proc import procedure.
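The import step described above might look like the following. This is a minimal sketch, not necessarily the call used to produce the output in this document: it assumes the CSV has a header row of variable names and that the relative path resolves from the current SAS working directory. The data set name htwt matches the one used in the later procedures.

```sas
/* Sketch: read the CSV file into a SAS data set named htwt.         */
/* Assumption: the first row of the file holds the variable names.   */
proc import datafile="../data/htwt.csv"
    out=htwt
    dbms=csv
    replace;        /* overwrite htwt if it already exists */
    getnames=yes;   /* take variable names from the header row */
run;
```

The replace option lets the step be rerun without an error when htwt already exists in the session.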
Descriptive Statistics
The first step is to obtain simple descriptive statistics for the numeric variables in the htwt data and one-way frequencies for the categorical variables. This is accomplished with the proc means and proc freq procedures. There are 237 participants, aged 13.9 to 25 years. It is a cross-sectional study, with each participant having one observation. We can use this data set to examine the relationship of participants' height to their age and sex.
proc freq data=htwt;
tables sex;
run;
Oneway Frequency Tabulation for Sex for HTWT Data Set

The FREQ Procedure

                                Cumulative    Cumulative
SEX    Frequency     Percent     Frequency       Percent
--------------------------------------------------------
f            111       46.84           111         46.84
m            126       53.16           237        100.00
To create a regression model that demonstrates the relationship between age and height for females, we first need to create a flag variable identifying females and an interaction variable between age and the female flag.
data htwt2;
set htwt;
if sex="f" then female=1;
if sex="m" then female=0;
*model to demonstrate interaction between female gender and age;
fem_age = female * age;
run;
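Before fitting the model, it can help to confirm that the new variables were coded as intended. The check below is an optional sketch and not part of the original analysis; it uses only the data set and variable names created in the data step above.

```sas
/* Sketch: list every sex/female combination, including missing values. */
/* If the flag was coded as intended, each f row pairs with female=1    */
/* and each m row pairs with female=0.                                  */
proc freq data=htwt2;
    tables sex*female / list missing;
run;

/* Sketch: count missing values and check the range of the new variables. */
proc means data=htwt2 n nmiss min max;
    var age female fem_age;
run;
```

Any observation whose sex value is neither "f" nor "m" would have a missing female flag and a missing fem_age, and proc reg would drop it from the model.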
Regression Analysis
Next, we fit a regression model representing the relationships between gender, age, height, and the interaction variable created in the data step above. We again use a where statement to restrict the analysis to participants who are 19 years old or younger. We use the clb option to get a 95% confidence interval for each parameter in the model. The model that we are fitting is
height = b0 + b1 x female + b2 x age + b3 x fem_age + e
proc reg data=htwt2;
where age<=19;
model height = female age fem_age / clb;
run; quit;
Number of Observations Read    219
Number of Observations Used    219

Analysis of Variance

                              Sum of          Mean
Source             DF        Squares        Square    F Value    Pr > F
Model               3     1432.63813     477.54604      60.93    <.0001
Error             215     1684.95730       7.83701
Corrected Total   218     3117.59543

Root MSE            2.79947    R-Square    0.4595
Dependent Mean     61.00457    Adj R-Sq    0.4520
Coeff Var           4.58895
We examine the parameter estimates in the output below.
Parameter Estimates

                  Parameter     Standard
Variable    DF     Estimate        Error    t Value    Pr > |t|    95% Confidence Limits
Intercept    1     28.88281      2.87343      10.05      <.0001    23.21911     34.54650
female       1     13.61231      4.01916       3.39      0.0008     5.69031     21.53432
AGE          1      2.03130      0.17764      11.44      <.0001     1.68117      2.38144
fem_age      1     -0.92943      0.24782      -3.75      0.0002    -1.41791     -0.44096
From the parameter estimates table, the coefficients are estimated as b0 = 28.88, b1 = 13.61, b2 = 2.03, and b3 = -0.93.
The resulting regression model for height, age, and gender based on the available data is
height = 28.88281 + 13.61231 x female + 2.03130 x age - 0.92943 x fem_age
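Because fem_age is the product of female and age, the fitted equation can be read as two separate lines, one for each sex. The derivation below only substitutes female = 0 and female = 1 into the estimates reported above. The hat notation for predicted height is introduced here for illustration.

```latex
\begin{align*}
\text{males } (\text{female}=0):\quad
  \widehat{\text{height}} &= 28.88281 + 2.03130\,\text{age} \\
\text{females } (\text{female}=1):\quad
  \widehat{\text{height}} &= (28.88281 + 13.61231) + (2.03130 - 0.92943)\,\text{age} \\
  &= 42.49512 + 1.10187\,\text{age}
\end{align*}
```

Under this fit, predicted height increases by about 2.03 units per year of age for males and about 1.10 units per year for females, among participants aged 19 or younger. The negative interaction coefficient b3 is the difference between the two slopes.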