Linear Regression
To demonstrate the use of linear regression, we examine a dataset that illustrates the relationship between height and weight in a group of 237 teen-aged boys and girls. The dataset is available at ../data/htwt.csv and is imported into SAS using PROC IMPORT.
Descriptive Statistics
The first step is to obtain simple descriptive statistics for the numeric variables in the htwt data and one-way frequencies for the categorical variables, using the MEANS and FREQ procedures. There are 237 participants, aged 13.9 to 25 years. This is a cross-sectional study, with one observation per participant. We can use this data set to examine how participants' height relates to their age and sex.
proc means data=htwt;
  title "Descriptive Statistics for HTWT Data Set";
run;
Descriptive Statistics for HTWT Data Set
The MEANS Procedure
Variable  Label     N          Mean      Std Dev      Minimum      Maximum
--------------------------------------------------------------------------
AGE       AGE     237    16.4430380    1.8425767   13.9000000   25.0000000
HEIGHT    HEIGHT  237    61.3645570    3.9454019   50.5000000   72.0000000
WEIGHT    WEIGHT  237   101.3080169   19.4406980   50.5000000  171.5000000
--------------------------------------------------------------------------
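The procedures in this section all read a SAS data set named htwt. The introduction says it is created from ../data/htwt.csv with PROC IMPORT, but that step is not shown. A minimal sketch follows. It assumes the relative path resolves from the SAS session's working directory and that the first row of the file holds the variable names:

```sas
/* Read the CSV file into a SAS data set named htwt,  */
/* the name used by the later PROC and DATA steps      */
proc import datafile="../data/htwt.csv"
            out=htwt
            dbms=csv
            replace;   /* overwrite htwt if it already exists */
  getnames=yes;        /* assumption: first row holds column names */
run;
```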
proc freq data=htwt;
  tables sex;
  title "Oneway Frequency Tabulation for Sex for HTWT Data Set";
run;
Oneway Frequency Tabulation for Sex for HTWT Data Set
The FREQ Procedure
                                  Cumulative    Cumulative
SEX     Frequency     Percent      Frequency       Percent
----------------------------------------------------------
f             111       46.84            111         46.84
m             126       53.16            237        100.00
To model how the relationship between age and height differs for females, we first create a flag variable identifying females (female = 1 for girls, 0 for boys) and an interaction variable, fem_age, equal to the product of age and the female flag.
data htwt2;
  set htwt;
  if sex="f" then female=1;
  if sex="m" then female=0;
  *interaction between female gender and age;
  fem_age = female * age;
run;
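Because the two if statements leave female missing for any value of sex other than "f" or "m", it can be worth checking the recoding before modelling. The check below is a minimal sketch. It assumes the htwt2 data set from the step above, and printing five rows is an arbitrary choice:

```sas
/* Each sex value should map to exactly one value of female; */
/* the MISSING option also lists rows where female is missing */
proc freq data=htwt2;
  tables sex*female / list missing;
run;

/* Spot-check a few rows: fem_age should equal age for girls and 0 for boys */
proc print data=htwt2(obs=5);
  var sex female age fem_age;
run;
```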
Regression Analysis
Next, we fit a regression model relating height to gender, age, and the age-by-gender interaction variable created in the data step above. A where statement restricts the analysis to participants aged 19 or younger, and the clb option requests a 95% confidence interval for each parameter in the model. The model that we are fitting is height = b0 + b1 x female + b2 x age + b3 x fem_age + e, where e is the random error term.
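Female takes only the values 0 and 1, and fem_age = female × age. Substituting each value of female into the model therefore shows that it describes one straight line per sex:

```latex
\begin{aligned}
\text{female} = 0 \text{ (males)}:&\quad \text{height} = b_0 + b_2\,\text{age} + e\\
\text{female} = 1 \text{ (females)}:&\quad \text{height} = (b_0 + b_1) + (b_2 + b_3)\,\text{age} + e
\end{aligned}
```

So b1 is the female-minus-male difference in intercepts and b3 is the female-minus-male difference in the age slope. A nonzero b3 means height changes with age at a different rate for girls than for boys.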
proc reg data=htwt2;
  where age <= 19;
  model height = female age fem_age / clb;
run;
quit;
Number of Observations Read    219
Number of Observations Used    219
Analysis of Variance
                            Sum of          Mean
Source             DF      Squares        Square    F Value    Pr > F
Model               3   1432.63813     477.54604      60.93    <.0001
Error             215   1684.95730       7.83701
Corrected Total   218   3117.59543
Root MSE            2.79947    R-Square    0.4595
Dependent Mean     61.00457    Adj R-Sq    0.4520
Coeff Var           4.58895
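Each summary statistic in this output follows from the sums of squares and degrees of freedom (n = 219 observations used, 3 predictors). Recomputing them is a handy consistency check when reading the table:

```latex
\begin{aligned}
\text{Mean Square}_{\text{Model}} &= 1432.63813 / 3 = 477.54604\\
\text{Mean Square}_{\text{Error}} &= 1684.95730 / 215 = 7.83701\\
F &= 477.54604 / 7.83701 \approx 60.93\\
R^2 &= 1432.63813 / 3117.59543 \approx 0.4595\\
\text{Adj } R^2 &= 1 - (1 - 0.4595)\,\frac{219 - 1}{219 - 3 - 1} \approx 0.4520\\
\text{Root MSE} &= \sqrt{7.83701} \approx 2.79947\\
\text{Coeff Var} &= 100 \times 2.79947 / 61.00457 \approx 4.58895
\end{aligned}
```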
We examine the parameter estimates in the output below.
Parameter Estimates
                  Parameter   Standard
Variable     DF    Estimate      Error   t Value   Pr > |t|   95% Confidence Limits
Intercept     1    28.88281    2.87343     10.05     <.0001   23.21911    34.54650
female        1    13.61231    4.01916      3.39     0.0008    5.69031    21.53432
AGE           1     2.03130    0.17764     11.44     <.0001    1.68117     2.38144
fem_age       1    -0.92943    0.24782     -3.75     0.0002   -1.41791    -0.44096
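The limits requested with clb are ordinary t-based intervals: the estimate plus or minus a t quantile times the standard error, using the 215 error degrees of freedom. Working through the fem_age row (coefficient b3) with t_{0.975, 215} ≈ 1.971 reproduces the printed values:

```latex
\begin{aligned}
t &= \frac{\hat b_3}{\operatorname{SE}(\hat b_3)} = \frac{-0.92943}{0.24782} \approx -3.75\\
\hat b_3 \pm t_{0.975,\,215}\,\operatorname{SE}(\hat b_3) &= -0.92943 \pm 1.971 \times 0.24782 \approx (-1.418,\ -0.441)
\end{aligned}
```

Because this interval excludes 0, the data suggest that the age slope for girls is smaller than for boys.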
From the Parameter Estimates table, the coefficients are estimated as b0 = 28.88, b1 = 13.61, b2 = 2.03, and b3 = -0.93.
The resulting regression model for height, based on age and gender in the available data, is height = 28.88281 + 13.61231 x female + 2.03130 x age - 0.92943 x fem_age.
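Plugging the estimates into the per-sex lines (female = 0 for boys, female = 1 for girls) makes the fitted model easier to read. Age 16 is used below purely as an illustrative value inside the analysed age range:

```latex
\begin{aligned}
\text{boys:}\quad \widehat{\text{height}} &= 28.88281 + 2.03130\,\text{age}\\
\text{girls:}\quad \widehat{\text{height}} &= (28.88281 + 13.61231) + (2.03130 - 0.92943)\,\text{age} = 42.49512 + 1.10187\,\text{age}\\
\text{age} = 16:\quad \widehat{\text{height}}_{\text{boys}} &= 28.88281 + 2.03130 \times 16 = 61.38361\\
\widehat{\text{height}}_{\text{girls}} &= 42.49512 + 1.10187 \times 16 = 60.12504
\end{aligned}
```

For participants aged 19 or younger, the estimated gain in height per additional year of age is about 2.03 for boys but only about 1.10 for girls. The two fitted lines cross where 13.61231 = 0.92943 × age, at roughly age 14.6.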