Deriving Quantiles or Percentiles in R vs SAS
Data
The following data will be used to show the differences between the default percentile definitions used by SAS and R:
10, 20, 30, 40, 150, 160, 170, 180, 190, 200
SAS Code
Assuming the data above is stored in the variable aval within the dataset adlb, the 25th and 40th percentiles could be calculated using the following code.
proc univariate data=adlb;
  var aval;
  output out=stats pctlpts=25 40 pctlpre=p;
run;
This procedure creates the dataset stats containing the variables p25 and p40.
The procedure has the option PCTLDEF, which allows for five different percentile definitions to be used. The default is PCTLDEF=5.
R code
The 25th and 40th percentiles of aval can be calculated using the quantile function.
quantile(adlb$aval, probs = c(0.25, 0.4))
This gives the following output.
25% 40%
32.5 106.0
The function has the argument type, which allows for nine different percentile definitions to be used. The default is type = 7.
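To get a feel for how much the choice of type matters on this small dataset, all nine definitions can be evaluated side by side. The following is a minimal sketch, assuming the example data is held in a data frame adlb with a column aval as described in the text; the object name all_types is only illustrative.

```r
# Example data from the Data section, stored as described in the text (adlb$aval)
adlb <- data.frame(aval = c(10, 20, 30, 40, 150, 160, 170, 180, 190, 200))

# Compute the 25th and 40th percentiles under each of the nine definitions.
# sapply() returns a matrix with one row per percentile and one column per type.
all_types <- sapply(1:9, function(t) {
  quantile(adlb$aval, probs = c(0.25, 0.4), type = t)
})
colnames(all_types) <- paste0("type", 1:9)

print(all_types)
```

The column type7 reproduces the default output shown above; the other columns show how far the remaining definitions move the two percentiles for a sample of only ten values.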
Comparison
The default percentile definition used by the UNIVARIATE procedure in SAS finds the 25th and 40th percentiles to be 30 and 95. The default definition used by R finds these percentiles to be 32.5 and 106.
It is possible to get the quantile function in R to use the same definition as the default used in SAS, by specifying type = 2.
quantile(adlb$aval, probs = c(0.25, 0.4), type=2)
This gives the following output.
25% 40%
30 95
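The two sets of values can also be reproduced by hand, which makes the source of the difference visible. The sketch below assumes the usual descriptions of the two definitions in R's quantile documentation: type 2 takes the inverse of the empirical distribution function and averages at discontinuities, while type 7 interpolates linearly at position (n - 1) * p + 1. The helper names q_type2 and q_type7 are hypothetical, and both helpers are only intended for 0 < p < 1.

```r
# Example data from the Data section, sorted into order statistics
x <- sort(c(10, 20, 30, 40, 150, 160, 170, 180, 190, 200))
n <- length(x)

# Type 2 (matches the SAS default): take the next order statistic above n * p,
# or average the two neighbouring values when n * p is a whole number.
q_type2 <- function(p) {
  np <- n * p
  j <- floor(np + 1e-9)  # small fuzz guards against floating-point rounding
  if (abs(np - j) < 1e-9) (x[j] + x[j + 1]) / 2 else x[j + 1]
}

# Type 7 (the R default): interpolate linearly at position h = (n - 1) * p + 1.
q_type7 <- function(p) {
  h <- (n - 1) * p + 1
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

probs <- c(0.25, 0.4)
by_hand_type2 <- sapply(probs, q_type2)  # 30 and 95
by_hand_type7 <- sapply(probs, q_type7)  # 32.5 and 106
print(by_hand_type2)
print(by_hand_type7)
```

For the 40th percentile, n * p is exactly 4, so type 2 averages the 4th and 5th values (40 and 150) to give 95, whereas type 7 interpolates 60% of the way from 40 to 150 to give 106. The large gap between 40 and 150 in this data is what makes the two definitions disagree so visibly.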
It is not possible to get the UNIVARIATE procedure in SAS to use the same definition as the default used in R.
Rick Wicklin's blog post shows that SAS has built-in support for 5 of the 9 percentile definitions available in R, and demonstrates how a SAS/IML function can be used to calculate percentiles using the other 4 definitions.
More information about quantile derivation can be found in the SAS blog.
Key references:
Compare the default definitions for sample quantiles in SAS, R, and Python