Deriving Quantiles or Percentiles in R vs SAS

Data

The following data will be used to show the differences between the default percentile definitions used by SAS and R:

10, 20, 30, 40, 150, 160, 170, 180, 190, 200
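To follow along in R, the values could be placed in a data frame. This is only a minimal sketch: the names adlb and aval are chosen to match the dataset and variable names assumed in the code sections of this article.

```r
# Example data, stored as variable aval in a data frame called adlb
# (names assumed to mirror the SAS dataset described in this article).
adlb <- data.frame(
  aval = c(10, 20, 30, 40, 150, 160, 170, 180, 190, 200)
)

# Quick check of the structure: 10 observations of 1 variable.
str(adlb)
```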

SAS Code

Assuming the data above is stored in the variable aval within the dataset adlb, the 25th and 40th percentiles could be calculated using the following code.

proc univariate data=adlb;
  var aval;
  output out=stats pctlpts=25 40 pctlpre=p;
run;

This procedure creates the dataset stats containing the variables p25 and p40.

The PCTLDEF option of the procedure selects one of five percentile definitions. The default is PCTLDEF=5.

R Code

The 25th and 40th percentiles of aval can be calculated using the quantile function.

quantile(adlb$aval, probs = c(0.25, 0.4))

This gives the following output.

  25%   40% 
 32.5 106.0 

The type argument of the function selects one of nine percentile definitions. The default is type = 7.
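To make the default definition more concrete, type 7 can be reproduced by hand: it interpolates linearly between the two order statistics on either side of the position h = (n - 1)p + 1 in the sorted data. The sketch below is illustrative only; the helper name quantile_type7 is hypothetical and it handles a single probability at a time.

```r
aval <- c(10, 20, 30, 40, 150, 160, 170, 180, 190, 200)

# Hypothetical helper: type 7 quantile for a single probability p.
quantile_type7 <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  h <- (n - 1) * p + 1   # fractional position in the sorted data
  lo <- floor(h)         # order statistic just below the position
  hi <- ceiling(h)       # order statistic just above the position
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

quantile_type7(aval, 0.25)  # position 3.25: a quarter of the way from 30 to 40
quantile_type7(aval, 0.40)  # position 4.6: 60% of the way from 40 to 150
```

With n = 10, the 25th percentile sits at position 3.25 and the 40th at position 4.6, which is why the large gap between 40 and 150 pulls the 40th percentile up to 106.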

Comparison

The default percentile definition used by the UNIVARIATE procedure in SAS finds the 25th and 40th percentiles to be 30 and 95. The default definition used by R finds these percentiles to be 32.5 and 106.

The quantile function in R can reproduce the default definition used in SAS by specifying type = 2.

quantile(adlb$aval, probs = c(0.25, 0.4), type=2)

This gives the following output.

25% 40% 
 30  95 
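The definition shared by the SAS default and R type 2 can also be sketched by hand. Writing np = j + g, where j is the integer part and g the fractional part, the percentile is the average of the j-th and (j+1)-th sorted values when g is zero, and the (j+1)-th sorted value otherwise. The helper name quantile_sas_default and the tolerance of 1e-9 are assumptions made for this illustration, not part of either package.

```r
aval <- c(10, 20, 30, 40, 150, 160, 170, 180, 190, 200)

# Hypothetical helper: empirical distribution function with averaging,
# for a single probability p.
quantile_sas_default <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  np <- n * p
  j <- floor(np + 1e-9)   # small tolerance guards against rounding error
  g <- np - j
  if (abs(g) < 1e-9) {
    # np is a whole number: average the two neighbouring order statistics
    mean(x[c(max(j, 1), min(j + 1, n))])
  } else {
    # otherwise take the next order statistic
    x[min(j + 1, n)]
  }
}

quantile_sas_default(aval, 0.25)  # np = 2.5, so the 3rd sorted value
quantile_sas_default(aval, 0.40)  # np = 4, so the mean of the 4th and 5th
```

Because this definition never interpolates by more than a simple average, it returns 30 and 95 for this data, whereas the R default returns 32.5 and 106.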

It is not possible to get the UNIVARIATE procedure in SAS to use the same definition as the default used in R, because none of the five PCTLDEF definitions corresponds to type 7.

Rick Wicklin provided a blog post showing how SAS has built-in support for calculations using 5 of the 9 percentile definitions available in R, and also demonstrated how a SAS/IML function can be used to calculate percentiles using the other 4 definitions.
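On the R side, all nine definitions can be compared on the example data in one step. The sketch below is illustrative: the PCTLDEF-to-type correspondence in sas_to_r is an assumption based on the commonly cited mapping (only PCTLDEF=5 matching type 2 is stated in this article) and should be checked against the SAS documentation.

```r
aval <- c(10, 20, 30, 40, 150, 160, 170, 180, 190, 200)
probs <- c(0.25, 0.4)

# One row per R definition (type 1 to 9), one column per requested percentile.
all_types <- t(sapply(1:9, function(k) {
  quantile(aval, probs = probs, type = k, names = FALSE)
}))
dimnames(all_types) <- list(paste0("type", 1:9), c("p25", "p40"))
all_types

# ASSUMED correspondence between SAS PCTLDEF values and R types;
# verify against the SAS documentation before relying on it.
sas_to_r <- c(PCTLDEF1 = 4, PCTLDEF2 = 3, PCTLDEF3 = 1, PCTLDEF4 = 6, PCTLDEF5 = 2)

# Rows of the comparison that SAS would be expected to reproduce directly.
all_types[paste0("type", sas_to_r), ]
```

Keeping names = FALSE avoids the percentage labels that quantile attaches by default, so the results stack cleanly into a matrix.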

More information about quantile derivation can be found in the SAS blog posts listed under Key references.

Key references:

Compare the default definitions for sample quantiles in SAS, R, and Python

Sample quantiles: A comparison of 9 definitions

Hyndman, R. J., & Fan, Y. (1996). Sample quantiles in statistical packages. The American Statistician, 50(4), 361-365.