| Methods | | R | SAS | Python | Comparison |
|---|---|---|---|---|---|
| Summary Statistics | Rounding | R | SAS | Python | R vs SAS |
| | Summary statistics | R | SAS | Python | R vs SAS |
| | Skewness/Kurtosis | R | SAS | Python | R vs SAS |
| General Linear Models | One Sample t-test | R | SAS | Python | R vs SAS |
| | Paired t-test | R | SAS | Python | R vs SAS |
| | Two Sample t-test | R | SAS | Python | R vs SAS |
| | ANOVA | R | SAS | Python | R vs SAS |
| | ANCOVA | R | SAS | Python | R vs SAS |
| | MANOVA | R | SAS | Python | R vs SAS |
| | Linear Regression | R | SAS | Python | R vs SAS |
| Generalized Linear Models | Logistic Regression | R | SAS | Python | R vs SAS |
| | Poisson/Negative Binomial Regression | R | | | R vs SAS |
| | Categorical Repeated Measures | | | | |
| | Categorical Multiple Imputation | | | | |
| Non-parametric Analysis | Wilcoxon signed rank | R | SAS | | R vs SAS |
| | Mann-Whitney U/Wilcoxon rank sum | R | SAS | | |
| | Kolmogorov-Smirnov test | | | | |
| | Kruskal-Wallis test | R | SAS | Python | R vs SAS |
| | Friedman test | | | | |
| | Jonckheere test | R | SAS | | R vs SAS |
| | Hodges-Lehmann Estimator | R | SAS | | |
| Categorical Data Analysis | Binomial test | R | | | |
| | McNemar's test | R | SAS | | R vs SAS |
| | Chi-Square Association/Fisher's exact | R | SAS | Python | R vs SAS |
| | Cochran Mantel Haenszel | R | SAS | | R vs SAS |
| | Confidence Intervals for proportions | R | SAS | | R vs SAS |
| Repeated Measures | Linear Mixed Model (MMRM) | R | SAS | | R vs SAS |
| | Linear Mixed Model (degrees of freedom) | | | | |
| | Generalized Linear Mixed Model (MMRM) | | | | |
| | Bayesian MMRM | | | | |
| Multiple Imputation - Continuous Data MAR | MCMC | | | | |
| | Linear regression | R | | | |
| | Predictive Mean Matching | R | | | |
| Multiple Imputation - Continuous Data MNAR | Delta Adjustment/Tipping Point | | | | |
| | Reference-Based Imputation/Sequential Methods | | | | |
| | Reference-Based Imputation/Joint Modelling | | | | |
| Correlation | Pearson's/Spearman's/Kendall's Rank | R | SAS | Python | R vs SAS |
| Survival Models | Kaplan-Meier Log-rank test and Cox-PH | R | SAS | | R vs SAS |
| | Accelerated Failure Time | R | | | |
| | Non-proportional hazards methods | R | | | |
| | Cumulative Incidence Functions | R | SAS | | R vs SAS |
| Sample size/Power calculations | Single timepoint analysis | | | | |
| | Group sequential designs | R | East | | |
| Multivariate methods | Clustering | R | | | |
| | Factor analysis | | | | |
| | PCA | R | | | |
| | Canonical correlation | | | | |
| | PLS | | | | |
| Causal inference | Propensity scores | | | | |
| | Matching | | | | |
| | Weighting | | | | |
| | G-computation | | | | |
| Other Methods | Survey statistics | R | SAS | Python | R vs SAS vs Python |
| | Nearest neighbour | | | | |
| | Machine learning | R | | | |
CAMIS - A PHUSE DVOST Working Group
Introduction
Several discrepancies have been discovered in statistical analysis results between different programming languages, even in fully qualified statistical computing environments. Subtle differences exist between the fundamental approaches implemented by each language, yielding results that differ yet are each correct in their own right. These differences cause unease among sponsor companies when submitting to a regulatory agency, because it is unclear whether the agency will view them as problematic. In its Statistical Software Clarifying Statement, the US Food and Drug Administration (FDA) states that “FDA does not require use of any specific software for statistical analyses” and that “the computer software used for data management and statistical analysis should be reliable.” Observing differences across languages can reduce an analyst’s confidence in that reliability; understanding the source of each discrepancy restores it.
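Rounding is one small illustration of how two defensible conventions can disagree. The sketch below is not taken from the repository. It assumes, as is commonly documented, that Python's built-in `round()` and R's `round()` resolve ties to the nearest even digit, while SAS's `ROUND` function resolves ties away from zero. The helper names `round_half_even` and `round_half_away` are hypothetical, and the `decimal` module is used so that the inputs are exact decimal values rather than binary floating-point approximations.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_half_even(value: str, places: int = 0) -> Decimal:
    """Round ties to the nearest even digit (assumed R/Python convention)."""
    quantum = Decimal(10) ** -places  # e.g. places=2 gives 0.01
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)


def round_half_away(value: str, places: int = 0) -> Decimal:
    """Round ties away from zero (assumed SAS ROUND convention)."""
    quantum = Decimal(10) ** -places
    # decimal's ROUND_HALF_UP sends ties away from zero, for both signs.
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


# Values passed as strings so each tie is represented exactly.
for raw, places in [("2.5", 0), ("-2.5", 0), ("0.125", 2)]:
    print(raw, round_half_even(raw, places), round_half_away(raw, places))
```

Both rules are valid; they differ only on exact ties, which is why a table of rounded summary statistics can disagree in the last digit between two otherwise identical analyses.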
Motivation
The goal of this project is to demystify conflicting results between software and to ease the transition to new languages by providing comparisons and comprehensive explanations.
Repository
The repository below provides examples of statistical methodology in different software and languages, along with a comparison of the results obtained and a description of any discrepancies.