CAMIS - A PHUSE DVOST Working Group

Introduction to CAMIS

Comparing Analysis Method Implementations in Software (CAMIS) is a cross-industry PHUSE DVOST Working Group, run in collaboration with members from PHUSE, PSI, ASA and IASCT. In addition to commenting on issues, which are hosted in the GitHub repository, we meet on the 2nd Monday of each month. If you would like to join us, please contact us at workinggroups@phuse.global.

Motivation

The goal of this project is to demystify conflicting results from statistical analysis methods implemented in different software, primarily SAS, R, and Python, by providing comparisons and comprehensive explanations of the similarities and differences. Many discrepancies have been discovered between the statistical analysis results produced by these and other programming languages. They arise from differences in the fundamental approach each language implements, each of which is correct in its own right. The existence of these differences is a challenge, especially for sponsor companies submitting to a regulatory agency.
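As a small illustration of how two defensible conventions can disagree, consider rounding a value that lies exactly halfway between two candidates. Python's built-in round() sends ties to the nearest even digit, whereas the SAS ROUND function is commonly documented as sending ties away from zero. The sketch below reproduces both rules with Python's standard decimal module. The helper names are hypothetical, and the half-away-from-zero helper only stands in for the SAS convention; it is not SAS code.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_half_even(value: str, places: int = 0) -> Decimal:
    """Round ties to the nearest even digit (the rule used by Python's round())."""
    quantum = Decimal(1).scaleb(-places)  # places=1 gives Decimal('0.1')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)


def round_half_away(value: str, places: int = 0) -> Decimal:
    """Round ties away from zero (assumed here to mirror the SAS ROUND convention)."""
    quantum = Decimal(1).scaleb(-places)
    # In the decimal module, ROUND_HALF_UP means ties go away from zero.
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Values are passed as strings so they are represented exactly in decimal.
    for raw in ["0.5", "1.5", "2.5", "-2.5"]:
        print(raw, round_half_even(raw), round_half_away(raw))
```

Under these two rules, 2.5 becomes 2 with ties-to-even and 3 with ties-away-from-zero, while 1.5 becomes 2 under both. Neither answer is wrong; they follow different, documented conventions.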

In its Statistical Software Clarifying Statement, the US Food and Drug Administration (FDA) states that “FDA does not require use of any specific software for statistical analyses” and that “the computer software used for data management and statistical analysis should be reliable.” Observing differences across languages can reduce an analyst’s confidence in that reliability; understanding the source of any discrepancies can restore it. CAMIS seeks to explore and explain some of the differences and similarities in statistical analysis methods between these languages to ease these concerns.
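To show how identifying the source of a discrepancy can restore confidence, the sketch below computes a sample quantile under two textbook definitions. It assumes, based on commonly cited documentation, that R's quantile() defaults to linear interpolation between order statistics (its "type 7") and that the SAS default percentile definition corresponds to the empirical distribution function with averaging (R's "type 2"). The function names are hypothetical and the code is a pure-Python illustration, not either package's implementation.

```python
import math


def quantile_type7(data, p):
    """Linear interpolation between order statistics (assumed R default, type 7)."""
    x = sorted(data)
    h = (len(x) - 1) * p              # fractional 0-based position
    lo = math.floor(h)
    hi = min(lo + 1, len(x) - 1)      # guard the upper edge when p == 1
    return x[lo] + (h - lo) * (x[hi] - x[lo])


def quantile_type2(data, p):
    """Empirical CDF with averaging at jumps (assumed SAS default, R type 2)."""
    x = sorted(data)
    n = len(x)
    position = n * p
    j = math.floor(position)
    if position == j:
        # n*p is a whole number: average the j-th and (j+1)-th ordered
        # values (1-based), clamping at the ends of the sample.
        return (x[max(j - 1, 0)] + x[min(j, n - 1)]) / 2
    # Otherwise take the ceil(n*p)-th ordered value (1-based), index j (0-based).
    return x[j]


if __name__ == "__main__":
    sample = list(range(1, 11))       # 1, 2, ..., 10
    print(quantile_type7(sample, 0.25), quantile_type2(sample, 0.25))
```

For the values 1 to 10, the first quartile is 3.25 under the interpolating definition and 3 under the averaging definition, while the median is 5.5 under both. Once the definition in use is known, the two results can be reconciled rather than treated as an error in either tool.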

Repository

The repository below provides examples of statistical methodology in different software and languages, along with a comparison of the results obtained and description of any discrepancies.

| | Statistical Methodology | R | SAS | Python | Comparison |
| --- | --- | --- | --- | --- | --- |
| Summary Statistics | Rounding | R | SAS | Python | R vs SAS |
| | Summary statistics | R | SAS | Python | R vs SAS |
| | Skewness/Kurtosis | R | SAS | Python | R vs SAS |
| General Linear Models | One-sample t-test | R | SAS | Python | R vs SAS |
| | Paired t-test | R | SAS | Python | R vs SAS |
| | Two-sample t-test | R | SAS | Python | R vs SAS |
| | ANOVA | R | SAS | Python | R vs SAS |
| | ANCOVA | R | SAS | Python | R vs SAS |
| | MANOVA | R | SAS | Python | R vs SAS |
| | Linear regression | R | SAS | Python | R vs SAS |
| Generalized Linear Models | Logistic regression | R | SAS | Python | R vs SAS |
| | Poisson/negative binomial regression | R | SAS | | R vs SAS |
| Non-Parametric Analysis | Wilcoxon signed-rank test | R | SAS/StatXact | | R vs SAS |
| | Mann-Whitney U/Wilcoxon rank-sum test | R | SAS | | R vs SAS |
| | Kolmogorov-Smirnov test | | | | |
| | Kruskal-Wallis test | R | SAS | Python | R vs SAS |
| | Friedman test | R | SAS | | R vs SAS |
| | Jonckheere-Terpstra test | R | SAS | | R vs SAS |
| | Hodges-Lehmann estimator | R | SAS | | |
| Categorical Data Analysis | Binomial test | R | SAS | Python | |
| | McNemar's test | R | SAS | | R vs SAS |
| | Marginal homogeneity tests | R | | | |
| | Chi-square test/Fisher's exact test | R | SAS | Python | R vs SAS |
| | Cochran-Mantel-Haenszel test | R | SAS | | R vs SAS |
| | Confidence intervals for proportions | R | SAS | | R vs SAS |
| Repeated Measures | Linear Mixed Model (MMRM) | R | SAS | | R vs SAS |
| | Linear Mixed Model (degrees of freedom) | | | | |
| | Generalized Linear Mixed Model (GLMM) | | | | |
| | Generalized Estimating Equation (GEE) | | | | |
| | Bayesian MMRM | | | | |
| Multiple Imputation - Continuous Data MAR | MCMC imputation | | | | |
| | Linear regression imputation | R | SAS | | |
| | Predictive mean matching | R | | | |
| Multiple Imputation - Continuous Data MNAR | Tipping point analysis (delta adjustment) | R | SAS | | R vs SAS |
| | Reference-based imputation/joint modeling | R | SAS | | R vs SAS |
| Correlation | Pearson/Spearman/Kendall's Rank | R | SAS | Python | R vs SAS |
| Survival Models | Kaplan-Meier/log-rank test/Cox proportional hazards | R | SAS | | R vs SAS |
| | Cause-specific hazards | R | SAS | | R vs SAS |
| | Accelerated failure time | R | | | |
| | Weighted log-rank test | R | | | |
| | Recurrent events | R | SAS | | R vs SAS |
| | Cumulative incidence functions | R | SAS | | R vs SAS |
| | Tobit regression | R | SAS | | R vs SAS |
| | Restricted Mean Survival Time (RMST) | | SAS | | |
| Sample size/Power calculations | Intro to Sample Size | | | | Summary |
| | Superiority Single timepoint | R | SAS | | |
| | Equivalence Single timepoint | R | SAS | | |
| | Non-Inferiority Single timepoint | R | SAS | | |
| | Average BioEquivalence | R | | | |
| | Cochran-Armitage Test For Trend | R | SAS/StatXact | | |
| | Group sequential designs | R | East | | East vs R |
| Causal inference/Machine learning | Intro to Machine Learning | | | | Summary |
| | Propensity Score Matching | R | | | R vs SAS |
| | Propensity Score Weighting | | | | R vs SAS |
| | Clustering | R | | | |
| | Factor analysis | | | | |
| | Principal Components Analysis (PCA) | R | | | |
| | Canonical correlation | | | | |
| | Partial Least Squares (PLS) | | | | |
| | Lasso | | | | |
| | Ridge Regression | | | | |
| | xgboost | R | | | |
| Other Methods | Survey statistics | R | SAS | Python | R vs SAS vs Python |