CAMIS - A PHUSE DVOST Working Group

Introduction to CAMIS

Comparing Analysis Method Implementations in Software (CAMIS) is a cross-industry PHUSE DVOST Working Group, run in collaboration with members from PHUSE, PSI, ASA and IASCT. In addition to discussion in issue comments, which are hosted in the GitHub repository, we meet on the second Monday of each month. If you would like to join us, please contact us at workinggroups@phuse.global.

Motivation

The goal of this project is to demystify conflicting results from statistical analysis methods across software, primarily the SAS, R, and Python programming languages, by providing comparisons and comprehensive explanations of the similarities and differences. Many discrepancies have been discovered in statistical analysis results between these and other programming languages. The differences arise from the fundamental approaches each language implements, each of which is correct in its own right. That these differences exist is a challenge, particularly for sponsor companies submitting to a regulatory agency.

In its Statistical Software Clarifying Statement, the US Food and Drug Administration (FDA) states that “FDA does not require use of any specific software for statistical analyses” and that “the computer software used for data management and statistical analysis should be reliable.” Observing differences across languages can reduce an analyst’s confidence in that reliability; understanding the source of each discrepancy can restore it. CAMIS seeks to explore and explain some of the differences and similarities in statistical analysis methods between these languages to ease these concerns.
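
As a small illustration of how two implementations can each be correct and still disagree, consider how ties are rounded. The sketch below is not taken from the CAMIS repository; the function names are hypothetical and it uses only Python's standard decimal module. It contrasts "round half to even", which is what Python's built-in round() does, with "round half away from zero". The second convention is the one commonly attributed to the SAS ROUND function; treat that attribution as an assumption and check it against the Rounding pages listed in the repository table.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_half_even(value: str, places: int = 0) -> Decimal:
    """Round ties to the nearest even digit (the rule Python's round() follows)."""
    quantum = Decimal(10) ** -places  # e.g. places=2 -> Decimal('0.01')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)


def round_half_away(value: str, places: int = 0) -> Decimal:
    """Round ties away from zero (decimal calls this mode ROUND_HALF_UP)."""
    quantum = Decimal(10) ** -places
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Values are passed as strings so that they are exact decimal ties,
    # avoiding binary floating-point representation effects.
    print(f"{'value':>8} {'half-even':>10} {'half-away':>10}")
    for v in ["0.5", "1.5", "2.5", "-2.5", "2.665"]:
        places = 2 if v == "2.665" else 0
        print(f"{v:>8} {str(round_half_even(v, places)):>10} "
              f"{str(round_half_away(v, places)):>10}")
```

Both rules are legitimate and agree on every value that is not an exact tie, so a discrepancy of this kind points to a difference in convention rather than to unreliable software.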

Repository

The repository below provides examples of statistical methodology in different software and languages, along with a comparison of the results obtained and description of any discrepancies.

Methods		R	SAS	Python	Comparison
Summary Statistics	Rounding	R	SAS	Python	R vs SAS
	Summary statistics	R	SAS	Python	R vs SAS
	Skewness/Kurtosis	R	SAS	Python	R vs SAS
General Linear Models	One Sample t-test	R	SAS	Python	R vs SAS
	Paired t-test	R	SAS	Python	R vs SAS
	Two Sample t-test	R	SAS	Python	R vs SAS
	ANOVA	R	SAS	Python	R vs SAS
	ANCOVA	R	SAS	Python	R vs SAS
	MANOVA	R	SAS	Python	R vs SAS
	Linear Regression	R	SAS	Python	R vs SAS
Generalized Linear Models	Logistic Regression	R	SAS	Python	R vs SAS
Generalized Linear Models	Poisson/Negative Binomial Regression	R	SAS		R vs SAS
Non-parametric Analysis	Wilcoxon signed rank	R	SAS/ StatXact		R vs SAS
	Mann-Whitney U/Wilcoxon rank sum	R	SAS		R vs SAS
	Kolmogorov-Smirnov test
	Kruskal-Wallis test	R	SAS	Python	R vs SAS
	Friedman test	R	SAS		R vs SAS
	Jonckheere test	R	SAS		R vs SAS
	Hodges-Lehmann Estimator	R	SAS
Categorical Data Analysis	Binomial test	R	SAS	Python
	McNemar's test	R	SAS		R vs SAS
	Marginal Homogeneity Tests	R
	Chi-Square Association/Fisher's exact	R	SAS	Python	R vs SAS
	Cochran Mantel Haenszel	R	SAS		R vs SAS
	Confidence Intervals for proportions	R	SAS		R vs SAS
Repeated Measures	Linear Mixed Model (MMRM)	R	SAS		R vs SAS
	Linear Mixed Model (degrees of freedom)
	Generalized Linear Mixed Model (GLMM)
	Generalized Estimating Equation (GEE)
	Bayesian MMRM
Multiple Imputation - Continuous Data MAR	MCMC
	Linear regression	R	SAS
	Predictive Mean Matching	R
Multiple Imputation - Continuous Data MNAR	Tipping Point (Delta Adjustment)	R	SAS		R vs SAS
Multiple Imputation - Continuous Data MNAR	Reference-Based Imputation/Joint Modelling	R	SAS		R vs SAS
Correlation	Pearson's/ Spearman's/ Kendall's Rank	R	SAS	Python	R vs SAS
Survival Models	Kaplan-Meier Log-rank test and Cox-PH	R	SAS		R vs SAS
	Cause Specific Hazards	R	SAS		R vs SAS
	Accelerated Failure Time	R
	Weighted Log-rank test	R
	Cumulative Incidence Functions	R	SAS		R vs SAS
	Tobit regression	R	SAS		R vs SAS
	Restricted Mean Survival Time (RMST)		SAS
Sample size/ Power calculations	Intro to Sample Size				Summary
	Superiority Single timepoint	R	SAS
	Equivalence Single timepoint	R	SAS
	Non-Inferiority Single timepoint	R	SAS
	Average BioEquivalence	R
	Cochran-Armitage Test For Trend	R	SAS/ StatXact
	Group sequential designs	R	East		East vs R
Causal inference/ Machine learning	Intro to Machine Learning				Summary
	Propensity Score Matching				R vs SAS
	Propensity Score Weighting
	Clustering	R
	Factor analysis
	Principal Components Analysis (PCA)	R
	Canonical correlation
	Partial Least Squares (PLS)
	Lasso
	Ridge Regression
	xgboost	R
Other Methods	Survey statistics	R	SAS	Python	R vs SAS vs Python