CAMIS - A PHUSE DVOST Working Group

Introduction to CAMIS

Comparing Analysis Method Implementations in Software (CAMIS) is a cross-industry PHUSE DVOST Working Group, run in collaboration with members from PHUSE, PSI, ASA and IASCT. In addition to the discussion in issue comments, which are hosted in the GitHub Repository, we meet on the 2nd Monday of each month. If you would like to join us, please contact us at workinggroups@phuse.global.

Motivation

The goal of this project is to demystify conflicting results from statistical analysis methods across programming languages, primarily SAS, R, and Python, by providing comparisons and comprehensive explanations of their similarities and differences. Many discrepancies have been discovered in statistical analysis results between these and other programming languages. The differences arise from the fundamentally different approaches implemented by each language, each of which is correct in its own right. The existence of these differences is a challenge, especially for sponsor companies submitting to a regulatory agency.
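As a small illustration of how two defensible conventions can disagree, consider rounding a value that lies exactly halfway between two candidates. The sketch below is only an assumption-laden example, not a reproduction of any particular software's routine: it compares "round half to even" with "round half away from zero" using Python's standard `decimal` module. The helper names `round_half_even` and `round_half_away` are hypothetical, and which convention a given SAS, R, or Python function applies should be confirmed in that software's documentation.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_half_even(value: str, digits: int = 0) -> Decimal:
    """Round to `digits` decimals, sending ties to the nearest even digit."""
    quantum = Decimal(1).scaleb(-digits)  # digits=2 gives Decimal('0.01')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)


def round_half_away(value: str, digits: int = 0) -> Decimal:
    """Round to `digits` decimals, sending ties away from zero."""
    quantum = Decimal(1).scaleb(-digits)
    # In the decimal module, ROUND_HALF_UP means "ties go away from zero".
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Values are passed as strings so that they are represented exactly.
    for value, digits in [("0.5", 0), ("1.5", 0), ("2.5", 0), ("-2.5", 0), ("0.125", 2)]:
        print(value, round_half_even(value, digits), round_half_away(value, digits))
```

Both rules are valid, yet they return 2 and 3 respectively for 2.5. Python's built-in `round()` follows the half-to-even rule for exactly representable ties, so `round(2.5)` gives 2. A table produced under one convention can therefore differ in the last reported digit from a table produced under the other, even though neither result is wrong.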

In its Statistical Software Clarifying Statement, the US Food and Drug Administration (FDA) states that “FDA does not require use of any specific software for statistical analyses” and that “the computer software used for data management and statistical analysis should be reliable.” Observing differences across languages can reduce the analyst’s confidence in that reliability; understanding the source of any discrepancies can restore it. CAMIS seeks to explore and explain some of the differences and similarities in statistical analysis methods between these languages to ease these concerns.

Repository

The repository below provides examples of statistical methodology in different software and languages, along with a comparison of the results obtained and a description of any discrepancies.

Statistical Methodology	R	SAS	Python	Comparison
Summary Statistics
Rounding	R	SAS	Python	R vs SAS
Summary statistics	R	SAS	Python	R vs SAS
Skewness/Kurtosis	R	SAS	Python	R vs SAS
General Linear Models
One-sample t-test	R	SAS	Python	R vs SAS
Paired t-test	R	SAS	Python	R vs SAS
Two-sample t-test	R	SAS	Python	R vs SAS
ANOVA	R	SAS	Python	R vs SAS
ANCOVA	R	SAS	Python	R vs SAS
MANOVA	R	SAS	Python	R vs SAS
Linear regression	R	SAS	Python	R vs SAS
Generalized Linear Models
Logistic regression	R	SAS	Python	R vs SAS
Poisson/negative binomial regression	R	SAS		R vs SAS
Non-Parametric Analysis
Wilcoxon signed-rank test	R	SAS/ StatXact		R vs SAS
Mann-Whitney U/Wilcoxon rank-sum test	R	SAS		R vs SAS
Kolmogorov-Smirnov test	R
Kruskal-Wallis test	R	SAS	Python	R vs SAS
Friedman test	R	SAS		R vs SAS
Jonckheere-Terpstra test	R	SAS		R vs SAS
Hodges-Lehmann estimator	R	SAS
Categorical Data Analysis
Single proportion - Binomial test	R	SAS	Python
Two paired proportions - McNemar's test	R	SAS		R vs SAS
Marginal homogeneity tests	R
Two independent proportions - Chi-square test/Fisher's exact test	R	SAS	Python	R vs SAS
Stratified tables - Cochran-Mantel-Haenszel test	R	SAS		R vs SAS
Intro to CIs for proportions				Summary
CIs - Single Proportion	R	SAS		R vs SAS
CIs - Two Independent Proportions		SAS
CIs - Two Paired Proportions
CIs - Proportions in Stratified Designs
CIs - Poisson Exposure Adjusted Incidence Rates
Correlated Data Analysis
Linear Mixed Model Random Effect	R	SAS		R vs SAS
Linear Mixed Model Repeated Measure (MMRM)	R	SAS		R vs SAS
Generalized Linear Mixed Model (GLMM)	R	SAS		R vs SAS
Generalized Estimating Equation (GEE)	R	SAS		R vs SAS
Multiple Imputation - Continuous Data MAR
Linear regression imputation	R	SAS
Predictive mean matching	R
Multiple Imputation - Continuous Data MNAR
Tipping point analysis (delta adjustment)	R	SAS		R vs SAS
Reference-based imputation/joint modeling	R	SAS		R vs SAS
Correlation
Pearson/Spearman/Kendall's Rank	R	SAS	Python	R vs SAS
Survival Models
Kaplan-Meier/log-rank test/Cox proportional hazards	R	SAS		R vs SAS
Cause-specific hazards	R	SAS		R vs SAS
Accelerated failure time	R
Weighted log-rank test	R
Recurrent events	R	SAS		R vs SAS
Cumulative incidence functions	R	SAS		R vs SAS
Tobit regression	R	SAS		R vs SAS
Restricted Mean Survival Time (RMST)		SAS
Bayesian Methods
Intro to Bayesian Analysis
Repeated Measures MMRM
Sample size/ Power calculations
Intro to Sample Size				Summary
Superiority Single timepoint	R	SAS		R vs SAS
Equivalence Single timepoint	R	SAS
Non-Inferiority Single timepoint	R	SAS
Average BioEquivalence	R
Cochran-Armitage Test For Trend	R	SAS/ StatXact
Group sequential designs-Survival	R	SAS	East	R vs SAS vs East
Group sequential designs-Binary Endpoint
Causal inference/ Machine learning
Intro to Machine Learning				Summary
Propensity Score Matching	R			R vs SAS
Propensity Score Weighting				R vs SAS
Clustering	R
Principal Components Analysis (PCA)	R
Lasso
Ridge Regression
xgboost	R
Other Methods
Survey statistics	R	SAS	Python	R vs SAS vs Python