y=function(x){
s1=0
for(v1 in x){s1=s1+v1}
m1=s1/length(x)
i=ceiling(length(x)/2)
if(length(x) %% 2 == 0){i=c(i,i+1)}
s2=0
for(v2 in i){s2=s2+x[v2]}
m2=s2/length(i)
c(m1,m2)
}
y(c(1:7, 100))
[1] 16.0 4.5
April 16, 2026
Writing clear code in any software isn’t just a detail—it’s a practical investment that pays off in reproducibility, reliability, and speed of delivery. As a biostatistician, you’ll feel these benefits in day-to-day analysis, regulatory submissions, and long-lived projects.
In this blog post, we write up basic clean code rules. These rules will help you with:
Reproducibility: Clear and deterministic code makes it easy to reproduce results across machines and time. This is essential for peer review, internal validation, and regulatory scrutiny
Maintainability: The code is readable and understandable (e.g., good names and comments) and has a reduced complexity, i.e., it’s easier to fix bugs
Collaboration: Clear code reduces onboarding time and accelerates code reviews
Traceability: For regulated environments, you need transparent logic, documented coding, and reproducible outputs.
Extensibility: The architecture is simpler, cleaner, and more expressive, i.e., it’s easier to extend the capabilities and the risk of introducing bugs is reduced
Performance: The code often runs faster, uses less memory, or is easier to optimize
“Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.”
(https://style.tidyverse.org/)
We focus on coding in R in this blog post. Most of the material presented here is taken from the short course “Good Software Engineering Practice for R Packages”, which has been presented at multiple conferences and is recommended for everyone who wants to learn more about writing good code (see, for example, https://openstatsware.github.io/shortcourse-iscb2025/listing.html).
If you are interested in a deep dive on good coding styles, you can take a look at the tidyverse style guide (https://style.tidyverse.org/).
Useful R packages that automate part of the work of cleaning up your code are:
lintr: performs automated checks to confirm that you conform to a given style
styler: allows you to interactively restyle selected text, files, or entire projects. It includes an RStudio add-in, the easiest way to re-style existing code
goodpractice: a package that provides advice about good practices when building R packages
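As a rough sketch of how two of these packages can be called from the R console, the snippet below restyles and lints a deliberately messy one-liner. It assumes that styler and lintr are installed and skips each step otherwise. The string messyCode is written only for this illustration.

```r
# Deliberately messy one-liner, written only for this illustration.
messyCode <- "y=function(x){s1=0;for(v1 in x){s1=s1+v1};s1/length(x)}"

# styler::style_text() returns the restyled code without touching any file.
if (requireNamespace("styler", quietly = TRUE)) {
  print(styler::style_text(messyCode))
}

# lintr::lint() accepts code as a string through its `text` argument and
# returns the style issues it finds.
if (requireNamespace("lintr", quietly = TRUE)) {
  print(lintr::lint(text = messyCode))
}
```

goodpractice is left out here because goodpractice::gp() is run on a package directory rather than on a code string.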
The following list presents a non-exhaustive summary of clean code rules (CCR):
1. Naming: Use descriptive and meaningful names for variables, functions, and classes
2. Formatting: Adhere to consistent indentation, spacing, and bracketing to make the code easy to read
3. Simplicity: Keep the code as simple and straightforward as possible, avoiding unnecessary complexity
4. Single Responsibility Principle (SRP): Each function should have a single, well-defined purpose
5. Don’t Repeat Yourself (DRY): Avoid duplication of code, either by reusing existing code or creating functions
6. Comments: Use comments to explain the purpose of code blocks and to clarify complex logic
7. Error Handling: Include error handling code to gracefully handle exceptions and unexpected situations
8. Test-Driven Development (TDD): Write tests for your code to ensure it behaves as expected and to catch bugs early
9. Code Review: Have other team members review your code to catch potential issues and improve its quality
The piece of code at the top of this post breaks all common clean code rules. Can you easily spot what this code is doing? Let’s fix it!
getMeanAndMedian=function(x){
sum1=0
for(value in x){sum1=sum1+value}
meanValue=sum1/length(x)
centerIndices=ceiling(length(x)/2)
if(length(x) %% 2 == 0){
centerIndices=c(centerIndices,centerIndices+1)
}
sum2=0
for(centerIndex in centerIndices){sum2=sum2+x[centerIndex]}
medianValue=sum2/length(centerIndices)
c(meanValue,medianValue)
}
With descriptive names (CCR #1), it is already clear what the function calculates. The code also shows at which step the mean and the median are calculated.
getMeanAndMedian <- function(x) {
  sum1 <- 0
  for (value in x) {
    sum1 <- sum1 + value
  }
  meanValue <- sum1 / length(x)
  centerIndices <- ceiling(length(x) / 2)
  if (length(x) %% 2 == 0) {
    centerIndices <- c(
      centerIndices, centerIndices + 1
    )
  }
  sum2 <- 0
  for (centerIndex in centerIndices) {
    sum2 <- sum2 + x[centerIndex]
  }
  medianValue <- sum2 / length(centerIndices)
  c(meanValue, medianValue)
}
According to standard indentation rules (CCR #2), it is best to indent the body of loops, if/else statements, and function definitions, as well as the continuation lines of a long function call.
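The indentation rules above can be sketched in one small function. The helper describeValues() and its arguments are hypothetical and exist only to show each indentation level. Two spaces per level are used, as the tidyverse style guide recommends.

```r
# Hypothetical helper, used only to illustrate indentation.
describeValues <- function(values, digits = 2) {
  # function body: indented one level
  labels <- character(length(values))
  for (index in seq_along(values)) {
    # loop body: indented one more level
    if (values[index] >= 0) {
      # if/else bodies: indented again
      labels[index] <- "non-negative"
    } else {
      labels[index] <- "negative"
    }
  }
  # continuation lines of a long function call: indented as well
  paste(
    format(round(values, digits), nsmall = digits),
    labels,
    sep = ": "
  )
}

print(describeValues(c(1.5, -2)))
```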
To keep the code simple (CCR #3), we can replace the two for-loops, which only calculate a sum, with the standard function sum().
Currently, the function calculates both the mean and the median. Following the Single Responsibility Principle (CCR #4), it is better to have separate functions for the mean and the median. In addition, we add a function that makes clear what the expression “length(x) %% 2 == 0” is doing.
getMean <- function(x) {
  sum(x) / length(x)
}
isLengthAnEvenNumber <- function(x) {
  length(x) %% 2 == 0
}
getMedian <- function(x) {
  centerIndices <- ceiling(length(x) / 2)
  if (isLengthAnEvenNumber(x)) {
    centerIndices <- c(centerIndices, centerIndices + 1)
  }
  sum(x[centerIndices]) / length(centerIndices)
}
The current code already adheres well to the DRY rule (CCR #5). However, one subtle improvement is possible: the code divides a sum by a length in two places, so getMedian() can reuse getMean().
At a minimum, add a comment before each function or code block (CCR #6). For longer functions and code blocks, additional comments inside them are useful as well.
# returns the mean of x
getMean <- function(x) {
  sum(x) / length(x)
}
# returns TRUE if the length of x is an even number;
# FALSE otherwise
isLengthAnEvenNumber <- function(x) {
  length(x) %% 2 == 0
}
# returns the median of x;
# assumes x is sorted in ascending order
getMedian <- function(x) {
  centerIndices <- ceiling(length(x) / 2)
  if (isLengthAnEvenNumber(x)) {
    centerIndices <- c(centerIndices, centerIndices + 1)
  }
  getMean(x[centerIndices])
}
Making sure your code produces helpful error messages is also very important (CCR #7). The checkmate package is very useful for this: it has built-in assertion functions that produce informative error messages. The code below uses two of them.
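As a rough illustration of what such a message looks like (a sketch that is separate from the listing that follows, and that assumes the checkmate package is installed), the snippet below calls getMean() with a character value and captures the error message instead of stopping the script. The variable name capturedMessage is chosen only for this example.

```r
# returns the mean of x; stops with an informative message if x is not numeric
getMean <- function(x) {
  checkmate::assertNumeric(x)
  sum(x) / length(x)
}

# Call the function with invalid input and keep the error message as text,
# so the script continues and the message can be inspected.
capturedMessage <- tryCatch(
  getMean("a"),
  error = function(e) conditionMessage(e)
)
print(capturedMessage)
```

The message should name the offending argument and say which type was expected. Base R's stopifnot(is.numeric(x)) would also stop the function, but with a less descriptive message.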
# returns the mean of x
getMean <- function(x) {
  checkmate::assertNumeric(x)
  sum(x) / length(x)
}
# returns TRUE if the length of x is an even number;
# FALSE otherwise
isLengthAnEvenNumber <- function(x) {
  checkmate::assertVector(x)
  length(x) %% 2 == 0
}
# returns the median of x;
# assumes x is sorted in ascending order
getMedian <- function(x) {
  checkmate::assertNumeric(x)
  centerIndices <- ceiling(length(x) / 2)
  if (isLengthAnEvenNumber(x)) {
    centerIndices <- c(centerIndices, centerIndices + 1)
  }
  getMean(x[centerIndices])
}
We will not go into detail on CCR #8 and CCR #9. Generally, it is advisable to test your code to make sure it provides the expected results. Further, depending on how the code is used, an independent code review is recommended.
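For CCR #8, here is a minimal sketch of what such tests could look like with the testthat package. Using testthat is an assumption; the post does not prescribe a testing framework. The functions are repeated without the checkmate assertions so that the block depends only on testthat, and base R's mean() and median() serve as reference values. The inputs are sorted in ascending order because getMedian() as written picks the middle positions of x directly.

```r
# Functions under test, repeated from the post without the checkmate calls.
getMean <- function(x) {
  sum(x) / length(x)
}

isLengthAnEvenNumber <- function(x) {
  length(x) %% 2 == 0
}

getMedian <- function(x) {
  centerIndices <- ceiling(length(x) / 2)
  if (isLengthAnEvenNumber(x)) {
    centerIndices <- c(centerIndices, centerIndices + 1)
  }
  getMean(x[centerIndices])
}

# Run the tests only if testthat is available on this machine.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("getMean() agrees with mean()", {
    x <- c(1:7, 100)
    testthat::expect_equal(getMean(x), mean(x))
  })

  testthat::test_that("getMedian() agrees with median() on sorted input", {
    evenLengthInput <- c(1:7, 100)
    oddLengthInput <- c(2, 3, 5, 8, 13)
    testthat::expect_equal(getMedian(evenLengthInput), median(evenLengthInput))
    testthat::expect_equal(getMedian(oddLengthInput), median(oddLengthInput))
  })
}
```

Covering both an even-length and an odd-length input exercises both branches of getMedian().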