library(tidyverse)
library(cicalc)
Overview
This session introduced practical debugging in R, with examples based on common mistakes in data analysis code and tools available in RStudio.
Much of the content is based on the debugging chapter of Advanced R by Hadley Wickham.
Setup
We will use tidyverse for data manipulation and cicalc for a package function that we can step through.
Reading Error Messages
The first step in debugging is to read the error message carefully.
In this example, mtcars has a column called cyl, not Cyl. R is case sensitive, so these are different names.
mtcars |>
  filter(Cyl == 4)
Error in `filter()`:
ℹ In argument: `Cyl == 4`.
Caused by error:
! object 'Cyl' not found
The error points us toward the problem: Cyl cannot be found. The fix is to use the actual column name.
mtcars |>
  filter(cyl == 4)
mpg cyl disp hp drat wt qsec vs am gear carb
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
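When R reports that an object is not found, listing the column names is a quick way to spot a spelling or case mismatch. A minimal sketch with base R's names() and grep(). The case-insensitive search and the typed object are illustrative additions, not part of the original session:

```r
# List every column name in mtcars to check the exact spelling
names(mtcars)

# Illustrative: search case-insensitively for the name that was typed
typed <- "Cyl"
match_found <- grep(typed, names(mtcars), ignore.case = TRUE, value = TRUE)
match_found
```

Here the search returns "cyl", the name filter() actually needs.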
Objects and Column Names
Sometimes we store a column name in an object.
new_obj <- "cyl"
new_obj
[1] "cyl"
Inside filter(), writing new_obj == 4 compares the string "cyl" to 4; it does not automatically look up the cyl column.
wrong_filter <- mtcars |>
  filter(new_obj == 4)
nrow(wrong_filter)
[1] 0
This is a silent problem rather than a loud error: the code runs, but it returns no rows because "cyl" == 4 is FALSE (R converts 4 to the string "4" before comparing).
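Because this kind of mistake does not raise an error, it can help to write down what you expect and let R check it. A sketch, assuming R 4.0 or later for the named message in stopifnot(); the check and the check object are illustrative additions:

```r
library(dplyr)

new_obj <- "cyl"

# The comparison R actually performs: 4 becomes "4", so this is FALSE
"cyl" == 4

wrong_filter <- mtcars |>
  filter(new_obj == 4)

# Illustrative check: turn the silent problem into a loud error
check <- tryCatch(
  {
    stopifnot("filter() returned no rows" = nrow(wrong_filter) > 0)
    "ok"
  },
  error = function(e) conditionMessage(e)
)
check
```

In a script you would normally call stopifnot() directly; tryCatch() is used here only so the example keeps running and shows the message.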
If the stored column name is wrong, looking it up with .data[[...]] gives a clear error instead of silently returning no rows.
wrong_obj <- "Cyl"
mtcars |>
  filter(.data[[wrong_obj]] == 4)
Error in `filter()`:
ℹ In argument: `.data[["Cyl"]] == 4`.
Caused by error in `.data[["Cyl"]]`:
! Column `Cyl` not found in `.data`.
The fix is to look the column up through .data with the stored name, and to make sure that name matches a column in the data.
mtcars |>
  filter(.data[[new_obj]] == 4)
mpg cyl disp hp drat wt qsec vs am gear carb
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
The .data[[...]] form is useful when a column name is stored as text.
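The same idea carries over to your own functions, where the column name arrives as an argument. A sketch with a hypothetical helper, filter_equal(), that checks the name before filtering with .data[[...]]:

```r
library(dplyr)

# Hypothetical helper (not from any package): filter on a column named in a string
filter_equal <- function(data, col, value) {
  # Check the name first so a typo produces a readable message
  if (!col %in% names(data)) {
    stop("Column `", col, "` not found. Available columns: ",
         paste(names(data), collapse = ", "))
  }
  data |>
    filter(.data[[col]] == value)
}

four_cyl <- filter_equal(mtcars, "cyl", 4)
nrow(four_cyl)
```

Calling filter_equal(mtcars, "Cyl", 4) would stop with a message that lists the available columns.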
Reading Errors With Many Steps
Errors often appear at the end of a pipe, but the cause may be earlier.
When debugging a pipeline, run it one line at a time and check the output after each step.
Here the final step has a typo: vroom_indx is missing an e.
mtcars |>
  filter(cyl == 4) |>
  mutate(vroom_index = hp * drat / wt) |>
  summarise(mean_vroom_index = mean(vroom_indx))
Error in `summarise()`:
ℹ In argument: `mean_vroom_index = mean(vroom_indx)`.
Caused by error:
! object 'vroom_indx' not found
A useful approach is to build the pipe gradually.
mtcars |>
  filter(cyl == 4)
mpg cyl disp hp drat wt qsec vs am gear carb
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
mtcars |>
  filter(cyl == 4) |>
  mutate(vroom_index = hp * drat / wt)
mpg cyl disp hp drat wt qsec vs am gear carb vroom_index
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 154.33190
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 71.71787
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 118.22222
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 122.40000
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 158.73684
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 149.48229
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 145.59838
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 139.16279
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 188.37850
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 281.56642
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 161.14748
The intermediate data contains vroom_index, so the problem is probably in the last line. Adding summarise() back, still with the typo, reproduces the error and confirms where it comes from.
mtcars |>
  filter(cyl == 4) |>
  mutate(vroom_index = hp * drat / wt) |>
  summarise(mean_vroom_index = mean(vroom_indx))
Error in `summarise()`:
ℹ In argument: `mean_vroom_index = mean(vroom_indx)`.
Caused by error:
! object 'vroom_indx' not found
Now fix the spelling in summarise().
mtcars |>
  filter(cyl == 4) |>
  mutate(vroom_index = hp * drat / wt) |>
  summarise(mean_vroom_index = mean(vroom_index))
  mean_vroom_index
1 153.7041
The error appears in the final line, but the step-by-step checks show that the earlier filter() and mutate() calls worked. That narrows the problem to the object name used inside summarise().
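The step-by-step checks can also be made explicit by saving each stage of the pipe to its own object. A sketch; step1, step2, and step3 are illustrative names:

```r
library(dplyr)

# Save each stage of the pipe so it can be inspected on its own
step1 <- mtcars |>
  filter(cyl == 4)
nrow(step1)                      # rows kept by the filter

step2 <- step1 |>
  mutate(vroom_index = hp * drat / wt)
"vroom_index" %in% names(step2)  # was the new column created?

step3 <- step2 |>
  summarise(mean_vroom_index = mean(vroom_index))
step3
```

Once every stage looks right, the objects can be folded back into a single pipe.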
Debugging My Own Code
Small functions are easier to debug because each line has a clear purpose.
Here is a function with a mistake. The loop tries to use step, but we never create a step value inside the function or pass one in.
my_function <- function(x) {
  while (x < 10) {
    x <- x + step
  }
  x
}
Testing the function gives us a visible error.
my_function(5)
Error in `x + step`:
! non-numeric argument to binary operator
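The message is confusing at first. Because step is not defined inside the function, R keeps looking in the enclosing environments and finds step(), the stepwise model selection function from the stats package. Adding a function to a number then fails with this error.
Useful questions include: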
- What is the value of x when the function starts?
- Where should step come from?
- Does the while condition ever become FALSE?
- What value is returned at the end?
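For interactive debugging, browser() pauses the function while it is running. At the Browse[1]> prompt, type n to run the next line, c to continue, or Q to quit, and inspect any object by typing its name.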
my_function <- function(x) {
  browser()
  while (x < 10) {
    x <- x + step
  }
  x
}
my_function(5)
The fix is to define step, either inside the function or as an argument.
my_function <- function(x, step = 1) {
  while (x < 10) {
    x <- x + step
  }
  x
}
my_function(5)
[1] 10
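One of the questions above is whether the while condition ever becomes FALSE. If step were 0 or negative, the loop would never end. A sketch of a variation with input checks, using a hypothetical name, my_function_checked(), so it does not replace the original:

```r
# Hypothetical variation: validate inputs before the loop starts
my_function_checked <- function(x, step = 1) {
  # step must be a single positive number, otherwise the loop never ends
  stopifnot(is.numeric(x), length(x) == 1,
            is.numeric(step), length(step) == 1, step > 0)
  while (x < 10) {
    x <- x + step
  }
  x
}

my_function_checked(5)
my_function_checked(5, step = 2)
```

With step = 2 the loop overshoots and returns 11, which is a reminder that the function returns the first value at or above 10, not exactly 10.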
Debugging Other People’s Code
Sometimes the problem is inside a function from a package. We can still inspect what is happening.
test <- rbinom(50, 1, 0.6)
ci_prop_wald(test)
── Wald Confidence Interval without Continuity Correction ──────────────────────
• 27 responses out of 50
• Estimate: 0.54
• 95% Confidence Interval:
(0.4019, 0.6781)
Now create an input that is not appropriate for a proportion confidence interval function.
bad_test <- c("success", "failure", "success")
ci_prop_wald(bad_test)
Error in `ci_prop_wald()`:
! Expecting `x` to be either <logical> or <numeric/integer> coded as 0 and 1.
The error tells us that ci_prop_wald() received input it cannot calculate with: a character vector rather than logical values or numbers coded as 0 and 1. If the error message is not enough, use debugonce() to step through a function the next time it is called.
debugonce(ci_prop_wald)
ci_prop_wald(bad_test)
debugonce() is interactive. In a rendered Quarto document it will not print a useful walkthrough. Run it in the console, then use the browser controls to step through the function.
Use debug() when you want debugging to stay on for every future call.
debug(ci_prop_wald)
ci_prop_wald(bad_test)
undebug(ci_prop_wald)
Use undebug() when you are finished; otherwise R will keep opening the debugger every time that function is called.
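Once the cause is clear, the fix is usually in the input rather than in the package. The error message says logical values are accepted, so one option is to recode the text labels. A sketch; fixed_test is an illustrative name, and the requireNamespace() guard only keeps the example running where cicalc is not installed:

```r
bad_test <- c("success", "failure", "success")

# Recode the text labels as logical values, which the error message says are accepted
fixed_test <- bad_test == "success"
fixed_test

# Check the assumption before calling the package function
stopifnot(is.logical(fixed_test), !anyNA(fixed_test))

# Guard is illustrative: only call cicalc if it is installed
if (requireNamespace("cicalc", quietly = TRUE)) {
  cicalc::ci_prop_wald(fixed_test)
}
```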
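You can also debug functions from common packages, although this can be noisy because widely used functions may call many other helper functions. mutate() is also an S3 generic, so debug(mutate) first stops in a short function that only calls UseMethod(); the data frame work happens in the method dplyr:::mutate.data.frame, which can be debugged directly.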
debug(mutate)
mtcars |>
  mutate(vroom_index = hp * drat / wt)
undebug(mutate)
Debugging Workflow
A practical debugging workflow is:
- Reproduce the problem
- Read the full error message
- Identify where the error occurs
- Simplify the code until the smallest failing example remains
- Inspect objects and assumptions
- Fix one thing at a time
- Re-run the code to confirm the fix
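The first and fourth steps often go together: rebuild the problem on a tiny data frame and capture the error so it can be inspected. A sketch, where small and the misspelled column hp_typo are made-up examples:

```r
library(dplyr)

# Made-up data frame that is just big enough to show the problem
small <- tibble(cyl = c(4, 4, 6), hp = c(90, 65, 110))

# Reproduce the failure and keep the condition object so it can be inspected
err <- tryCatch(
  small |> summarise(mean_hp = mean(hp_typo)),
  error = function(e) e
)
inherits(err, "error")
conditionMessage(err)

# After fixing the name, the same small example runs cleanly
fixed <- small |> summarise(mean_hp = mean(hp))
fixed
```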
Useful RStudio Tools
- Console: test small pieces of code
- Environment pane: inspect objects and values
- Traceback: see the sequence of function calls that led to an error
- Breakpoints: pause code at a chosen line
- Debug mode: step through code one line at a time