R package development workshop
Module 3
There are 3 types of data we might want to include:
- Exported data for users to access: /data
- Internal data for functions to access: /R/sysdata.rda
- Raw data: /inst/extdata

Exported data should be saved in /data as an .rda (or .RData) file.
usethis::use_data() will do this for you, as well as a few other necessary steps:
✔ Adding 'R' to Depends field in DESCRIPTION
✔ Creating 'data/'
✔ Setting LazyData to 'true' in 'DESCRIPTION'
✔ Saving 'letter_indices' to 'data/letter_indices.rda'
• Document your data (see 'https://r-pkgs.org/data.html')
Note
For larger datasets, you can try changing the compress argument to get the best compression.
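As a rough sketch of the step that produces the output above: the construction of `letter_indices` below is an assumption based on its documented format (26 rows, a `letter` and an `index` column), not necessarily the code used in the workshop. The `usethis::use_data()` call is left as a comment because it only makes sense inside a package project.

```r
# Assumed construction: one row per lower-case letter with its position.
letter_indices <- data.frame(
  letter = letters,
  index = seq_along(letters),
  stringsAsFactors = FALSE
)

# Inside a package project, this saves the object to data/letter_indices.rda:
# usethis::use_data(letter_indices)
#
# For a larger dataset, a different compression method can be tried:
# usethis::use_data(letter_indices, compress = "xz", overwrite = TRUE)

head(letter_indices, 3)
```

The `compress` argument accepts "gzip", "bzip2" or "xz"; which one gives the smallest file depends on the data.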
Often the data you want to make available to users is something you have created with an R script, either from scratch or from a raw dataset.
It’s a good idea to put the R script and any corresponding raw data in /data-raw.
usethis::use_data_raw("dataname") will set this up:
- Creates /data-raw.
- Creates /data-raw/dataname.R for you to add the code needed to create the data.
- Adds ^data-raw$ to .Rbuildignore, as it does not need to be included in the actual package.

You should add any raw data files (e.g. .csv files) to /data-raw.
Datasets in /data are always exported, so must be documented.
To document a dataset, we must have an .R script in /R that contains a Roxygen block above the name of the dataset.
As with functions, you can choose how to arrange this, e.g. in one combined /R/data.R or in a separate R file for each dataset.
#' Letters of the Roman Alphabet with Indices
#'
#' A dataset of lower-case letters of the Roman alphabet and their
#' numeric index from a = 1 to z = 26.
#'
#' @format A data frame with 26 rows and 2 variables:
#' \describe{
#' \item{letter}{The letter as a character string.}
#' \item{index}{The corresponding numeric index.}
#' }
"letter_indices"
#' @examples can be used here too.
For collected data, the (original) source should be documented with #' @source.
This should be either a URL, e.g. \url{http://www.diamondse.info/} (alternatively \href{http://www.diamondse.info/}{DiamondSearchEngine}), or a reference to the original publication.
Sometimes functions need access to reference data, e.g. constants or look-up tables, that don’t need to be shared with users.
These objects should be saved in a single R/sysdata.rda file.
This can be done with use_data(..., internal = TRUE), e.g.
The generating code and any raw data can be put in /data-raw.
As the objects are not exported, they don’t need to be documented.
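A minimal sketch of internal data, assuming a hypothetical look-up table `sound_lookup` and a hypothetical helper `lookup_sound()` (neither appears in the workshop material). The `usethis::use_data()` call is commented out because it must be run inside a package project.

```r
# Hypothetical look-up table used only inside package functions.
sound_lookup <- c(dog = "woof", cat = "meow", cow = "moo")

# In a package project, store it in R/sysdata.rda (not exported):
# usethis::use_data(sound_lookup, internal = TRUE)

# Package functions can then refer to the object by name.
lookup_sound <- function(animal) {
  unname(sound_lookup[animal])
}

lookup_sound("cow")
```

Because all internal objects share the single R/sysdata.rda file, they should be passed together in one `use_data(..., internal = TRUE)` call; a later call with only one object would overwrite the file.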
Sometimes you want to include raw data, to use in examples or vignettes.
These files can be any format and should be added directly into /inst/extdata.
When the package is installed, these files will be copied to the extdata directory and their path on your system can be found as follows:
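A sketch of the lookup with `system.file()`. The package name "mypackage" and the file name "farm_animals.csv" are placeholders chosen for illustration, not names from the workshop.

```r
# Hypothetical names: a package "mypackage" that ships
# inst/extdata/farm_animals.csv.
path <- system.file("extdata", "farm_animals.csv", package = "mypackage")

# system.file() returns "" when the package or file is not found,
# so check the result before trying to read the file.
if (nzchar(path)) {
  farm_animals_raw <- read.csv(path)
}
```

Note that the path contains "extdata" but not "inst": the contents of /inst are copied to the top level of the installed package. Passing `mustWork = TRUE` makes `system.file()` throw an error instead of returning an empty string.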
1. Run usethis::use_data_raw("farm_animals").
2. In data-raw/farm_animals.R write some code to create a small data frame with the names of farm animals and the sound they make.
3. Run all the code (including the call to usethis::use_data()) to create the data and save it in /data.
4. Create an R/farm_animals.R script and add some roxygen comments to document the data.
5. Run devtools::document() to create the documentation for the farm_animals data. Preview the documentation to check it.

We build new functions one bit at a time.
What if a new thing we add changes the existing functionality?
How can we check and be sure all the old functionality still works with New Fancy Feature?
Unit Tests!
From the root of a package project, run usethis::use_testthat():
✔ Adding 'testthat' to Suggests field in DESCRIPTION
✔ Setting Config/testthat/edition field in DESCRIPTION to '3'
✔ Creating 'tests/testthat/'
✔ Writing 'tests/testthat.R'
• Call `use_test()` to initialize a basic test file and open it for editing.
tests/testthat.R loads testthat and the package being tested, so you don’t need to add library() calls to the test files.
Test every individual task the function completes separately.
Check both for successful situations and for expected failure situations.
Three expectations cover the vast majority of cases: expect_equal(), expect_error() and expect_warning().
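First, create a test file for the function being tested, here animal_sounds(), in one of two ways: call usethis::use_test("animal_sounds"), or call usethis::use_test() with R/animal_sounds.R open in RStudio.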
Note
RStudio makes it really easy to swap between associated R scripts and tests.
If the R file is open, usethis::use_test() (with no arguments) opens or creates the test.
With the test file open, usethis::use_r() (with no arguments) opens or creates the R script.
The two arguments of test_that() are:

- desc is the test name. It should be brief and evocative, e.g. test_that("multiplication works", { ... }).
- code is the test code containing expectations. Braces ({}) should always be used in order to get accurate location data for test failures.
In the now-created and open tests/testthat/test-animal_sounds.R script:
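A minimal sketch of a first test. The definition of `animal_sounds()` here is an assumed stand-in so that the snippet runs on its own; in the package, the function lives in /R and neither it nor the `library(testthat)` call would appear in the test file.

```r
library(testthat)

# Assumed stand-in for the workshop's animal_sounds(); the real
# function is defined in the package and is not shown in this section.
animal_sounds <- function(animal, sound) {
  paste0("The ", animal, " goes ", sound, "!")
}

# The test itself: a brief description plus code containing an expectation.
test_that("animal_sounds produces expected strings", {
  dog_woof <- animal_sounds("dog", "woof")
  expect_equal(dog_woof, "The dog goes woof!")
})
```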
Tests can be run interactively like any other R code. The output will appear in the console, e.g. for a successful test:
Test passed 😀
Alternatively, we can run tests in the background with the output appearing in the build pane.
- testthat::test_file(): run all tests in a file (‘Run Tests’ button)
- devtools::test(): run all tests in a package (Ctrl/Cmd + Shift + T, or Build > Test Package)

For numeric values, expect_equal() allows some tolerance:
Note that when the expectation is met, there is nothing printed.
Use expect_identical() to test exact equivalence.
Use expect_equal(ignore_attr = TRUE) to ignore different attributes (e.g. names).
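The three comparisons above can be sketched as follows. The values are illustrative only. `local_edition(3)` is included because the `ignore_attr` argument belongs to the third edition of testthat, which is the default only inside a package set up as shown earlier.

```r
library(testthat)

test_that("equality expectations behave as described", {
  # Use the 3rd edition when running outside a package.
  local_edition(3)

  # 0.1 + 0.2 is not exactly 0.3 in floating-point arithmetic,
  # but expect_equal() allows a small tolerance, so this passes.
  expect_equal(0.1 + 0.2, 0.3)

  # expect_identical() requires exact equivalence, so the same
  # comparison would fail:
  # expect_identical(0.1 + 0.2, 0.3)
  expect_identical(c(a = 1, b = 2), c(a = 1, b = 2))

  # The names differ, but attributes are ignored here.
  expect_equal(c(a = 1, b = 2), c(1, 2), ignore_attr = TRUE)
})
```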
expect_error(), expect_warning()
When we expect an error/warning when the code is run, we need to pass the call to expect_error()/expect_warning() directly.
One way is to expect a text outcome using a regular expression:
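A sketch of matching on the message text. The version of `animal_sounds()` below is an assumed stand-in that uses base `stop()`; the message is taken from the snapshot output later in this module.

```r
library(testthat)

# Assumed stand-in for the workshop's animal_sounds().
animal_sounds <- function(animal, sound) {
  if (!is.character(sound) || length(sound) != 1) {
    stop("`sound` must be a single string!")
  }
  paste0("The ", animal, " goes ", sound, "!")
}

test_that("error if sound is not a single string", {
  # The call is passed directly; the second argument is a regular
  # expression matched against the error message.
  expect_error(
    animal_sounds("dog", c("woof", "bow wow wow")),
    "`sound` must be a single string!"
  )
})
```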
However, the regexp can get fiddly, especially if there are characters to escape. There is a more modern, precise way…
When using cli::cli_abort() and cli::cli_warn() to throw errors and warnings, we can signal the condition with a class, which we can then use in our tests.
First, we need to modify the calls to cli::cli_abort() in animal_sounds() so that each one sets a class.
We can then check for this class in the test.
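A sketch of both steps, assuming a simplified `animal_sounds()` that only validates `sound`. The class name `error_not_single_string` is the one shown in the snapshot output later in this module; the exact wording of the message is an approximation.

```r
library(testthat)

# Assumed, simplified version of animal_sounds() for illustration.
animal_sounds <- function(animal, sound) {
  if (!rlang::is_character(sound, 1)) {
    cli::cli_abort(
      c(
        "{.var sound} must be a single string!",
        "i" = "It was {.type {sound}} of length {length(sound)} instead."
      ),
      # Extra arguments are passed on to rlang::abort(), so the
      # condition carries this class.
      class = "error_not_single_string"
    )
  }
  paste0("The ", animal, " goes ", sound, "!")
}

test_that("error class is signalled for an invalid sound", {
  # No regular expression needed: match on the condition class.
  expect_error(
    animal_sounds("dog", c("woof", "bow wow wow")),
    class = "error_not_single_string"
  )
})
```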
Advantages of using class:
- … animal_sounds() and add the tests defined in the slides.
- … sound argument.

Sometimes it is difficult to define the expected output, e.g. to test images or output printed to the console. expect_snapshot() captures all messages, warnings, errors, and output from code.
When we expect the code to throw an error (e.g. if we want to test the appearance of an informative message), we need to specify error = TRUE.
Snapshot tests cannot be run interactively by sending them to the console; instead we must use devtools::test() or testthat::test_file().
Run the tests once to create the snapshot:
── Warning (test-animal_sounds.R:16:3): error message for invalid input ──
Adding new snapshot:
Code
animal_sounds("dog", c("woof", "bow wow wow"))
Error <error_not_single_string>
`sound` must be a single string!
i It was a character vector of length 2 instead.
An animal_sounds.md file is created in tests/testthat/_snaps with the code and output.
Next time the tests are run the output will be compared against this snapshot.
Suppose we update the error message in animal_sounds() to "`sound` must be a <character> vector of length 1!".
When we rerun the test, we’ll get a failure:
── Failure (test-animal_sounds.R:16:3): error message for invalid input ──
Snapshot of code has changed:
old vs new
"Code"
" animal_sounds(\"dog\", c(\"woof\", \"bow wow wow\"))"
"Error <error_not_single_string>"
- " `sound` must be a single string!"
+ " `sound` must be a <character> vector of length 1!"
" i It was a character vector of length 2 instead."
* Run testthat::snapshot_accept('animal_sounds') to accept the change.
* Run testthat::snapshot_review('animal_sounds') to interactively review the change.
We can use expect_snapshot_file() to create snapshots for images. These allow us to compare binary outputs, though they can’t provide an automatic diff when the test fails. Instead, call snapshot_review() to launch a Shiny app that allows you to visually review the changes.
See Whole file snapshotting for further details.
The vdiffr package allows comparisons between SVG images.
Make this test pass
Hint: set the default value for the sound argument to NULL.
Commit your changes to the git repo.
Push your commits from this session.
Wickham, H and Bryan, J, R Packages (2nd edn, in progress), https://r-pkgs.org.
R Core Team, Writing R Extensions, https://cran.r-project.org/doc/manuals/r-release/R-exts.html
Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).