Hey,
This tiny tutorial is to help you get started with creating packages in R.
R packages are a great way to share and reuse code across projects, with other colleagues (and the rest of the R community). They offer much better organization than functions defined in loose scripts, and they facilitate reproducibility, usability and collaboration, which are pivotal in data analysis.
In this tutorial I’ll show you how to create a basic package, structure code and functions, write comprehensive documentation and automated tests, and name and license your package. You can find a lot more info in the R Packages book, available for free online.
Projects are nice because they allow you to work with relative file paths (no need to specify working directories or absolute paths).
In RStudio, go to File > New Project > New Directory > R Package (using devtools). Choose the location and package name, and select Create git repository to enable version control. You can also use renv to manage the versions of the packages your project depends on, independently of your global R library. Then just hit Create Project.
A few key considerations when naming an R package:
- The name can only contain (ASCII) letters, numbers and dots, must start with a letter, must be at least two characters long and cannot end with a dot (e.g. _ is not a valid character in CRAN package names).
- It is a good idea to first check that the name you chose is not already used elsewhere. The available package checks for existing packages (e.g. on CRAN, Bioconductor, or GitHub) and that the name doesn’t have some unintended meaning (e.g. on Urban Dictionary).
library(available)
available("package.name")
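If you want a quick offline check of the syntactic naming rules before reaching for available, a few lines of base R will do. This is a minimal sketch: is_valid_pkg_name() is a hypothetical helper (not part of any package), and it only checks the syntax (ASCII letters, numbers and dots; starts with a letter; at least two characters; no trailing dot), not whether the name is already taken.

```r
# Hypothetical helper: checks only the syntactic rules for R package names,
# not whether the name is already in use (that is what available() is for).
is_valid_pkg_name <- function(name) {
  # Starts with a letter, contains only ASCII letters, digits and dots,
  # is at least two characters long and does not end with a dot.
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name, perl = TRUE)
}

is_valid_pkg_name(c("mypackage", "my.package", "my_package", "2fast", "a", "pkg."))
# → TRUE TRUE FALSE FALSE FALSE FALSE
```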
Hitting Create Project will open a new session and create the basic folder structure of an R package. This includes:
- R directory, containing the .R files holding all your functions.
- man (“manual”) directory, containing the documentation (or help) .Rd files for each function (or object).
- package-name.Rproj, a metadata file created by RStudio to manage the project.
- .Rbuildignore file, specifying all files that may be in the folder but should not be included in the build.
All R packages also include at least two supporting files:
- DESCRIPTION: essential package metadata, e.g. the package name, version, dependencies, author and coauthors, maintainer etc.
- NAMESPACE: information about the functions/objects your package imports from other packages and the functions/objects it makes available to users, ensuring seamless integration with other packages.
The NAMESPACE file is generated along with the documentation by the roxygen2 package, so you don’t need to edit it by hand.
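To make the DESCRIPTION format concrete, here is a minimal sketch of one. The package name, author and field values are made up for illustration. DESCRIPTION files use the “Field: value” (DCF) format, which base R’s read.dcf() can parse, a handy way to inspect one from code.

```r
# A minimal, made-up DESCRIPTION file (all values are illustrative)
desc_text <- c(
  "Package: package.name",
  "Title: What the Package Does (One Line, Title Case)",
  "Version: 0.0.0.9000",
  "Authors@R: person(\"First\", \"Last\", email = \"first.last@example.com\", role = c(\"aut\", \"cre\"))",
  "Description: What the package does (one paragraph).",
  "License: MIT + file LICENSE",
  "Encoding: UTF-8",
  "Imports: foreign (>= 0.8.1)"
)

# Write it to a temporary file and parse it with base R's DCF reader
desc_file <- tempfile("DESCRIPTION")
writeLines(desc_text, desc_file)
desc <- read.dcf(desc_file)  # a one-row character matrix, one column per field

desc[, "Package"]  # → "package.name"
desc[, "Imports"]  # → "foreign (>= 0.8.1)"
```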
In RStudio, you should now see a new Build tab in the top-right pane (next to Environment). Click Build > Install to install the package. After a couple of seconds you’ll see the familiar library(package.name) pop up in the console.
The package is now loaded.
Note: the environment is still empty. The functions you define in the package are attached via the search path, so you can call them, but they are not visible in the global environment.
Try this out:
hello() # runs mock function
?hello # opens documentation
# Load devtools
library(devtools)
devtools contains a variety of helpful functions for package development. Loading it also loads the usethis package, which provides an additional set of very convenient functions for R projects in general.
P.S. You can also create and install the package directly from the R console:
usethis::create_package("package.name")  # creates the necessary folders/files described above
devtools::install()                      # builds and installs the package; run it from the package directory
If you need to use functions from other packages (for example, foreign to read .sav files), you don’t load them using library(). Rather:
usethis::use_package("foreign")
# Note: you need to have the foreign package installed for this to work
# Or, to require a minimum version and avoid version issues...
usethis::use_package("foreign", min_version = "0.8.1")
By default, use_package() adds the package to the “Imports” field of the DESCRIPTION file.
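Once foreign is listed under Imports, the usual convention is to call its functions with the pkg::fun() syntax inside your package code. A minimal sketch, assuming a hypothetical wrapper read_sav_file() in your package; foreign::read.spss() and its to.data.frame argument are real, the wrapper is only an illustration.

```r
# Hypothetical package function that relies on an imported package
read_sav_file <- function(path) {
  # Fail early with a clear message if the file is missing
  if (!file.exists(path)) stop("File not found: ", path)
  # Call the imported function with `::` instead of attaching foreign via library()
  foreign::read.spss(path, to.data.frame = TRUE)
}
```

Alternatively, a roxygen tag such as #' @importFrom foreign read.spss adds the function to your NAMESPACE, so you can call read.spss() without the foreign:: prefix.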
You can add a new function to a package in a couple of ways:
# 1. create a .R script in R folder and write the function in there directly
usethis::use_r("function_name")
# 2. or dump a function you already defined in your session into the R folder
dump("function_name", file="R/function_name.R")
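To see what the dump() route produces, the sketch below defines a small function in the session and writes it to an R folder. It uses a temporary directory instead of your real package folder, and the function name center_values() is hypothetical.

```r
# Work in a throwaway folder that mimics a package's R/ directory
pkg_dir <- file.path(tempdir(), "dump_demo")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# A hypothetical function already defined in the session
center_values <- function(x) {
  x - mean(x, na.rm = TRUE)
}

# dump() writes R code that recreates the object (here: the function definition)
out_file <- file.path(pkg_dir, "R", "center_values.R")
dump("center_values", file = out_file)

# Show the generated file: it starts with the assignment `center_values <-`
cat(readLines(out_file), sep = "\n")
```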
During the development phase, it can be helpful to load all the package files into the current R session, so you can test and debug the new function(s) (prior to installation).
devtools::load_all(".")
Or click Build > More > Load All (shortcut: Ctrl+Shift+L, or Cmd+Shift+L on macOS).
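Conceptually, loading all package files means sourcing every .R file in the R folder so its functions become callable without installing the package. The sketch below imitates that with base R only. It is a simplified stand-in, not what load_all() actually does (load_all() also simulates the package namespace and handles imports, data and compiled code). The folder, file and function names are hypothetical.

```r
# Hypothetical package folder with two files in R/
pkg_dir <- file.path(tempdir(), "load_demo")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines("square <- function(x) x^2", file.path(pkg_dir, "R", "square.R"))
writeLines("cube <- function(x) x^3", file.path(pkg_dir, "R", "cube.R"))

# Source every .R file into a fresh environment (a rough analogue of load_all())
load_r_files <- function(path) {
  env <- new.env(parent = globalenv())
  r_files <- list.files(file.path(path, "R"), pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in sort(r_files)) sys.source(f, envir = env)  # fixed, alphabetical order
  env
}

dev_env <- load_r_files(pkg_dir)
dev_env$square(3)  # → 9
dev_env$cube(2)    # → 8
```

Keeping the functions in their own environment mirrors the point made earlier: they are callable, but they don’t clutter the global environment.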
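It’s generally good practice to keep each main function in its own .R file. You may store a variety of smaller helper functions that you use throughout the package in a utils.R file. It’s also best practice to include an R file named after the package (e.g. package.name-package.R, which usethis::use_package_doc() can create for you) with package-level documentation that helps users understand the package.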
Documenting takes time but is also a fundamental element of a good R package. Function help files can be generated using the roxygen2 package.
In RStudio, place your cursor inside the function you want to document and go to Code > Insert Roxygen Skeleton (shortcut: Ctrl+Alt+Shift+R, or Cmd+Option+Shift+R on macOS). In the function’s .R file (above the function definition) you should now see the roxygen2 header, in which every line starts with #'.
Help files follow a standard structure expected by R and CRAN, including:
- The @param tags, one for each parameter name, which are included automatically. It is helpful to start each description with the type of the argument (e.g., numeric, character, logical…).
- A description of the output (@returns). Also start with the output type here.
- Links to related functions or references (@seealso), mostly useful for specialized analyses that need citations.
Add the @export tag before the examples (@examples) to make the function accessible to users of the package.
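Put together, a documented function might look like the sketch below. The function rescale_01() and its wording are hypothetical; @param, @returns, @seealso, @export and @examples are standard roxygen2 tags. The backticks and the [scale()] link assume Markdown support is enabled in roxygen2 (usethis sets this up for new packages). Since roxygen headers are ordinary comments to R, the code runs as-is; only roxygen2 reads the #' lines.

```r
#' Rescale a numeric vector to the 0-1 range
#'
#' Linearly maps `x` so that its minimum becomes 0 and its maximum becomes 1.
#'
#' @param x Numeric vector to rescale.
#' @param na.rm Logical. Ignore missing values when computing the range?
#'   Missing values themselves stay `NA` in the output.
#'
#' @returns Numeric vector of the same length as `x`.
#' @seealso [scale()] for centering and scaling to unit variance.
#' @export
#'
#' @examples
#' rescale_01(c(2, 4, 6))
#' rescale_01(c(1, NA, 3))
rescale_01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  # Note: a constant vector gives NaN (0 / 0); a real package might handle that case
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale_01(c(2, 4, 6))  # → 0.0 0.5 1.0
```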
When functions take similar arguments, they can be included together in one help file.
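One way to do this with roxygen2 is the @rdname tag, which merges the documentation of several functions into a single .Rd file (the related @inheritParams tag reuses parameter descriptions without merging the pages). A minimal sketch with two hypothetical conversion functions:

```r
#' Convert temperatures between Celsius and Fahrenheit
#'
#' @param x Numeric vector of temperatures.
#'
#' @returns Numeric vector of converted temperatures.
#' @export
#'
#' @examples
#' c_to_f(100)
#' f_to_c(212)
c_to_f <- function(x) {
  x * 9 / 5 + 32
}

# @rdname puts this function's docs in the same help file as c_to_f()
#' @rdname c_to_f
#' @export
f_to_c <- function(x) {
  (x - 32) * 5 / 9
}

c_to_f(100)  # → 212
f_to_c(212)  # → 100
```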
After you have filled in the roxygen header you can add documentation to the package:
# check that the package can be loaded, update the NAMESPACE file, and generate the .Rd file
roxygen2::roxygenize()
# or
devtools::document(roclets = c('rd', 'collate', 'namespace'))
# or, if you are tired of typing:
# in RStudio, Build > More > Document (shortcut: Ctrl+Shift+D, or Cmd+Shift+D on macOS)
Now the man folder will include a new .Rd documentation file. Reload the package (e.g. with devtools::load_all()) and check the function’s documentation with ?function_name.
R packages can also include R Markdown templates, stored in the inst directory. These help provide a structure for working with the package.
# Start an R Markdown template
usethis::use_rmarkdown_template("Tutorial")
use_rmarkdown_template() creates the directory structure inst/rmarkdown/templates/, including a skeleton .Rmd file and a template.yaml file for you to customize.
While data isn’t a core component of an R package, you may want to add data objects (for example data frames) to ease the testing and demonstration of functions, or for other reasons. Data is often necessary for running examples and checking that the package works as intended.
You can do that by:
# Store in an R object (set a seed so the random values are reproducible)
set.seed(123)
toy_data <- data.frame(
  'id' = 1:5,
  'f0101' = sample(1:50, 5, replace = TRUE),
  'f0102' = sample(0:1, 5, replace = TRUE)
)
# Add object to the package
usethis::use_data(toy_data)
use_data() creates a data directory if there isn’t one yet, and saves the R data object as an .rda file. It’s best practice to place all external data (e.g. .csv files), along with the scripts that process it, in the data-raw directory.
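Under the hood, an .rda file is R’s standard save() format, so you can see what use_data() produces with base R alone. The sketch below mimics that step in a temporary folder, using bzip2 compression because that is use_data()’s default. It is only an illustration: use_data() does this for you and also takes care of package-specific details, so there is no need to call save() yourself.

```r
set.seed(123)  # make the random toy data reproducible
toy_data <- data.frame(
  id    = 1:5,
  f0101 = sample(1:50, 5, replace = TRUE),
  f0102 = sample(0:1, 5, replace = TRUE)
)

# Mimic a package's data/ folder in a temporary directory
data_dir <- file.path(tempdir(), "data_demo", "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# One object per .rda file, named after the object
rda_file <- file.path(data_dir, "toy_data.rda")
save(toy_data, file = rda_file, compress = "bzip2")

# Loading it into a fresh environment recovers an identical object
check_env <- new.env()
load(rda_file, envir = check_env)
identical(check_env$toy_data, toy_data)  # → TRUE
```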
A very handy function to test whether the package is working is:
devtools::check()
This runs R CMD check, a comprehensive series of checks (including syntax, package dependencies, documentation quality, coding standards…) to ensure correctness and adherence to package development standards.
If you get 0 errors ✔ | 0 warnings ✔ | 0 notes ✔, congrats: your package meets the requirements and standards set by the R community and you are ready to share it with the world.
Dependable R packages also need unit tests: automated tests that check individual functions or pieces of code to ensure they produce the expected output under various conditions. devtools can run all of them for you with devtools::test().
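With testthat, the usual setup is to run usethis::use_testthat() once, then usethis::use_test("name") to create a file such as tests/testthat/test-name.R. Below is a minimal sketch of what such a file could contain, testing a hypothetical rescale_01() function that is defined in the block so it runs on its own; in a real package the function would come from the R folder instead.

```r
# Not needed inside tests/testthat/, where testthat is already loaded
library(testthat)

# Hypothetical function under test (in a package this lives in R/)
rescale_01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

test_that("rescale_01 maps the range onto [0, 1]", {
  expect_equal(rescale_01(c(2, 4, 6)), c(0, 0.5, 1))
})

test_that("rescale_01 keeps missing values in place", {
  out <- rescale_01(c(1, NA, 3))
  expect_true(is.na(out[2]))
  expect_equal(out[c(1, 3)], c(0, 1))
})
```

Each test_that() block groups related expectations under a readable description, which is what devtools::test() reports when something fails.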
Moreover, it is best practice to include vignettes, which are long-form tutorials demonstrating how to use the package. All of this is included in the package in a standardized structure, providing users with a reliable and well-organized way to access and use the package’s content.
Let’s look at the different types of dependencies in R packages, which ensure that all the necessary functions are available.
Besides Imports (i.e., packages that are required for your functions to work properly; they are installed along with your package and loaded, though not attached, when your package is loaded), there are other types of dependencies you can add to the DESCRIPTION file. Suggests are packages that are not required for the package to function, but they provide additional functionality or examples. Using Suggests is a courtesy to users, as it spares them from installing difficult-to-install packages they may never need.
usethis::use_package("tibble", type = "Suggests")
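Because a suggested package may not be installed, code that uses it should check first. The usual pattern is requireNamespace(), sketched below with a hypothetical as_table() helper that returns a tibble when tibble is available and falls back to a plain data frame otherwise.

```r
# Hypothetical helper: uses the suggested package only if it is installed
as_table <- function(df) {
  if (requireNamespace("tibble", quietly = TRUE)) {
    tibble::as_tibble(df)
  } else {
    df  # fall back to a base data frame
  }
}

out <- as_table(data.frame(x = 1:3))
inherits(out, "data.frame")  # → TRUE either way (a tibble is also a data.frame)
```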
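When you are ready to share your package, you also need to think about how others can use the code, which is what a license specifies. Two commonly used licenses are MIT (a permissive license for code) and CC0 (a public domain dedication, often used for data packages):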
# Set the license in the DESCRIPTION file (this also adds the license files)
usethis::use_mit_license()
# or, e.g. for a data package:
# usethis::use_cc0_license()