Introduction to targets
- We can only have confidence in the results of scientific analyses if
they can be reproduced by others (including your future self)
- `targets` helps achieve reproducibility by automating the workflow
- `targets` is designed for use with the R programming language
- The example dataset for this workshop includes measurements taken on
penguins in Antarctica
- Projects help keep our analyses organized so we can easily re-run
them later
- Use the RStudio Project Wizard to create projects
- The `_targets.R` file is a special file that must be included in all `targets` projects, and defines the workflow
- Use `tar_script()` to create a default `_targets.R` file
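A minimal `_targets.R` might look like the sketch below. The target names, input path, and column names are illustrative assumptions, not part of the lesson material.

```r
# _targets.R -- a minimal illustrative workflow
library(targets)

# Packages the targets themselves need
tar_option_set(packages = c("readr", "dplyr"))

list(
  # Hypothetical input file and target names
  tar_target(penguin_data,
             readr::read_csv("data/penguins.csv", show_col_types = FALSE)),
  tar_target(
    mass_summary,
    penguin_data |>
      dplyr::group_by(species) |>
      dplyr::summarise(mean_mass = mean(body_mass_g, na.rm = TRUE))
  )
)
```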
- Use `tar_make()` to run the workflow
- `targets` workflows are run in a separate, non-interactive R session
- `tar_load()` loads a workflow object into the current R session
- `tar_read()` reads a workflow object and returns its value
- The `_targets` folder is the cache and generally should not be edited by hand
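The run-and-inspect cycle looks roughly like this (`mass_summary` is a hypothetical target name):

```r
library(targets)

tar_make()              # builds outdated targets in a fresh, non-interactive R session
tar_load(mass_summary)  # assigns the stored object into the current session
tar_read(mass_summary)  # returns the stored value without assigning it
```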
- `targets` only runs the steps that have been affected by a change to the code
- `tar_visnetwork()` shows the current state of the workflow as a network
- `tar_progress()` shows the current state of the workflow as a data frame
- `tar_outdated()` lists outdated targets
- `tar_invalidate()` can be used to invalidate (re-run) specific targets
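These inspection functions can be sketched together (again, `mass_summary` is a hypothetical target):

```r
library(targets)

tar_visnetwork()              # network diagram of targets and their status
tar_progress()                # data frame: one row per target with its status
tar_outdated()                # character vector of targets that would re-run
tar_invalidate(mass_summary)  # mark a target invalid so tar_make() rebuilds it
```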
- Put code in the `R/` folder
- Put functions in `R/functions.R`
- Specify packages in `R/packages.R`
- Put other miscellaneous files in `_targets/user`
- Writing functions is a key skill for `targets` pipelines
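A function in `R/functions.R` might look like this sketch (the function and column names are assumptions made for illustration):

```r
# R/functions.R -- a hypothetical data-cleaning function
clean_penguin_data <- function(raw_data) {
  raw_data |>
    dplyr::rename_with(tolower) |>
    dplyr::filter(!is.na(body_mass_g))
}
```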
- There are multiple ways to load packages with `targets`
- `targets` only tracks user-defined functions, not packages
- Use `renv` to manage package versions
- Use the `conflicted` package to manage namespace conflicts
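One way to set this up is a `R/packages.R` script along these lines (the package list is illustrative):

```r
# R/packages.R -- load packages and resolve namespace conflicts
library(conflicted)
library(tidyverse)
library(targets)
library(tarchetypes)

# Prefer dplyr::filter() over stats::filter()
conflicted::conflict_prefer("filter", "dplyr")
```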
- `tarchetypes::tar_file()` tracks the contents of a file
- Use `tarchetypes::tar_file_read()` in combination with data loading functions like `read_csv()` to keep the pipeline in sync with your input data
- Use `tarchetypes::tar_file()` in combination with a function that writes to a file and returns its path to write out data
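Both patterns can be sketched in one pipeline. The file paths, target names, and the `write_csv_file()` helper are hypothetical:

```r
library(targets)
library(tarchetypes)

# Hypothetical helper: write a CSV and return its path so tar_file() can track it
write_csv_file <- function(data, path) {
  readr::write_csv(data, path)
  path
}

list(
  # Input: downstream targets re-run whenever the file's contents change
  tar_file_read(
    penguin_data,
    "data/penguins.csv",  # path is an assumption
    readr::read_csv(!!.x, show_col_types = FALSE)
  ),
  # Output: the written file itself becomes a tracked target
  tar_file(penguin_csv_out, write_csv_file(penguin_data, "output/penguins_clean.csv"))
)
```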
- Dynamic branching creates multiple targets with a single
command
- You usually need to write custom functions so that the output of the
branches includes necessary metadata
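Dynamic branching with `pattern = map()` might look like this sketch; the target names and the `summarise_one_species()` function are hypothetical:

```r
library(targets)

list(
  tar_target(species, c("Adelie", "Chinstrap", "Gentoo")),
  # One branch is created per value of species
  tar_target(
    species_summary,
    summarise_one_species(penguin_data, species),  # hypothetical function that
    pattern = map(species)                         # labels its output with the species
  )
)
```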
- Parallel computing works at the level of the workflow, not the
function
- `tarchetypes::tar_quarto()` is used to render Quarto documents
- You should load targets within the Quarto document using `tar_load()` and `tar_read()`
- It is recommended to do heavy computations in the main `targets` workflow, and lighter formatting and plot generation in the Quarto document
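A sketch of the two halves, assuming a report file named `report.qmd` (the file and target names are illustrative). In `_targets.R`:

```r
library(targets)
library(tarchetypes)

list(
  # ... computation targets go here ...
  tar_quarto(report, path = "report.qmd")
)
```

And inside an R chunk in `report.qmd`:

```r
library(targets)
tar_load(mass_summary)      # heavy lifting already done by the pipeline
knitr::kable(mass_summary)  # only light formatting here
```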
- `crew.cluster::crew_controller_slurm()` is used to configure a workflow to use Slurm
- `crew` uses persistent workers on HPC, and you need to choose your resources accordingly
- You can create heterogeneous workers by using multiple calls to `crew_controller_slurm(name = ...)`
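Heterogeneous workers might be configured roughly as below, combining several named controllers. The controller names, worker counts, and resource arguments are assumptions; check the `crew.cluster` documentation and your cluster's policies before use.

```r
library(targets)

# Two named Slurm controllers with different resource profiles (values illustrative)
tar_option_set(
  controller = crew::crew_controller_group(
    crew.cluster::crew_controller_slurm(name = "small", workers = 2),
    crew.cluster::crew_controller_slurm(name = "big", workers = 1,
                                        slurm_memory_gigabytes_per_cpu = 16)
  )
)
```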