Introduction


  • We can only have confidence in the results of scientific analyses if they can be reproduced by others (including your future self)
  • targets helps achieve reproducibility by automating the steps of a workflow
  • targets is designed for use with the R programming language
  • The example dataset for this workshop includes measurements taken on penguins in Antarctica

First targets Workflow


  • Projects help keep our analyses organized so we can easily re-run them later
  • Use the RStudio Project Wizard to create projects
  • The _targets.R file is a special file that must be included in all targets projects, and defines the workflow
  • Use tar_script() to create a default _targets.R file
  • Use tar_make() to run the workflow
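A minimal _targets.R along these lines shows the shape of a workflow (the target names here are illustrative, not taken from the workshop):

```r
# _targets.R: a minimal pipeline sketch (target names are illustrative)
library(targets)

list(
  # Each tar_target() defines one step; targets tracks the dependencies
  # between steps automatically from the code
  tar_target(my_data, data.frame(x = 1:10, y = (1:10) * 2)),
  tar_target(my_model, lm(y ~ x, data = my_data))
)
```

Running targets::tar_make() from the project root then executes these steps in dependency order.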

Loading Workflow Objects


  • targets workflows are run in a separate, non-interactive R session
  • tar_load() loads a workflow object into the current R session
  • tar_read() reads a workflow object and returns its value
  • The _targets folder is the cache and generally should not be edited by hand
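The difference between the two loading functions, assuming a hypothetical target named my_model exists in the workflow:

```r
library(targets)

tar_load(my_model)   # creates an object called my_model in the session

fit <- tar_read(my_model)   # returns the value, so you choose the name
```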

The Workflow Lifecycle


  • targets only runs the steps that have been affected by a change to the code
  • tar_visnetwork() shows the current state of the workflow as a network
  • tar_progress() shows the current state of the workflow as a data frame
  • tar_outdated() lists outdated targets
  • tar_invalidate() marks specific targets as invalid (outdated) so they are re-run on the next tar_make()
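A typical inspection session might look like this (my_model is a hypothetical target name):

```r
library(targets)

tar_visnetwork()          # interactive network of targets and their status
tar_progress()            # data frame: one row per target with its status
tar_outdated()            # character vector of targets that would re-run
tar_invalidate(my_model)  # mark the hypothetical my_model target outdated
tar_make()                # re-runs only my_model and anything downstream
```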

Best Practices for targets Project Organization


  • Put code in the R/ folder
  • Put functions in R/functions.R
  • Specify packages in R/packages.R
  • Put other miscellaneous files in _targets/user
  • Writing functions is a key skill for targets pipelines
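A _targets.R sketch for this layout; the file path and the clean_penguins() function (assumed to live in R/functions.R) are hypothetical:

```r
# _targets.R sketch for the recommended project layout
library(targets)

tar_source()  # sources every .R file in R/, so functions from
              # R/functions.R and library() calls in R/packages.R
              # become available to the pipeline

list(
  tar_target(raw_data, read.csv("data/raw.csv")),   # hypothetical path
  tar_target(clean_data, clean_penguins(raw_data))  # hypothetical function
)
```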

Managing Packages


  • There are multiple ways to load packages with targets
  • targets only tracks user-defined functions, not packages
  • Use renv to manage package versions
  • Use the conflicted package to manage namespace conflicts
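One way to combine these ideas in _targets.R (the example target is only a sketch):

```r
# _targets.R fragment: loading packages via tar_option_set()
library(targets)
library(conflicted)

# Every target runs with these packages attached
tar_option_set(packages = c("dplyr", "readr"))

# Resolve namespace clashes explicitly (dplyr::filter vs stats::filter)
conflicted::conflicts_prefer(dplyr::filter)

# renv::init() and renv::snapshot() (run once, interactively) record
# the package versions used by the project

list(
  tar_target(car_summary, summarise(mtcars, mean_mpg = mean(mpg)))
)
```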

Working with External Files


  • tarchetypes::tar_file() tracks the contents of a file
  • Use tarchetypes::tar_file_read() in combination with data loading functions like read_csv() to keep the pipeline in sync with your input data
  • Use tarchetypes::tar_file() in combination with a function that writes to a file and returns its path to write out data
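A sketch of both patterns; the CSV paths and the write_csv_return_path() helper are hypothetical:

```r
# _targets.R fragment for tracking input and output files
library(targets)
library(tarchetypes)

# Writing functions for tar_file() should return the file path
write_csv_return_path <- function(data, path) {
  readr::write_csv(data, path)
  path  # returning the path lets tar_file() track the output file
}

list(
  # tar_file_read() creates two targets: one tracking the file's
  # contents, one holding the loaded data; !!.x is where the tracked
  # path is spliced into the read function
  tar_file_read(
    penguins,
    "data/penguins_raw.csv",
    readr::read_csv(!!.x, show_col_types = FALSE)
  ),
  tar_file(
    penguins_out,
    write_csv_return_path(penguins, "results/penguins.csv")
  )
)
```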

Branching


  • Dynamic branching creates multiple targets with a single command
  • You usually need to write custom functions so that the output of the branches includes necessary metadata
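A sketch of dynamic branching; penguins and model_glance() are hypothetical stand-ins for a data target and a custom function:

```r
# _targets.R fragment: dynamic branching over species
library(targets)

list(
  tar_target(species, c("Adelie", "Chinstrap", "Gentoo")),
  # pattern = map(species) creates one branch per species; the custom
  # model_glance() function should return a data frame that includes a
  # species column, so the combined result records which branch each
  # row came from
  tar_target(
    models,
    model_glance(penguins, species),
    pattern = map(species)
  )
)
```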

Parallel Processing


  • Parallel computing works at the level of the workflow, not the function
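Attaching a crew controller in _targets.R lets independent targets (such as dynamic branches) run simultaneously; the worker count here is an arbitrary example:

```r
# _targets.R fragment: run up to two targets at once
library(targets)
library(crew)

tar_option_set(
  controller = crew_controller_local(workers = 2)
)
```

Because parallelism works at the level of the workflow, no changes to individual functions are needed; targets decides from the dependency graph which steps can run at the same time.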

Reproducible Reports with Quarto


  • tarchetypes::tar_quarto() is used to render Quarto documents
  • You should load workflow objects within the Quarto document using tar_load() and tar_read()
  • It is recommended to do heavy computations in the main targets workflow, and lighter formatting and plot generation in the Quarto document
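A sketch of this division of labor; report.qmd, penguins, and make_plot() are hypothetical:

```r
# _targets.R fragment: heavy computation as targets, rendering at the end
library(targets)
library(tarchetypes)

list(
  tar_target(penguin_plot, make_plot(penguins)),  # heavy work stays here
  # Renders report.qmd; inside the document, tar_load(penguin_plot)
  # retrieves the precomputed plot for light formatting
  tar_quarto(report, "report.qmd")
)
```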

Deploying Targets on HPC


  • crew.cluster::crew_controller_slurm() is used to configure a workflow to use Slurm
  • crew uses persistent workers on HPC, so you need to choose your resources accordingly
  • You can create heterogeneous workers by using multiple calls to crew_controller_slurm(), each with a different name= argument
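A sketch of heterogeneous Slurm workers; the controller names, worker counts, and memory values are hypothetical:

```r
# _targets.R fragment: two kinds of Slurm workers in one workflow
library(targets)
library(crew.cluster)

tar_option_set(
  controller = crew::crew_controller_group(
    crew_controller_slurm(
      name = "small",
      workers = 4,
      slurm_memory_gigabytes_per_cpu = 4
    ),
    # Workers persist between targets, so size them for the largest
    # jobs they will run
    crew_controller_slurm(
      name = "big",
      workers = 1,
      slurm_memory_gigabytes_per_cpu = 32
    )
  )
)

# A target can then request the "big" workers via its resources, e.g.:
# tar_target(..., resources = tar_resources(
#   crew = tar_resources_crew(controller = "big")))
```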