Introduction to targets
- We can only have confidence in the results of scientific analyses if
they can be reproduced by others (including your future self)
- `targets` helps achieve reproducibility by automating the workflow
- `targets` is designed for use with the R programming language
- The example dataset for this workshop includes measurements taken on
penguins in Antarctica
- Projects help keep our analyses organized so we can easily re-run
them later
- Use the RStudio Project Wizard to create projects
- The `_targets.R` file is a special file that must be included in all `targets` projects, and defines the workflow
- Use `tar_script()` to create a default `_targets.R` file
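A minimal `_targets.R` might look like the sketch below. The target names, input path, and column names are illustrative assumptions, not part of the lesson material.

```r
# _targets.R -- a minimal illustrative workflow
library(targets)

# Packages the targets themselves need
tar_option_set(packages = c("readr", "dplyr"))

list(
  # Hypothetical input file and target names
  tar_target(penguin_data,
             readr::read_csv("data/penguins.csv", show_col_types = FALSE)),
  tar_target(
    mass_summary,
    penguin_data |>
      dplyr::group_by(species) |>
      dplyr::summarise(mean_mass = mean(body_mass_g, na.rm = TRUE))
  )
)
```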
- Use `tar_make()` to run the workflow
- `targets` workflows are run in a separate, non-interactive R session
- `tar_load()` loads a workflow object into the current R session
- `tar_read()` reads a workflow object and returns its value
- The `_targets` folder is the cache and generally should not be edited by hand
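The run-and-inspect cycle looks roughly like this (`mass_summary` is a hypothetical target name):

```r
library(targets)

tar_make()              # builds outdated targets in a fresh, non-interactive R session
tar_load(mass_summary)  # assigns the stored object into the current session
tar_read(mass_summary)  # returns the stored value without assigning it
```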
- `targets` only runs the steps that have been affected by a change to the code
- `tar_visnetwork()` shows the current state of the workflow as a network
- `tar_progress()` shows the current state of the workflow as a data frame
- `tar_outdated()` lists outdated targets
- `tar_invalidate()` can be used to invalidate (re-run) specific targets
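These inspection functions can be sketched together (again, `mass_summary` is a hypothetical target):

```r
library(targets)

tar_visnetwork()              # network diagram of targets and their status
tar_progress()                # data frame: one row per target with its status
tar_outdated()                # character vector of targets that would re-run
tar_invalidate(mass_summary)  # mark a target invalid so tar_make() rebuilds it
```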
- Put code in the `R/` folder
- Put functions in `R/functions.R`
- Specify packages in `R/packages.R`
- Put other miscellaneous files in `_targets/user`
- Writing functions is a key skill for `targets` pipelines
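A function in `R/functions.R` might look like this sketch (the function and column names are assumptions made for illustration):

```r
# R/functions.R -- a hypothetical data-cleaning function
clean_penguin_data <- function(raw_data) {
  raw_data |>
    dplyr::rename_with(tolower) |>
    dplyr::filter(!is.na(body_mass_g))
}
```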
- There are multiple ways to load packages with `targets`
- `targets` only tracks user-defined functions, not packages
- Use `renv` to manage package versions
- Use the `conflicted` package to manage namespace conflicts
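One way to set this up is a `R/packages.R` script along these lines (the package list is illustrative):

```r
# R/packages.R -- load packages and resolve namespace conflicts
library(conflicted)
library(tidyverse)
library(targets)
library(tarchetypes)

# Prefer dplyr::filter() over stats::filter()
conflicted::conflict_prefer("filter", "dplyr")
```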
- `tarchetypes::tar_file()` tracks the contents of a file
- Use `tarchetypes::tar_file_read()` in combination with data loading functions like `read_csv()` to keep the pipeline in sync with your input data
- Use `tarchetypes::tar_file()` in combination with a function that writes to a file and returns its path to write out data
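Both patterns can be sketched in one pipeline. The file paths, target names, and the `write_csv_file()` helper are hypothetical:

```r
library(targets)
library(tarchetypes)

# Hypothetical helper: write a CSV and return its path so tar_file() can track it
write_csv_file <- function(data, path) {
  readr::write_csv(data, path)
  path
}

list(
  # Input: downstream targets re-run whenever the file's contents change
  tar_file_read(
    penguin_data,
    "data/penguins.csv",  # path is an assumption
    readr::read_csv(!!.x, show_col_types = FALSE)
  ),
  # Output: the written file itself becomes a tracked target
  tar_file(penguin_csv_out, write_csv_file(penguin_data, "output/penguins_clean.csv"))
)
```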
- Dynamic branching creates multiple targets with a single
command
- You usually need to write custom functions so that the output of the
branches includes necessary metadata
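Dynamic branching with `pattern = map()` might look like this sketch; the target names and the `summarise_one_species()` function are hypothetical:

```r
library(targets)

list(
  tar_target(species, c("Adelie", "Chinstrap", "Gentoo")),
  # One branch is created per value of species
  tar_target(
    species_summary,
    summarise_one_species(penguin_data, species),  # hypothetical function that
    pattern = map(species)                         # labels its output with the species
  )
)
```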
- Parallel computing works at the level of the workflow, not the
function
- `tarchetypes::tar_quarto()` is used to render Quarto documents
- You should load targets within the Quarto document using `tar_load()` and `tar_read()`
- It is recommended to do heavy computations in the main `targets` workflow, and lighter formatting and plot generation in the Quarto document
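A sketch of the two halves, assuming a report file named `report.qmd` (the file and target names are illustrative). In `_targets.R`:

```r
library(targets)
library(tarchetypes)

list(
  # ... computation targets go here ...
  tar_quarto(report, path = "report.qmd")
)
```

And inside an R chunk in `report.qmd`:

```r
library(targets)
tar_load(mass_summary)      # heavy lifting already done by the pipeline
knitr::kable(mass_summary)  # only light formatting here
```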
- `crew.cluster::crew_controller_slurm()` is used to configure a workflow to use Slurm
- `crew` uses persistent workers on HPC, and you need to choose your resources accordingly
- You can create heterogeneous workers by using multiple calls to `crew_controller_slurm(name = ...)`
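Heterogeneous workers might be configured roughly as below, combining several named controllers. The controller names, worker counts, and resource arguments are assumptions; check the `crew.cluster` documentation and your cluster's policies before use.

```r
library(targets)

# Two named Slurm controllers with different resource profiles (values illustrative)
tar_option_set(
  controller = crew::crew_controller_group(
    crew.cluster::crew_controller_slurm(name = "small", workers = 2),
    crew.cluster::crew_controller_slurm(name = "big", workers = 1,
                                        slurm_memory_gigabytes_per_cpu = 16)
  )
)
```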