Chapter 4 Pipeline structure

4.1 What do you need to get started with `targets`?

Some of the basic ingredients for a targets pipeline include:

A _targets.R script
A functions.R script

Figure: Example targets directory structure

First, let’s talk about the _targets.R script.

The _targets.R script contains several components, but the most distinctive is a list of “targets”. Targets are analysis steps. This list of targets, each defined using the tar_target() function, dictates which steps are run in your workflow and which other steps each one depends on. We’ll write a _targets.R file in a later section, but for now keep this information in mind. If you’d like, you can read more about this file and what it requires in the targets documentation.

Generally speaking, a target list has this conceptual structure:
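In outline, that structure might look like the sketch below. The target names (`first_target`, `second_target`) and the functions `first_function()` and `second_function()` are placeholders invented for illustration; only `tar_target()` and the closing `list()` come from `targets` itself.

```r
# _targets.R (conceptual sketch; all target and function names are placeholders)
library(targets)

# The script ends with a list of target objects.
# Each target pairs a name with the command that produces its value.
list(
  tar_target(name = first_target, command = first_function()),
  # Using first_target inside the command marks it as an upstream dependency
  tar_target(name = second_target, command = second_function(first_target))
)
```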

There are different ways to approach filling in your targets, but below is one example of what a very basic list might look like:
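A minimal sketch of such a list, assuming the built-in `airquality` dataset stands in for real data and using only base R functions in the commands. The target names (`raw_data`, `cleaned_data`, `model`, `model_summary`) are hypothetical.

```r
# _targets.R (illustrative example; target names are hypothetical)
library(targets)

list(
  # Step 1: the raw data (a built-in dataset keeps the sketch self-contained)
  tar_target(raw_data, airquality),
  # Step 2: drop rows with missing values; depends on raw_data
  tar_target(cleaned_data, na.omit(raw_data)),
  # Step 3: fit a linear model; depends on cleaned_data
  tar_target(model, lm(Ozone ~ Temp, data = cleaned_data)),
  # Step 4: summarize the fit; depends on model
  tar_target(model_summary, summary(model))
)
```

With a file like this saved as `_targets.R`, calling `targets::tar_make()` from the project directory would run the steps in dependency order, and `targets::tar_read(model_summary)` would retrieve the final result.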

Notice that each successive target references an earlier target by name. These references are what define the connections between steps in the workflow.

4.2 How to craft a target?

A target is:

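A meaningful step in your workflow
Large enough that skipping it saves a noticeable amount of run time
Small enough that it can often be skipped when other steps need to rerun
Compatible with saveRDS(), meaning its return value can be saved to an RDS file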
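One way to check the last point is to round-trip a candidate return value through `saveRDS()` and `readRDS()`. The sketch below uses a temporary file and a hypothetical helper, `survives_rds()`, which is an illustration and not something `targets` provides.

```r
# Hypothetical helper: does an object come back unchanged
# after being written with saveRDS() and read with readRDS()?
survives_rds <- function(x) {
  path <- tempfile(fileext = ".rds")
  on.exit(unlink(path), add = TRUE)  # clean up the temporary file
  saveRDS(x, path)
  isTRUE(all.equal(x, readRDS(path)))
}

# Ordinary data objects such as data frames and lists round-trip cleanly
survives_rds(head(mtcars))
survives_rds(list(a = 1, b = "x"))
```

Objects that wrap live resources, such as open database connections, generally do not survive this round trip, so a step that produces one is usually a poor candidate for a target.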

Note: If you have a big dataset, it is better to have one or two steps that work on the full dataset than many steps that each require a lot of memory (e.g., step 1: clean, step 2: model).

These guidelines were drawn from the targets and drake manuals.

4.3 `targets` depends on functions

The targets package is built around the idea of using functions to define your targets, or steps in the workflow. This enables the list of your targets to stay simple (i.e., just names and function calls) and easy to read. The functions can then be defined in another script, functions.R, which is often stored in an R/ folder (as above).
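As a sketch of what such a functions.R file might contain, the block below defines three hypothetical functions (`clean_data()`, `fit_model()`, `summarize_model()`). The column names assume the built-in `airquality` dataset and are only for illustration.

```r
# R/functions.R (hypothetical contents)

# Drop rows with a missing response so later steps see complete cases
clean_data <- function(raw) {
  raw[!is.na(raw$Ozone), ]
}

# Fit a simple linear model of ozone on temperature
fit_model <- function(data) {
  lm(Ozone ~ Temp, data = data)
}

# Return the coefficient table of a fitted model as a matrix
summarize_model <- function(model) {
  coef(summary(model))
}
```

In `_targets.R`, after `source("R/functions.R")`, each target is then just a name and a call, for example `tar_target(cleaned_data, clean_data(raw_data))`.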

4.4 Function anatomy

You might not be familiar with functions yet, so let’s talk more about them.

Functions save code for easy re-use later on, and give a name to that code (e.g., mean()) that you can reference in the future. Functions also have inputs and outputs.

The basic anatomy of a function is as follows:
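A minimal sketch of that anatomy, using a hypothetical function `add_numbers()` invented for illustration:

```r
# name          inputs (arguments)
add_numbers <- function(x, y) {
  # body: the code that runs each time the function is called
  total <- x + y
  # output: the value handed back to the caller
  return(total)
}

# Call the function by name, supplying specific inputs
add_numbers(2, 3)  # returns 5
```

The name on the left of `<-` is how the function is referenced later, the arguments in parentheses are its inputs, and the value passed to `return()` is its output.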