Chapter 9 Filling out the pipeline

9.1 Defining the targets

Now that our functions are fully written, it’s time to return to the list of targets that we drafted in the _targets.R file.

Here it is again:

list(
  
  tar_target(name = penguin_data,
             command = ),
  
  tar_file(penguin_file,
  ),
  
  tar_target(cleaned_data,
  ),
  
  tar_target(exploratory_plot,
  ),
  
  tar_target(penguin_model,
  ),
  
  tar_render(markdown_summary,
  )
  
)

Each target in our pipeline is defined by either tar_target(), tar_file(), or tar_render(). The command for each one is either a call to a function that we’ve written or some other R expression.

To finish filling out the first target, penguin_data, we can add a call to our download_penguins() function as the command and add a vector with package names needed to run the function:

tar_target(name = penguin_data,
           command = download_penguins(out_path = "data/penguin_data.csv"),
           packages = c("tidyverse", "janitor")),

Note that if you have defined one or more default packages to load for each target using tar_option_set(), you can leave out the packages argument for any target that needs only those defaults. Here we need more than just tidyverse, though, so we include a vector naming every package the target needs. Recall that out_path in download_penguins() is the file path where the .csv with the dataset is saved.
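
As a minimal sketch of how the default and the per-target argument interact: tar_target() takes the default value of its packages argument from the option set by tar_option_set(), and a packages vector supplied to a target replaces that default rather than adding to it. Running the lines below in a session with targets installed shows the current default.

```r
library(targets)

# Set a pipeline-wide default, as the _targets.R script does.
tar_option_set(packages = "tidyverse")

# tar_target() uses this option as the default for its packages argument,
# so a target defined without packages loads tidyverse only.
default_packages <- tar_option_get("packages")
default_packages

# A packages vector passed to tar_target() replaces this default,
# which is why "tidyverse" is listed again next to "janitor".
```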

Next, we will provide tar_file() with the path to this .csv file, which it will track as part of our pipeline. download_penguins() returns the path as its output, so we provide penguin_data as the command to tar_file(). This links these two steps in the workflow so that penguin_data must run first and then be fed into penguin_file.

tar_file(penguin_file,
         penguin_data),
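
For readers who want to see what tar_file() is doing, the fragment below is a sketch of the same target written with tar_target() alone. tar_file() comes from tarchetypes and is shorthand for a target with format = "file": the command must return one or more file paths, and targets tracks the contents of those files rather than the returned string.

```r
# Same target as tar_file(penguin_file, penguin_data), without tarchetypes.
# format = "file" tells targets that the value is a path to watch for changes.
tar_target(penguin_file,
           penguin_data,
           format = "file"),
```

Either form works inside the list in _targets.R; tar_file() is just more compact.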

Now we progress through the rest of the targets in the workflow, filling in the necessary arguments in the functions we’ve developed for each step.

Feed penguin_file (the path to our .csv file) into the clean_dataset() function and make sure to include lubridate in its package list:

tar_target(cleaned_data,
           clean_dataset(file_path = penguin_file),
           packages = c("tidyverse", "lubridate")),

The exploratory_plot target won’t need any extra packages beyond tidyverse, so we can leave out the packages argument and rely on the default:

tar_target(exploratory_plot,
           plot_body_mass(cleaned_data = cleaned_data)),

Our modeling step needs a dataset and model formula specified:

tar_target(penguin_model,
           run_model(cleaned_data = cleaned_data,
                     model_string = "body_mass_g ~ flipper_length_mm * species")),
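
Passing the formula as a string keeps the model specification visible in _targets.R. The body of run_model() is not repeated in this section, so the sketch below uses a hypothetical stand-in, fit_from_string(), and assumes a linear model fitted with lm(); it only illustrates how a string can be turned into a formula with as.formula(). The built-in mtcars data is used purely for illustration.

```r
# Hypothetical stand-in for run_model(); the real function lives in R/functions.R
# and may fit a different kind of model.
fit_from_string <- function(data, model_string) {
  # Convert the string into a formula object, then fit a linear model.
  lm(as.formula(model_string), data = data)
}

# The * operator expands to both main effects plus their interaction.
fit <- fit_from_string(mtcars, "mpg ~ wt * hp")
names(coef(fit))
```

In the same way, "body_mass_g ~ flipper_length_mm * species" requests the effects of flipper length, species, and their interaction.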

Lastly, the report target (drafted above as markdown_summary and named analysis_report from here on) needs the path to the R Markdown file that renders the report:

tar_render(analysis_report,
           "R/analysis_report.Rmd")


Here is what your final _targets.R script should look like:

library(targets)
library(tarchetypes)

source("R/functions.R")

# Set target-specific options such as packages.
tar_option_set(packages = "tidyverse")

# End this file with a list of target objects.
list(
  
  tar_target(name = penguin_data,
             command = download_penguins(out_path = "data/penguin_data.csv"),
             packages = c("tidyverse", "janitor")),
  
  tar_file(penguin_file,
           penguin_data),
  
  tar_target(cleaned_data,
             clean_dataset(file_path = penguin_file),
             packages = c("tidyverse", "lubridate")),
  
  tar_target(exploratory_plot,
             plot_body_mass(cleaned_data = cleaned_data)),
  
  tar_target(penguin_model,
             run_model(cleaned_data = cleaned_data,
                       model_string = "body_mass_g ~ flipper_length_mm * species")),
  
  tar_render(analysis_report,
             "R/analysis_report.Rmd")
  
)


9.2 Check the pipeline

Now that the pipeline is completely outlined, we can visualize its connections to make sure that the logic of our target relationships is complete and correct. To do this we use tar_visnetwork().

tar_visnetwork()

If everything is correct, tar_visnetwork() should display an interactive network plot of the pipeline. This is a very useful tool provided by targets. It lets you confirm that every connection between steps in your workflow is accurate and that nothing has been dropped. For example, notice that analysis_report is not currently connected to anything. Usually this would be a red flag, but in this instance it’s OK because we haven’t written the code for the report yet. Once we have, it will be connected to the other targets in the workflow.
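
If the plot does not appear, or you prefer a text-based check, targets offers a couple of companion functions. The sketch below assumes it is run from the project root, where _targets.R lives.

```r
library(targets)

# Check _targets.R for problems; an error or warning here points to
# a malformed pipeline definition.
tar_validate()

# List every target with its command as a data frame.
tar_manifest()

# Draw the graph with targets only, hiding functions and other globals.
tar_visnetwork(targets_only = TRUE)
```

None of these calls runs the pipeline; they only inspect its definition.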

In the next section we’ll fill out the analysis report using the targets in the pipeline.