💯 R Best Practices

How to set up a reproducible and collaborative workflow

Index

  • 🚧 Project Configuration
  • ✨ Tips and Tricks
  • 🌐 Follow Standards

🚧 Project Configuration

Code smells, projects and scaffolding


🚩 Don’ts

rm(list = ls())

Important

This does not give you a fresh R session: all it does is delete user-created objects from the global environment.

• Any packages that have ever been attached via library() are still available.
• Any options that have been set to non-default values remain that way.
• Working directory is not affected.
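The three points above can be checked directly. The following is a minimal sketch using only packages that ship with R; the object name `foofy_total` and the choice of `digits` and {tools} are illustrative assumptions, not part of the original slides.

```r
# Three kinds of session state: an object, an option, an attached package
foofy_total <- 42     # hypothetical user-created object
options(digits = 4)   # non-default option (the default is 7)
library(tools)        # ships with R, but is not attached by default

rm(list = ls())       # the supposed "reset"

"foofy_total" %in% ls()         # FALSE: the object is gone
getOption("digits")             # 4: the option is still changed
"package:tools" %in% search()   # TRUE: the package is still attached
```

Only the first of the three was undone, which is why restarting the session is the more reliable way to get a clean slate.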


✔️ Do’s

  • You should always reason in terms of sessions, not workspaces: don’t be afraid to use Restart R Session (cmd + shift + 0 on macOS, ctrl + shift + F10 on Windows/Linux).

Warning

This effectively means that you have to re-run your code! To avoid repeating expensive steps, you can use caching with R Markdown and Quarto notebooks.

  • Write your code in R scripts/notebooks, avoid using the console and save your progress often!


✔️ Do’s

• When you quit R, do not save the workspace to an .RData file.

• When you launch R, do not reload the workspace from an .RData file.

• In RStudio, set this via Tools > Global Options > General.


🚩 Don’ts

setwd("/Users/jenny/cuddly_broccoli/verbose/...")

  • The chance of the setwd() command having the desired effect – making the file paths work – for anyone besides its author is 0%.
  • It’s also unlikely to work for the author one or two years or computers from now.
  • Hard-wired, absolute paths, especially when sprinkled throughout the code, make a project brittle.


✔️ Use Projects

• Create a project with the top-left icon next to New file, or from the Command Palette.

• Open and manage Projects from the top-right drop-down menu.

• You can also open projects by double clicking on the *.Rproj file in your filesystem.


✔️ Use {here}

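library(ggplot2)
library(here)

# reads from `./data/raw_foofy_data.csv`, relative to the project root
# (read.csv rather than read.delim, because the file is comma-separated)
df <- read.csv(here("data", "raw_foofy_data.csv"))

# assumes the data has columns named `x` and `y`
p <- ggplot(df, aes(x, y)) + geom_point()

# saves to `./figs/foofy_scatterplot.png`; pass the plot explicitly
ggsave(here("figs", "foofy_scatterplot.png"), plot = p)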
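To give an intuition for why project-relative paths keep working when the working directory changes, here is a simplified stand-in written in base R. It is not how {here} is actually implemented: the helper `find_project_root()` is hypothetical and only looks for an `.Rproj` file, and the throwaway project in the temporary directory exists purely for the demonstration.

```r
# Hypothetical helper: walk up from `start` until a directory
# containing an .Rproj file is found.
find_project_root <- function(start = getwd()) {
  current <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (length(list.files(current, pattern = "\\.Rproj$")) > 0) {
      return(current)
    }
    parent <- dirname(current)
    if (identical(parent, current)) {
      stop("No .Rproj file found above ", start)
    }
    current <- parent
  }
}

# Throwaway project with a nested sub-directory (demo only)
demo_root <- tempfile("demo-proj-")
dir.create(file.path(demo_root, "R", "helpers"), recursive = TRUE)
file.create(file.path(demo_root, "demo-proj.Rproj"))

# Starting from a nested folder still resolves to the project root
root_found <- find_project_root(file.path(demo_root, "R", "helpers"))
data_path <- file.path(root_found, "data", "raw_foofy_data.csv")
```

Because the path is built from the discovered root, the same script works whether it is run from the project root or from a sub-directory.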


🎁 Extra

  • Give meaningful names to variables. Stop saving each dataframe as df.

“There are only two hard things in Computer Science: cache invalidation and naming things.” (Phil Karlton)

✨ Tips and Tricks

Filenames, commands and documentation


📁 File names should be

  • machine readable: no spaces, no accents, no punctuation, no special characters, all lowercase.
  • human readable.
  • friendly to default ordering.

Warning

In particular, avoid / and \ in file names: they are path separators.
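The "machine readable" rule can be turned into a quick check. This is a minimal sketch: the helper `is_machine_readable()` is hypothetical, and the exact pattern is an assumption (lowercase letters, digits, `_` and `-` in the stem, plus an optional extension such as `.R`).

```r
# Hypothetical helper: TRUE when a file name uses only lowercase letters,
# digits, `_` and `-`, optionally followed by a single extension.
# perl = TRUE keeps the character ranges independent of the locale.
is_machine_readable <- function(file_name) {
  grepl("^[a-z0-9_-]+(\\.[A-Za-z0-9]+)?$", file_name, perl = TRUE)
}

candidates <- c(
  "2022_11_05-lecture_01-r_best_practices",
  "My Report (final).docx",
  "01-helper-data-loading.R"
)
checks <- is_machine_readable(candidates)
checks   # TRUE FALSE TRUE
```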


📁 File names: delimiters and ordering

  • Use _ to delimit words.
  • Use - to delimit meta-data fields.
  • Use dates and numbering to enforce ordering.

Example

2022_11_05-lecture_01-r_best_practices
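One payoff of consistent delimiters is that file names become easy to parse. A short base R sketch on the example name above; the variable names are illustrative.

```r
file_name <- "2022_11_05-lecture_01-r_best_practices"

# `-` separates the meta-data fields
fields <- strsplit(file_name, "-", fixed = TRUE)[[1]]

# The first field is a date whose parts are delimited by `_`
lecture_date <- as.Date(fields[1], format = "%Y_%m_%d")

# `_` separates the words inside a field
title_words <- strsplit(fields[3], "_", fixed = TRUE)[[1]]

fields        # "2022_11_05" "lecture_01" "r_best_practices"
lecture_date  # "2022-11-05"
title_words   # "r" "best" "practices"
```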


📁 File names for scripts

Example

01-helper-data-loading.R

02-helper-data-visualisation.R

03-helper-ml-model_tuning.R

  • Split the code across different scripts, rather than maintaining an expensive monolith. In this way, you will only have to re-run the parts you actually need.
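As a sketch of how numbered scripts can be run selectively, the example below creates three placeholder scripts in a temporary directory (their contents are invented for the demo) and sources only the first two. The `steps_run` variable is a hypothetical log that each placeholder script appends to.

```r
# Create placeholder scripts in a throwaway directory (demo only)
script_dir <- tempfile("scripts_")
dir.create(script_dir)
script_names <- c(
  "01-helper-data-loading.R",
  "02-helper-data-visualisation.R",
  "03-helper-ml-model_tuning.R"
)
for (name in script_names) {
  writeLines(
    sprintf('steps_run <- c(steps_run, "%s")', name),
    file.path(script_dir, name)
  )
}

steps_run <- character()

# The numeric prefix makes the alphabetical listing match the run order
scripts <- list.files(script_dir, pattern = "\\.R$", full.names = TRUE)

# Re-run only the parts that are needed, here the first two steps
for (script in scripts[1:2]) {
  source(script, local = TRUE)
}
steps_run
```

The expensive model tuning step is skipped entirely, which is the point of not keeping everything in one monolithic script.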


📁 Project structure

  • If the number of scripts grows, create sub-directories using the same naming criteria.
  • There are many ways to structure a directory tree. For simplicity, you might want to start with:
      .
      ├── data
      │  ├── external       # external data that does not belong to raw
      │  ├── interim        # intermediate manipulations
      │  ├── processed      # final data used for analysis/models
      │  └── raw            # raw data should never change!
      ├── reports           # notebooks with analysis and exploration
      ├── src               # contains the source code, also named `R`
      ├── your-proj.Rproj   # ❗Project file
      └── README.md         # info about the project
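The tree above can be scaffolded with a few lines of base R. This sketch builds it inside a temporary directory so that nothing is written to a real project; in practice `project_root` would be the folder containing the `.Rproj` file.

```r
# Throwaway location for the demo; replace with your project folder
project_root <- file.path(tempfile("scaffold_"), "your-proj")

sub_dirs <- c(
  "data/external",
  "data/interim",
  "data/processed",
  "data/raw",
  "reports",
  "src"
)
dir_paths <- file.path(project_root, sub_dirs)

# recursive = TRUE also creates the parent directories
for (dir_path in dir_paths) {
  dir.create(dir_path, recursive = TRUE, showWarnings = FALSE)
}
readme_created <- file.create(file.path(project_root, "README.md"))

list.files(project_root, recursive = TRUE, include.dirs = TRUE)
```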


📁 Project structure

  • Or… at least
      .
      ├── data
      ├── R                 # contains the R code
      ├── your-proj.Rproj   # ❗Project file
      └── README.md         # info about the project


⚙️ Commands

  • cmd + shift + P (ctrl + shift + P on Windows/Linux) opens the command palette: from there, you can run any command.
  • Get help about any function:
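help('as.data.frame')   # open the help page for a function
?as.data.frame          # shorthand for the same thing
??as.data.frame         # search all installed packages, if you do not recall the package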


📚 Documentation

A vignette is like a book chapter or an academic paper: it can describe the problem that your package is designed to solve, and then show the reader how to solve it.

  • You can see all the installed vignettes with browseVignettes() and view one with vignette('your-vignette').
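Besides the browser view, vignettes can be listed per package from the console. A small sketch using {grid}, chosen here only because it ships with R; whether its vignettes are present depends on how R was installed.

```r
# List the vignettes of one package without opening anything
grid_vignettes <- vignette(package = "grid")

# The listing is stored as a character matrix in `$results`
grid_vignettes$results[, c("Item", "Title"), drop = FALSE]

# To open one of them interactively, pass its name from the "Item" column:
# vignette("grid", package = "grid")
```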

🌐Follow Standards


🔍 Linting

  • Use {lintr} as a static analysis tool:

It checks for adherence to a given style, identifying syntax errors and possible semantic issues, then reports them to you so you can take action.
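A minimal sketch of linting a snippet, assuming {lintr} is installed and that your version supports the `text` argument of `lintr::lint()`; the snippet itself is made up for the demo. In a real project you would more likely point `lintr::lint()` at a script file.

```r
# Made-up code with style issues: `=` for assignment, no spaces around `+`
snippet <- c("my_total = 1+1", "print(my_total)")

if (requireNamespace("lintr", quietly = TRUE)) {
  # Lint the in-memory lines with the default linters
  lints <- lintr::lint(text = snippet)
  message("Number of issues found: ", length(lints))
} else {
  lints <- NULL
  message("Install {lintr} first: install.packages('lintr')")
}
```

Printing `lints` in an interactive session shows each issue with its line, column and a short explanation.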


🪄 Formatters

Thank you!