So far, you’ve learned the tools to get your data into R, tidy it into a form convenient for analysis, and then understand your data through transformation, visualization and modelling. However, it doesn’t matter how great your analysis is unless you can explain it to others: you need to communicate your results.
R Markdown provides an authoring framework for data science.
You can use a single R Markdown file to both
- save and execute code, and
- generate high quality reports that can be shared with an audience.
R Markdown was designed for easier reproducibility, since both the computing code and narratives are in the same document, and results are automatically generated from the source code. R Markdown supports dozens of static and dynamic/interactive output formats.
There are three basic components of an R Markdown document: the metadata, text, and code. The metadata is written between a pair of lines that each contain three dashes (---). The syntax for the metadata is YAML (YAML Ain’t Markup Language, https://en.wikipedia.org/wiki/YAML), so sometimes it is also called the YAML metadata or the YAML frontmatter.
The body of a document follows the metadata. The syntax for text (also known as prose or narratives) is Markdown.
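As a minimal sketch of these three components, the R snippet below writes a small R Markdown file to a temporary location and reads its metadata back. The title, output format, and chunk name are made up for illustration, and the snippet assumes the rmarkdown package is installed; `yaml_front_matter()` is used only to show that the block between the dashes is parsed as YAML.

```r
library(rmarkdown)

# Three backticks, built programmatically so the chunk fence is easy to see
fence <- strrep("`", 3)

# Lines of a hypothetical, minimal R Markdown document
rmd_lines <- c(
  "---",                               # metadata starts
  "title: \"A minimal example\"",
  "output: html_document",
  "---",                               # metadata ends
  "",
  "This sentence is Markdown text.",   # text
  "",
  paste0(fence, "{r row_count}"),      # code chunk starts
  "nrow(iris)",
  fence                                # code chunk ends
)

rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)

# Parse only the YAML metadata; the document itself is not rendered here
meta <- yaml_front_matter(rmd_path)
str(meta)
```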
Open a new .Rmd file in the RStudio IDE by going to File > New File > R Markdown.
The usual way to compile an R Markdown document is to click the Knit button as shown in Figure 2.1, and the corresponding keyboard shortcut is Ctrl + Shift + K (Cmd + Shift + K on macOS). Under the hood, RStudio calls the function rmarkdown::render() to render the document in a new R session. Note the words “in a new R session”: this point often confuses R Markdown users. Rendering an Rmd document in a new R session means that none of the objects in your current R session (e.g., those you created in your R console) are available to that session. Reproducibility is the main reason that RStudio uses a new R session to render your Rmd documents: in most cases, you may want your documents to continue to work the next time you open R, or in other people’s computing environments.
If you must render a document in the current R session, you can also call rmarkdown::render() by yourself, and pass the path of the Rmd file to this function. The second argument of this function is the output format, which defaults to the first output format you specify in the YAML metadata (if it is missing, the default is html_document).
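A call of this kind might look like the sketch below. The file name `report.Rmd` is hypothetical, and the call is guarded so that nothing happens when the file does not exist; actually rendering also requires Pandoc, which RStudio bundles.

```r
library(rmarkdown)

# Hypothetical file name; replace it with the path to your own document
input_file <- "report.Rmd"

if (file.exists(input_file)) {
  # Called directly, render() evaluates the document in the calling
  # environment, so objects from the current session are visible to it.
  # The second argument selects the output format explicitly.
  render(input_file, output_format = "html_document")
}
```

Leaving out `output_format` falls back to the first format listed in the YAML metadata, as described above.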
There are two types of output formats in the rmarkdown package: documents, and presentations.
To master R Markdown, follow R Markdown: The Definitive Guide.
If you prefer a video introduction to R Markdown, I recommend that you check out the website https://rmarkdown.rstudio.com, and watch the videos in the “Get Started” section, which cover the basics of R Markdown.
To create a header in an R Markdown document, start a line with #. The number of # characters determines the level of the header.
Text can be added simply by writing as you would in a plain text document. Bold text is written as **some text** and italic text as *some text*.
One can also adapt the text to the output of an R command, such as ‘nrow(cars)’ and ‘ncol(cars)’. To include the output of an R command within the text, enclose the command in backticks and start it with r. For example: the iris dataset contains 150 rows and 5 columns.
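The sentence about iris is presumably generated by inline code of this kind. The sketch below knits a one-line Markdown string with `knitr::knit(text = ...)` to show the substitution outside a full document; it assumes the knitr package is installed, and the variable names `src` and `out` are illustrative.

```r
library(knitr)

# A one-line Markdown source with two inline R expressions
src <- "The iris dataset contains `r nrow(iris)` rows and `r ncol(iris)` columns."

# knit(text = ...) returns the knitted result as a character string
out <- knit(text = src, quiet = TRUE)
cat(out, "\n")
```

In a real document the same sentence would simply be typed into the body text; the inline expressions are evaluated when the document is knitted.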
Chunks are the places where code is used, and they are created with the following syntax:
```{r chunk_name}
# some code
```
Chunks can be inserted quickly using the keyboard shortcut Ctrl + Alt + I (macOS: Cmd + Option + I), or via the Insert menu in the editor toolbar.
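Each chunk can have a name (also called a label). The chunk header takes the form ‘r + some text’, for instance ‘r data’, where r is the language and the text that follows is the name. Names are optional, but each one must be unique within a document. Within a chunk you can write R commands.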
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Chunks have options that control how R commands and their output are displayed in the output document. There are a few fundamental options that are commonly used:
- eval: whether to evaluate a code chunk.
- echo: whether to echo the source code in the output document (someone may not prefer reading your smart source code but only results).
- include: whether to include anything from a code chunk in the output document.
- warning, message, and error: whether to show warnings, messages, and errors in the output document.
- fig.width and fig.height: the (graphical device) size of R plots in inches.
- out.width and out.height: the output size of R plots in the output document.
- fig.align: the alignment of plots. It can be ‘left’, ‘center’, or ‘right’.
- fig.cap: the figure caption.
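Options can be given per chunk in the chunk header (for example, `{r, echo=FALSE}`) or set once for the whole document. As a sketch of the latter, the snippet below uses `knitr::opts_chunk$set()`, typically placed in a setup chunk near the top of the file. The particular values are illustrative assumptions, not recommendations from the text.

```r
library(knitr)

# Document-wide defaults; individual chunks can still override them
# in their own headers. The values below are only examples.
opts_chunk$set(
  echo = FALSE,         # hide source code by default
  warning = FALSE,      # suppress warnings in the output
  message = FALSE,      # suppress messages in the output
  fig.width = 6,        # graphical device width in inches
  fig.height = 4,       # graphical device height in inches
  fig.align = "center"  # center plots in the output
)

# Inspect one of the defaults that was just set
opts_chunk$get("fig.align")
```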
By default, figures produced by R code will be placed immediately after the code chunk they were generated from.
plot(iris$Sepal.Length, iris$Petal.Length, xlab = "Sepal Length", ylab = "Petal Length")
One can simply perform statistical analysis within code chunks.
summary(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
Math expressions can be inserted by surrounding LaTeX code with $ (inline) or $$ (displayed on its own line).
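As an illustration, assuming the usual Pandoc Markdown math syntax, an inline expression and a displayed equation might be written in the source as follows (the particular formulas are only examples):

```latex
The slope is denoted by $\beta_1$.

$$
Y = \beta_0 + \beta_1 X + \varepsilon
$$
```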
I now want to fit a linear regression model of the following form to my data:
\[Y = \beta_0 + \beta_1 X + \varepsilon\]
To do this, I simply use the function lm.
model <- lm(Petal.Length ~ Sepal.Length, data = iris)
The model output is the following:
summary(model)
##
## Call:
## lm(formula = Petal.Length ~ Sepal.Length, data = iris)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.47747 -0.59072 -0.00668 0.60484 2.49512
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.10144 0.50666 -14.02 <2e-16 ***
## Sepal.Length 1.85843 0.08586 21.65 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8678 on 148 degrees of freedom
## Multiple R-squared: 0.76, Adjusted R-squared: 0.7583
## F-statistic: 468.6 on 1 and 148 DF, p-value: < 2.2e-16
We see that the estimated intercept \(\hat{\beta}_0\) is -7.10, while the estimated slope \(\hat{\beta}_1\) is 1.86.
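Rather than copying numbers from the printed summary, the estimates can be read from the fitted object. The following sketch refits the model from the text so that it is self-contained and extracts both coefficients with `coef()`; the variable names `intercept` and `slope` are illustrative.

```r
# Refit the model from the text so the snippet runs on its own
model <- lm(Petal.Length ~ Sepal.Length, data = iris)

# coef() returns a named numeric vector of the estimated coefficients
estimates <- coef(model)
intercept <- estimates[["(Intercept)"]]
slope <- estimates[["Sepal.Length"]]

# Round to two decimals for reporting
round(c(intercept = intercept, slope = slope), 2)
```

Used inside inline code, values extracted this way keep the text in sync with the model if the data change.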
Comments
To add comments to an R Markdown document, you can use <!-- some text -->, as in HTML.