We recommend you join the NISRA MS Teams Channel - R Software Discussion Forum (contact Marti Jefferies to get added).
The code has been developed in R version 4.2.0 or higher.
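One way to confirm that your installation meets this requirement is sketched below. It uses base R only; the variable names required_version and version_ok are illustrative and not part of the project code.

```r
# Minimum R version stated above (illustrative variable name)
required_version <- "4.2.0"

# getRversion() returns the running R version in a form that can be compared
version_ok <- getRversion() >= required_version

if (!version_ok) {
  warning("R ", as.character(getRversion()), " is older than ", required_version,
          "; please update R before running the code.")
}
```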
When closing RStudio it asks, “Save workspace image”. The workspace is your current R working environment and includes any user-defined objects (vectors, matrices, data frames, lists, functions). At the end of an R session, the user can save an image of the current workspace that is automatically reloaded the next time R is started. There is NO need to save the workspace as you will clear the environment before you run any new code. For large R processes it can take time to save/open.
An R package is a collection of predefined functions, bundled as a library, for use in your R programs. Packages are developed externally and can be loaded into the R environment so that the functions belonging to them become available.
# Read in libraries for analysis
if(!require(pacman)) install.packages("pacman")
library(pacman)
p_load("plyr", "kableExtra", "plotly", "foreign", "tidyverse", "english", "openxlsx", "furrr", "base64enc")
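pacman is a package management library for R. There are three steps required for reading a package into R: checking whether the package is already installed, installing it if it is not, and loading it with library().
Fortunately pacman has a function named p_load() which performs these three steps in one call.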
For example, running
p_load("plyr")
will perform the exact same operation as running
if(!require(plyr)) {install.packages("plyr")}
library(plyr)
p_load() can handle multiple packages at once, which we have used above to read in all our packages needed for analysis.
Package Name | What it is used for |
---|---|
base64enc | Encodes and embeds image files in HTML |
docxtractr | Extracts tables from Word documents |
DT | A wrapper of the JavaScript library ‘DataTables’ |
english | Converts numbers to their English-language written form, e.g. 4 = four |
foreign | Essential for reading and writing SPSS data in R |
furrr | Mapping functions that can run in parallel |
here | Sets the location for relative paths to be the location of the current .Rproj file |
htmltools | Tools for HTML generation and output |
httr | Tools for working with URLs and HTTP requests |
janitor | Tools for examining and cleaning data |
kableExtra | Produces pretty tables |
lubridate | Used for working with dates and date formats |
magrittr | Provides a pipe operator for chaining commands together |
openxlsx | Writes xlsx files. Version 4.2.5 or greater needed for this report |
plotly | Creates interactive web graphics |
plyr | Contains tools for splitting, applying and combining data |
readODS | Reading data from ODS sources |
readr | Used to read in different data formats |
readxl | Reads in Excel files |
s2 | Spherical geometry support |
sf | Support for simple map features |
shiny | Builds interactive web applications |
stringr | String manipulation |
tidyr | Helps manipulate data into tidy data frames |
tidyverse | A collection of packages for importing, manipulating and visualising data (includes dplyr, tidyr, readr and stringr) |
tmap | Produces interactive maps |
xfun | Used to create base64 values for images and datasets |
zoo | Used for rolling totals |
A full manual for each is available directly from the Help pane in RStudio.
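Because the table asks for openxlsx version 4.2.5 or greater, it can be worth checking the installed version before running the report. The following is a minimal sketch; the variable names min_openxlsx and openxlsx_ok are illustrative, not part of the project code.

```r
# Minimum version quoted in the table above (illustrative variable name)
min_openxlsx <- "4.2.5"

if (requireNamespace("openxlsx", quietly = TRUE)) {
  # packageVersion() returns a version object that can be compared to a string
  installed_version <- utils::packageVersion("openxlsx")
  openxlsx_ok <- installed_version >= min_openxlsx
  message("openxlsx ", as.character(installed_version),
          if (openxlsx_ok) " is recent enough" else " needs updating")
} else {
  openxlsx_ok <- FALSE
  message("openxlsx is not installed")
}
```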
By using an R Project (.Rproj) file we ensure that the current directory is also the working directory.
For example, if I wanted to refer to the file Table1.RDS located in
C:/Users/1234567/My Documents/Excel Tables/DataTo2018
and my .Rproj file is located in
C:/Users/1234567/My Documents/Excel Tables
I need only start my path with DataTo2018/Table1.RDS.
Please also note that in R folder paths are defined with forward slashes (/), whereas Windows tends to use backslashes (\).
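The idea can be tried out with the sketch below. It is an illustration only: a temporary folder stands in for the project folder, setwd() imitates what opening the .Rproj file does, and the data frame contents are made up.

```r
# Stand-in for the project folder (a temporary directory, for illustration only)
project_dir <- file.path(tempdir(), "Excel Tables")
dir.create(file.path(project_dir, "DataTo2018"),
           recursive = TRUE, showWarnings = FALSE)

# Save a small made-up object where Table1.RDS would live
saveRDS(data.frame(x = 1:3),
        file.path(project_dir, "DataTo2018", "Table1.RDS"))

# Opening the .Rproj file makes the project folder the working directory;
# setwd() imitates that here
old_wd <- setwd(project_dir)

# A relative path with forward slashes is now enough
table1 <- readRDS("DataTo2018/Table1.RDS")

# Restore the previous working directory
setwd(old_wd)
```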
There are multiple ways to run R code in a script. To run a single line of code, do one of the following:
To run multiple lines of code, do one of the following:
renv is a tool that supports reproducibility in R projects. See their website for more details.
renv supports the reproducibility of projects both over time (so a project from 2 years ago is more likely to work because you can get the exact packages it was built with) and between users (if you have a code problem and your colleague doesn’t, you can usually rule out package version differences as the cause). It also helped us solve problems with repository issues that were presenting randomly within and between users.
No. renv introduces issues of its own, but we feel these are more manageable and easier to pinpoint than the alternatives. The main issue we have found is when a package updates and a user is forced to build an older version from source. With sf/terra in particular this requires updates to Rtools (a program installed alongside R to help build packages). Newer versions of Rtools seem to work with newer packages, but break the building of very old versions of packages. We’re still exploring this area.
- Open the project (rap-skeleton.Rproj)
- Run source("renv/activate.R")
- Run options(renv.config.mran.enabled = FALSE) (it won’t return any results if it works)
- Run renv::restore() and then press ‘y’ to accept
source("renv/activate.R")
options(renv.config.mran.enabled = FALSE)
renv::restore()
and press ‘y’ to accept
terra / sf / -lblosc / -lkea / -lsz1 related: rspatial packages might give errors while building from source. Go to CRAN and find out the latest version[^1] of the troublesome package. Tell renv to use that version using renv::record(). For example, if the latest version of sf is 1.0-12 you would run renv::record("sf@1.0-12") and then run renv::restore().
mran / aws related, e.g. failed to retrieve 'https://rstudio-buildtools.s3.amazonaws.com/renv/mran/packages.rds' [error code 22]: re-run options(renv.config.mran.enabled = FALSE) (sometimes it re-enables itself).
curl errors / IT security team contact: When running renv, IT may contact you regarding curl being run on your machine. Reference i305810 is related.
Note: If a package has been updated recently, CRAN might not yet be serving the binary. Drop the release version down one notch until renv can get a binary of that package. For example, on 19/03/2023, sf was updated to 1.0-12. The renv.lock file asks for 1.0-10, but it is only available from source. 1.0-12 is the latest version, but the r-release binary (i.e. the binary for the current stable version of R, 4.2 at the moment) is listed as 1.0-11. Therefore, we need to ask renv for that version with renv::record("sf@1.0-11"). CRAN might start serving 1.0-12 as the r-release binary in the future.
Some useful keyboard shortcuts
Insert the pipe operator %>% with Ctrl + Shift + M
The select() function from the dplyr package is used to select variables (columns) from a data frame. It can be used to create a subset of variables or to drop variables, e.g. select(-variablename) will remove the variable variablename from the dataframe/query.
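A short sketch of both uses of select() follows. The data frame and its column names are made up for illustration.

```r
library(dplyr)

# Small made-up data frame
df <- data.frame(id = 1:3,
                 age = c(25, 40, 31),
                 gender = c("F", "M", "F"))

# Keep only the named columns
subset_df <- df %>% select(id, age)

# Drop a column by putting a minus sign in front of its name
dropped_df <- df %>% select(-gender)
```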
The rm() function can be used to remove items from the environment.
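As a minimal illustration, the sketch below creates two throwaway objects and removes one of them. The object names are made up.

```r
# Create two throwaway objects
temp_value <- 1
temp_table <- data.frame(x = 1:3)

# Remove a single named object from the environment
rm(temp_value)

# Check what is left: temp_table remains, temp_value is gone
table_exists <- exists("temp_table", inherits = FALSE)
value_removed <- !exists("temp_value", inherits = FALSE)

# rm(list = ls()) would remove every object in the current environment
```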
here::here() figures out the top-level of your current project.
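A brief sketch of how here::here() can be used to build paths. The file name example.xlsx is hypothetical and the file is not created.

```r
library(here)

# Top-level folder of the current project
project_root <- here::here()

# Build a path relative to that folder; the file name is hypothetical
output_path <- here::here("outputs", "example.xlsx")
```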
The pull() function is used in the dataset universals quite frequently. It extracts a single column from a data frame as a vector, for example to give a single text or numeric value.
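For instance, the sketch below pulls one number out of a small summary table. The table and its values are made up.

```r
library(dplyr)

# Made-up summary table
rates <- data.frame(year = c(2017, 2018),
                    rate = c(18.2, 17.5))

# Filter to one row, then extract the column as a plain number
# rather than a one-cell data frame
rate_2018 <- rates %>%
  filter(year == 2018) %>%
  pull(rate)
```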
The t function in R allows a matrix (or dataframe) to be transposed.
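A small sketch of transposing a data frame, using made-up values. Note that t() returns a matrix, so it is converted back if a data frame is needed.

```r
# Made-up data frame with named rows
scores <- data.frame(a = c(1, 2),
                     b = c(3, 4),
                     row.names = c("row1", "row2"))

# t() swaps rows and columns and returns a matrix
transposed <- t(scores)

# Convert back to a data frame if one is needed
transposed_df <- as.data.frame(transposed)
```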
The process of creating the Excel document involves writing out the information to the correct sheet and then formatting that information. The Styles.R file within the Functions folder of the code shows what different text styles are available to add to our tables or text. If more styles are needed in the future then these would be created in the Styles.R file. The process for writing and formatting is similar for both text and tables.
Here are some of the main commands needed for this process:
writeDataTable
writeDataTable(new_workbook, sheet = "Table 2",
x = tab2_df,
startRow = 6,
startCol = 1,
colNames = TRUE,
tableStyle = "none",
tableName = "Table_2",
withFilter = FALSE,
bandedRows = FALSE,
keepNA = TRUE,
na.string = suppressed,
headerStyle = h3)
writeDataTable is the function used to write out the dataframes we created in the data_prep file.
The first two arguments of this function are the name of the workbook (new_workbook) and the name of the sheet (“Table 2”) that we want to write on.
x refers to the dataframe from the data_prep file that we want to insert, e.g. tab2_df.
startRow and startCol are the co-ordinates of where we want the dataframe to be placed, e.g. row 6 and column 1.
keepNA is required for tables that have NA values in them that were created for suppression. If this wasn’t selected then any NA values would just come out as blank cells.
na.string = suppressed causes any NA values in the dataframe to be written out as the value of the variable suppressed that was created earlier. This means that an x will be inserted in place of any NA values that were created for suppression.
headerStyle = h3 sets the header style to h3, which is available to view in the Styles.R file.
setColWidths
setColWidths(new_workbook, sheet = "Table 2", cols = 1, widths = 16)
setColWidths(new_workbook, sheet = "Table 2", cols = c(2:10), widths = 18)
setColWidths is a function that sets the width of any Excel column you select. This has to be set for all the columns in the tables in order for column headers to fit correctly etc.
Select the required workbook (new_workbook), required sheet name (“Table 2”), required column (either a single number or a range) and the width (numeric value).
addStyle
addStyle(new_workbook, sheet = "Table 2",
style = tw, rows = 6, cols = 1:10)
addStyle will be one of the most used functions in formatting your Excel document. This function allows you to add any of the styles in the Styles.R file to any of the rows or columns within the Excel document. These styles will be used for alignment, font size, boldness etc.
Select the required workbook (new_workbook), required sheet name (“Table 2”), required style (e.g. tw) and where you want the style to be implemented (rows = and cols =). In this example the style tw refers to a style that implements text wrapping, which would be useful for column headers.
Bear in mind that certain styles are needed depending on what type of number or text you are dealing with (decimal, percentage, text etc).
For example:
ns <- createStyle(numFmt = "#,##0",
halign = "right")
The ns style is a numeric style that inserts a thousands separator and aligns the text to the right.
ns_percent2 <- createStyle(numFmt = "0.0%",
halign = "right",
textDecoration = "bold")
ns_percent2 is a numeric style that shows a percentage number to one decimal place and inserts a percentage sign. The number is aligned to the right and made bold.
bold <- createStyle(textDecoration = "bold",
fontSize = 10)
bold is a style that can be added to text and sets the font to size 10 and makes the text bold.
View the Styles.R file to see all the available styles or create your own.
writeData
tab2_text <- c("Table 2: Reoffending rate by age and gender [note 1] [note 2]",
"This worksheet contains one table. Some cells refer to notes which can be found on the Notes worksheet.",
"Return to table of contents",
"Some shorthand is used in this table, [x] not available. For more information please see note 1 in the Notes table.",
paste0("Source: Adult and Youth Reoffending in Northern Ireland (",currentyear -1,"/", currentyear - 2000," Cohort)"))
writeData(new_workbook, sheet = "Table 2",
x = tab2_text,
startRow = 1,
startCol = 1,
colNames = FALSE)
writeData is a function used to write out text or variables to Excel sheets.
The first two arguments of this function are the required workbook (new_workbook) and sheet name (“Table 2”).
x refers to the data that is to be written into the document. In this case x is set to tab2_text, which is a variable that was created directly above. tab2_text is a variable that contains all the text that is to sit above the table on that sheet.
startRow and startCol are the co-ordinates of where we want the variable/text to be placed on the sheet, e.g. row 1 and column 1.
saveWorkbook
saveWorkbook(new_workbook, output_file, overwrite = TRUE)
saveWorkbook is the function at the very bottom of the excel_tables.R file. This is the function that actually saves all the work that has been done up to this point and creates the Excel document, which will be saved in the outputs folder.
new_workbook refers to the new workbook that has been created. output_file refers to the title of the document that was created at the top of the excel_tables.R file.
overwrite = TRUE means that every time the Excel document is created it will overwrite any older version that is in the outputs folder.
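To see how these commands fit together, the sketch below writes a tiny made-up table to a temporary workbook from start to finish. It is an illustration under stated assumptions rather than the project's actual code: the data, the style names header_style and percent_style, and the temporary output file are all invented here, whereas the real styles live in Styles.R and the real tables come from the data_prep file.

```r
library(openxlsx)

# Made-up table with one suppressed (NA) value
tab_df <- data.frame(Age = c("18-24", "25-34"),
                     Rate = c(0.213, NA))

# Illustrative styles (the project keeps its styles in Styles.R)
header_style <- createStyle(textDecoration = "bold", wrapText = TRUE)
percent_style <- createStyle(numFmt = "0.0%", halign = "right")

# Create a workbook and add the sheet to write on
wb <- createWorkbook()
addWorksheet(wb, "Table 2")

# Text that sits above the table
writeData(wb, sheet = "Table 2", x = "Table 2: Example title",
          startRow = 1, startCol = 1, colNames = FALSE)

# The table itself, with NA cells written out as "x"
writeDataTable(wb, sheet = "Table 2", x = tab_df,
               startRow = 3, startCol = 1,
               tableStyle = "none", withFilter = FALSE,
               keepNA = TRUE, na.string = "x",
               headerStyle = header_style)

# Column widths and number formatting for the data rows
setColWidths(wb, sheet = "Table 2", cols = 1:2, widths = 18)
addStyle(wb, sheet = "Table 2", style = percent_style,
         rows = 4:5, cols = 2, gridExpand = TRUE)

# Save to a temporary file, overwriting any earlier copy
output_file <- tempfile(fileext = ".xlsx")
saveWorkbook(wb, output_file, overwrite = TRUE)
```

Nothing is written to disk until saveWorkbook is called, which is why it sits at the very end.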
An HTML publication is produced using an RMarkdown script.
Functionally, an RMarkdown script has three types of text input: the YAML header, Markdown text and R code chunks.
The opening section of any RMarkdown document is called the YAML header (YAML originally stood for Yet Another Markup Language). Document settings are declared here.
The YAML is enclosed between two lines of three dashes (---), as shown below:
---
title: "Northern Ireland Report"
lang: "en"
output:
  html_document:
    fig_caption: false
    toc: true
    toc_float: true
    toc_depth: 3
    css: "style.css"
    self_contained: true
params:
  pre_release: false
---
Note that some properties are indented from the left. This indentation is important and must not be removed. YAML indentation must use spaces rather than tab characters.
The properties we have set help with adhering to accessibility guidelines.
This is the plain text that appears in the HTML, between figures. It can include section headings, paragraphs of text and lists. Images can also be inserted in these sections.
Headings are set in R Markdown by placing a number of #s at the start of a line. The number of #s preceding a line is the level of nesting we want that particular heading to appear at, with level 1 being the highest level.
The overall title of the document is a level 1 heading. It is best practice and an accessibility requirement that the document title be the only level 1 heading in a document. Each chapter should therefore be marked as a level two heading (ie, with two #s).
Any headings generated this way will automatically be added to the Table of Contents on the left hand side of the document and nested accordingly.
Text can be made italic by including a * or a _ around the word (or words) you wish to emphasise. For example:
...% of young people think that relations between Protestants and Catholics are *better now* than they were five years ago.
or
...% of young people think that relations between Protestants and Catholics are _better now_ than they were five years ago.
Will both produce:
…% of young people think that relations between Protestants and Catholics are better now than they were five years ago.
For bold text we need to include two * or two _ around the text. For example, if we write:
The **vision** of the strategy is...
or
The __vision__ of the strategy is...
we will get:
The vision of the strategy is…
There is no method for underlining text in RMarkdown, and although this can be achieved using other HTML methods, accessibility guidance advises that underlining not be used.
Hyperlinks can be included in text by writing the text in [link text](URL-to-link) format. For example, typing
The good relations indicators were developed by [NISRA](http://www.nisra.gov.uk) statisticians...
will output text:
The good relations indicators were developed by NISRA statisticians…
To make email addresses dynamic you can use the following code within the text area:
<a href="mailto:dof.pressoffice@finance-ni.gov.uk">dof.pressoffice@finance-ni.gov.uk</a>.
The email you wish to use should be included after the ‘mailto:’ section while the text you wish to make the link should be included at the end of the code. In the above example the linked text is still the email address, however, this could be changed to other text such as ‘Email Us’.
These footnotes are interactive and click-able. For more information see R Markdown Footnotes
The method for inserting images is the same as for hyperlinks, with the only difference being the inclusion of a ! at the beginning, e.g. 
Picture of a book
There are two types of list that can be created using R Markdown: unordered and ordered lists.
Unordered lists are produced by placing “* ” (an asterisk followed by a space) at the beginning of a line:
* Our Children and Young People
* Our Shared Community
* Our Safe Community
* Our Cultural Expression
will produce:
Ordered lists are produced by placing “1.” at the start of each line. The numbering is automatically applied.
1. Our Children and Young People
1. Our Shared Community
1. Our Safe Community
1. Our Cultural Expression
will give:
An R chunk is opened with the command ```{r} and is closed off by typing ```.
Inside the {r} curly brackets of the command we can specify some settings. A typical R chunk command in our report looks like this:
```{r figure 1, echo = FALSE, out.width = "100%", warning = FALSE, message = FALSE}
Where:
Subreports can be called into the main Rmd file using the code below
and a combined report is created.
There may be minor formatting differences that only come through after the main .Rmd is knitted. This is because the style.css file is only called in the main report.
DataCamp course on RMarkdown including a section on messages warnings and errors
For more options, see this page on Markdown syntax or the RMarkdown cheatsheet.
As well as evaluating R code in its own chunks, we can also evaluate
R commands in the Markdown sections by enclosing the command like:
`r some_R_code`
# set the percent of a dataset
value = 23.0
We can then insert this value into commentary by typing:
According to source, `r value`% of adults think XXXXX.
Which will appear in the final document as:
According to source, 23% of adults think XXXXX.
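In practice the value is often calculated and rounded in a chunk before being referenced in the commentary. The sketch below shows one way this could be done; the figures and variable names are made up for illustration.

```r
# Made-up counts for illustration
agree <- 1234
total <- 5366

# Calculate the percentage and round it to one decimal place
value <- round(agree / total * 100, 1)

# Format a larger number with a thousands separator for use in text
total_text <- format(total, big.mark = ",")
```

Both objects can then be referenced inline in the same way, for example `r value`% of `r total_text` adults.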
The code chunk below enables us to set some overall document options, so something can be done for every chunk in the document. You can set the same options on a per-chunk basis, by putting the options in the chunk header, like ```{r, eval=FALSE}
# this chunk sets chunk options for all chunks in this file
knitr::opts_chunk$set(
message= FALSE, echo = FALSE, warning = FALSE, out.width = "100%"
)
Some important options to know about:
- message = FALSE stops packages from printing all their messages when they load.
- warning = FALSE stops packages from displaying warnings if there is a version conflict.
- error = TRUE can be used to make a document knit even if there is a problem in one chunk (that chunk will just run and print its error). With error = FALSE, knitting stops at the first error.
- echo = TRUE shows the code, where echo = FALSE would hide the code.
- eval = TRUE means to evaluate (run) the code, where eval = FALSE will just show the code but not evaluate it.
- out.width = "100%" will constrain images and graphs to the page width.
The code below appears directly below the YAML. It doesn’t produce anything on the page but will allow the page’s activity to be tracked using Google Analytics Tag Manager.
<!-- Google Tag Manager - Google Analytics -->
<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
})(window,document,'script','dataLayer','GTM-KF6WGSG');</script>
<!-- End Google Tag Manager -->
<!-- Google Tag Manager (noscript) -->
<noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-KF6WGSG"
height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>
<!-- End Google Tag Manager (noscript) -->
To create a new version of the report in HTML after updating the data or the code, click the Knit button in the Rmd file and select the “Knit to HTML” option in RStudio, or press Ctrl + Shift + K.
You may get an error similar to the below if there is a problem installing a specific package.
This may be resolved by manually installing that particular package. In this example, type install.packages("rgeos") in the console.
If that fails there is an additional script included in the zip files called ‘Package check’. Run this and try knitting the original file again.
The following code
rmarkdown::render("T:/Projects/12 - LMR/development/lmr/lmr_master/code/labour-market-report.Rmd")
will knit the labour-market-report and retain all of the dataframes, variables and functions. This can be useful for debugging (note that the file path will need to be updated).
Usually when you knit the report all of the dataframes, variables and functions are removed when the html is completed - this is because when the report is knit using the button above R opens a new session, runs the code and closes the session again.
There is a wealth of help guides for R online. Here are some of them:
A number of cheatsheets are available for R.
Dynamic documents with rmarkdown cheatsheet
Data transformation with dplyr cheatsheet
Data import with readr, readxl, and googlesheets4 cheatsheet
Useful tips for using RStudio include changing the appearance in Global Options.
Date last created/knitted 2024-03-20 09:05:52
Comments
When you want to comment out multiple lines of R code, the conventional way to do it is to place a # character at the beginning of each of the lines you need to comment out, since R does not support multiline comments.
Performing that task is easy if the number of lines of code to comment out is small. But if you need to comment out a really long block of code, a specialized code editor capable of adding a # character to each line in a selected block could be useful. In RStudio, you can do that by using the Ctrl+Shift+C key combination in Windows. The RStudio documentation offers more information on keyboard short-cuts.
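As a small illustration, the sketch below shows a block where two lines have been commented out so that only the last line runs. The variable name total is made up.

```r
# A comment starts with the # character and runs to the end of the line

# To disable several lines, each one needs its own #
# (selecting them and pressing Ctrl+Shift+C in RStudio adds or removes it):
# total <- 1 + 2
# total <- total * 10

total <- 5  # only this assignment runs
```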