We recommend you join the NISRA MS Teams Channel - R Software Discussion Forum (contact Marti Jefferies to get added).

The code has been developed for R version 4.2.0 or higher.

When closing RStudio it asks, “Save workspace image”. The workspace is your current R working environment and includes any user-defined objects (vectors, matrices, data frames, lists, functions). At the end of an R session, the user can save an image of the current workspace that is automatically reloaded the next time R is started. There is NO need to save the workspace as you will clear the environment before you run any new code. For large R processes it can take time to save/open.

Packages

R packages are collections of predefined functions, bundled as a library, that extend what base R can do. Most packages are developed externally and must be installed and loaded into the R environment before the functions they contain can be used.

# Read in libraries for analysis
if(!require(pacman)) install.packages("pacman")
library(pacman)

p_load("plyr", "kableExtra", "plotly", "foreign", "tidyverse", "english", "openxlsx", "furrr", "base64enc")

pacman is a package management library for R. There are three steps required for reading a package into R:

  1. Check if the package is already installed on your computer. if(!require(pacman))
  2. If it is not, install it. install.packages("pacman")
  3. Call it into your environment. library(pacman)

Fortunately pacman has a command named p_load() which performs these three steps in one instance.

For example, running

p_load("plyr")

will perform the exact same operation as running

if(!require(plyr)) {install.packages("plyr")}
library(plyr)

p_load() can handle multiple packages at once, which we have used above to read in all our packages needed for analysis.

Package name    What it is used for
base64enc       Encodes and embeds image files in HTML
docxtractr      Extracts tables from Word (.docx) documents
DT              A wrapper of the JavaScript library ‘DataTables’
english         Converts numbers to their English language written form, e.g. 4 = four
foreign         Reads and writes data from other statistical software, such as SPSS
furrr           Applies mapping functions in parallel
here            Sets the location for relative paths to the location of the current .Rproj file
htmltools       Tools for HTML generation and output
httr            Tools for working with URLs and HTTP requests
janitor         Examining and cleaning data (for example, tidying column names)
kableExtra      Produces pretty tables
lubridate       Used to create and work with dates
magrittr        Provides a pipe operator for chaining commands together
openxlsx        Writes xlsx files. Version 4.2.5 or greater needed for this report
plotly          Creates interactive web graphics
plyr            Contains tools for splitting, applying and combining data
readODS         Reads data from ODS sources
readr           Reads in different data formats
readxl          Reads in Excel files
s2              Spherical geometry support
sf              Support for simple map features
shiny           Builds interactive web applications in R
stringr         String manipulation
tidyr           Helps manipulate data into tidy dataframes
tidyverse       A collection of packages for data manipulation and visualisation (includes dplyr, tidyr, readr, stringr and ggplot2)
tmap            Produces interactive maps
xfun            Used to create base64 values for images and datasets
zoo             Used for rolling totals

A full manual for each is available directly from the Help pane in RStudio.

Working directory

By using an R Project (.Rproj) file we ensure that the current directory is also the working directory.

For example, if I wanted to refer to the file Table1.RDS located in C:/Users/1234567/My Documents/Excel Tables/DataTo2018 and my .Rproj file is located in C:/Users/1234567/My Documents/Excel Tables, I need only start my path with DataTo2018/Table1.RDS.

Please also note that in R folder paths are defined with forward slashes (/), where Windows tends to use back slashes (\).
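As a minimal sketch of the points above, the snippet below builds a relative path and converts a Windows-style path to forward slashes. The folder and file names are the illustrative ones from the example; the file is only read if it actually exists.

```r
# Build a path relative to the project folder; file.path() joins parts with "/".
table_path <- file.path("DataTo2018", "Table1.RDS")
print(table_path)

# A Windows-style path (backslashes must be doubled inside an R string)
# can be converted to forward slashes with gsub().
windows_path <- "C:\\Users\\1234567\\My Documents\\Excel Tables"
r_path <- gsub("\\\\", "/", windows_path)
print(r_path)

# Only read the file if it exists, so this sketch runs on any machine.
if (file.exists(table_path)) {
  table1 <- readRDS(table_path)
}
```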

Running R code

There are multiple ways to run R code in a script. To run a single line of code, do one of the following:

  • Place the cursor on the desired line, hold the Ctrl key, and press Enter. On Mac OS X, hold the Command key and press Return instead.
  • Place the cursor on the desired line and click the Run button at the top of the script editor.

To run multiple lines of code, do one of the following:

  • Highlight all the code you’d like to run, hold the Ctrl key, and press Enter. On Mac OS X, hold the Command key and press Return instead.
  • Highlight all the code you’d like to run, and click the Run button.

renv

What is renv?

renv is a tool that supports reproducibility in R projects by recording the exact package versions a project was built with. See the renv website for more details.

Why are we moving to renv?

renv supports the reproducibility of projects both over time (so a project from 2 years ago is more likely to work because you can get the exact packages it was built with) and between users (if you have a code problem and your colleague doesn’t, you can usually rule out package version differences as the cause). It also helped us solve problems with repository issues that were presenting randomly within and between users.

Is renv perfect?

No. renv introduces issues itself, but we feel these are more manageable and easier to pinpoint than the alternatives. The main issue we have found is when a package updates and a user is forced to build an older version from source - with sf/terra in particular this requires updates to Rtools (a program installed alongside R to help build packages). Newer versions of Rtools seem to work with newer packages, but break the building of very old versions of packages. We’re still exploring this area.

renv setup

  • Download/clone this project and open the project (rap-skeleton.Rproj)
  • To install and configure renv for this project, run source("renv/activate.R")
  • Disable the MRAN repository using options(renv.config.mran.enabled = FALSE) (it won’t return any results if it works)
  • Tell renv to download and install the packages this project needs with renv::restore() and then press ‘y’ to accept

renv setup summary

  • Open project
  • source("renv/activate.R")
  • options(renv.config.mran.enabled = FALSE)
  • renv::restore() and press ‘y’ to accept

renv troubleshooting

terra / sf / -lblosc / -lkea / -lsz1 related errors: rspatial packages might give errors while building from source. Go to CRAN and find the latest version of the troublesome package (see the note at the end of this section), then tell renv to use that version using renv::record(). For example, if the latest version of sf is 1.0-12 you would run renv::record("sf@1.0-12") and then run renv::restore().

mran / aws related errors, e.g. failed to retrieve 'https://rstudio-buildtools.s3.amazonaws.com/renv/mran/packages.rds' [error code 22]: re-run options(renv.config.mran.enabled = FALSE) (it sometimes re-enables itself).

curl errors / IT security team contact

When running renv, IT may contact you regarding curl being run on your machine. Reference i305810 is related.

Note: If a package has been updated recently, CRAN might not be serving the binary yet - drop the release version down one notch until you can get renv to give a binary of that package. For example, on 19/03/2023, sf was updated to 1.0-12. The renv.lock file asks for 1.0-10 but it is only available from source. 1.0-12 is the latest version, but the r-release binary (i.e. the binary for the current stable version of R, 4.2 at the moment) is listed as 1.0-11. Therefore, we need to ask renv for that version with renv::record("sf@1.0-11"). CRAN might start serving 1.0-12 as the r-release binary in the future.

Hints / tips

Keyboard shortcuts

Some useful keyboard shortcuts

Comments

When you want to comment out multiple lines of R code, the conventional way to do it is to place a # character at the beginning of each of the lines you need to comment out, since R does not support multiline comments.

Performing that task is easy if the number of lines of code to comment out is small. But if you need to comment out a really long block of code, a specialized code editor capable of adding a # character to each line in a selected block could be useful. In RStudio, you can do that by using the Ctrl+Shift+C key combination in Windows. The RStudio documentation offers more information on keyboard short-cuts.

Pipe %>% shortcut

Insert the pipe operator %>% with Ctrl + Shift + M

select()

The select() function from the dplyr package is used to select variables (columns) in a data frame. It can be used to create a subset of variables or to drop variables, e.g. select(-variablename) will remove that variable from the dataframe/query.

An explanation of the select() function
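A minimal sketch of select(), assuming the dplyr package is installed. The data frame and its column names are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical data frame for illustration only.
example_df <- data.frame(
  id    = c(1, 2, 3),
  name  = c("Ann", "Bob", "Cal"),
  score = c(50, 60, 70)
)

# Keep only the named columns.
kept <- example_df %>% select(id, score)

# Drop a column by prefixing its name with a minus sign.
dropped <- example_df %>% select(-name)

names(kept)
names(dropped)
```

Both calls leave the same two columns (id and score); which form reads better depends on whether it is shorter to list the columns to keep or the columns to drop.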

Remove dataframes, vectors and text from the environment

The rm() function can be used to remove items from the environment.
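A short sketch of rm() using base R only. The object names are hypothetical.

```r
# Create some throwaway objects (hypothetical names).
temp_df   <- data.frame(x = 1:3)
temp_vec  <- c(1, 2, 3)
temp_text <- "hello"

# Remove a single object by name.
rm(temp_text)

# Remove several objects in one call.
rm(temp_df, temp_vec)

# exists() confirms an object is gone.
exists("temp_df")

# To clear the whole environment (use with care):
# rm(list = ls())
```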

here::here

here::here() figures out the top-level of your current project.
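A minimal sketch, assuming the here package is installed. The folder and file names passed to here() are hypothetical.

```r
library(here)

# Top-level folder of the current project (normally where the .Rproj file sits).
project_root <- here()

# Build a path below the project root from its parts.
# "outputs" and "report.xlsx" are hypothetical names.
output_path <- here("outputs", "report.xlsx")

print(project_root)
print(output_path)
```

Because the path is built from the project root, the same code should work regardless of which sub-folder the script is run from.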

pull()

The pull() function is used in the dataset universals quite frequently. It extracts a single column from a data frame as a vector, so a one-row result becomes a single text or numeric value.
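A minimal sketch of pull(), assuming the dplyr package is installed. The lookup table is hypothetical.

```r
library(dplyr)

# Hypothetical lookup table for illustration only.
lookup <- data.frame(
  code  = c("A", "B", "C"),
  value = c(10, 20, 30)
)

# pull() returns the column as a plain vector rather than a data frame.
all_values <- lookup %>% pull(value)

# Combined with filter(), it extracts a single value.
b_value <- lookup %>% filter(code == "B") %>% pull(value)

all_values
b_value
```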

How to merge data in R

  • rbind - which stacks datasets on top of each other (like a UNION)
  • merge - which allows merging two data frames by common columns or by row names
  • cbind - which binds columns side by side
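The three functions can be compared in a short base R sketch. All data frames here are hypothetical.

```r
# Hypothetical data frames for illustration only.
first_half  <- data.frame(id = c(1, 2), score = c(50, 60))
second_half <- data.frame(id = c(3, 4), score = c(70, 80))
names_df    <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cal"))

# rbind: stack rows (column names must match).
stacked <- rbind(first_half, second_half)

# merge: join on the common column "id".
# By default only ids present in both data frames are kept.
joined <- merge(stacked, names_df, by = "id")

# cbind: place columns side by side (row counts must match).
flags <- data.frame(passed = c(FALSE, TRUE, TRUE, TRUE))
side_by_side <- cbind(stacked, flags)

nrow(stacked)
nrow(joined)
ncol(side_by_side)
```

Note that merge() drops id 4 here because it has no match in names_df; adding all.x = TRUE would keep it with a missing name.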

t - transpose function

The t function in R allows a matrix (or dataframe) to be transposed.
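A small base R sketch of t(). One point worth knowing is that transposing a data frame returns a matrix, not a data frame.

```r
# A matrix with 2 rows and 3 columns.
m <- matrix(1:6, nrow = 2)
tm <- t(m)
dim(tm)

# Transposing a data frame gives a matrix whose row names
# are the original column names.
df <- data.frame(a = 1:2, b = 3:4)
t_df <- t(df)
is.matrix(t_df)
rownames(t_df)
```

If a data frame is needed afterwards, the result can be wrapped in as.data.frame().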

Excel table functions

The process of creating the Excel document involves writing out the information to the correct sheet and then formatting that information. The Styles.R file within the Functions folder of the code shows what different text styles are available to add to our tables or text. If more styles are needed in the future then these would be created in the Styles.R file. The process for writing and formatting is similar for both text and tables.

Here are some of the main commands needed for this process:

writeDataTable

writeDataTable(new_workbook, sheet = "Table 2",
               x = tab2_df,
               startRow = 6,
               startCol = 1,
               colNames = TRUE,
               tableStyle = "none",
               tableName = "Table_2",
               withFilter = FALSE,
               bandedRows = FALSE,
               keepNA = TRUE,
               na.string = suppressed,
               headerStyle = h3)

writeDataTable is the function used to write out the dataframes we created in the data_prep file.

The first two arguments of this function are the name of the workbook (new_workbook) and the name of the sheet (“Table 2”) that we want to write on.

x refers to the dataframe from the data_prep file that we want to insert e.g tab2_df.

startRow and startCol are the co-ordinates of where we want the dataframe to be placed, e.g. row 6 and column 1.

keepNA is required for tables that have NA values in them that were created for suppression. If this wasn’t selected then any NA values would just come out as blank cells.

na.string = suppressed causes any NA values in the dataframe to be written out as the value of the variable suppressed that was created earlier. This means that an x will be inserted in place of any NA values that were created for suppression.

headerStyle = h3 sets the header style to h3 which is available to view in the Styles.R file.

setColWidths

setColWidths(new_workbook, sheet = "Table 2", cols = 1, widths = 16)
setColWidths(new_workbook, sheet = "Table 2", cols = c(2:10),  widths = 18)

setColWidths is a function that sets the width of any Excel column you select. This has to be set for all the columns in the tables in order for column headers to fit correctly etc.

Select the required workbook (new_workbook), required sheet name (“Table 2”), required column (either a single number or a range) and the width (numeric value).

addStyle

addStyle(new_workbook, sheet = "Table 2",
         style = tw, rows = 6, cols = 1:10)

addStyle will be one of the most used functions in formatting your Excel document. This function allows you to add any of the styles in the Styles.R file to any of the rows or columns within the Excel document. These styles will be used for alignment, font size, boldness etc.

Select the required workbook (new_workbook), required sheet name (“Table 2”), required style (e.g. tw) and where you want the style to be implemented (rows = and cols =). In this example tw refers to a style that implements text wrapping, which is useful for column headers.

Bear in mind that certain styles are needed depending on what type of number or text you are dealing with (decimal, percentage, text etc).

For example:

ns <- createStyle(numFmt = "#,##0",
                  halign = "right")

ns style is a numeric style that inserts a thousands separator and aligns the text to the right.

ns_percent2 <- createStyle(numFmt = "0.0%",
                          halign = "right",
                          textDecoration = "bold")

ns_percent2 is a numeric style that shows a percentage number to one decimal place and inserts a percentage sign. The number is aligned to the right and made bold.

bold <- createStyle(textDecoration = "bold",
                  fontSize = 10)

bold is a style that can be added to text and sets the font to size 10 and makes the text bold.

View the Styles.R file to see all the available styles or create your own.

writeData

tab2_text <- c("Table 2: Reoffending rate by age and gender [note 1] [note 2]",     
               "This worksheet contains one table. Some cells refer to notes which can be found on the Notes worksheet.",       
               "Return to table of contents",       
               "Some shorthand is used in this table, [x] not available. For more information please see note 1 in the Notes table.",       
               paste0("Source: Adult and Youth Reoffending in Northern Ireland (",currentyear -1,"/", currentyear - 2000," Cohort)"))

writeData(new_workbook, sheet = "Table 2",
          x = tab2_text,
          startRow = 1,
          startCol = 1,
          colNames = FALSE)

writeData is a function used to write out text or variables to Excel sheets.

The first two arguments of this function are the required workbook (new_workbook) and sheet name (“Table 2”).

x refers to the data that is to be written into the document. In this case x is set to tab2_text which is a variable that was created directly above. tab2_text is a variable that contains all the text that is to sit above the table on that sheet.

startRow and startCol are the co-ordinates of where we want the variable/text to be placed on the sheet e.g row 1 and column 1.
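The paste0() call in tab2_text builds the cohort label from currentyear. As a worked example, assuming a hypothetical value of 2022 for currentyear:

```r
# Hypothetical value for illustration; the real value is set elsewhere in the code.
currentyear <- 2022

# currentyear - 1 gives the start year; currentyear - 2000 gives the two-digit end year.
source_text <- paste0("Source: Adult and Youth Reoffending in Northern Ireland (",
                      currentyear - 1, "/", currentyear - 2000, " Cohort)")
source_text
```

With currentyear set to 2022 this gives the label 2021/22 Cohort, so the text updates itself whenever currentyear changes.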

saveWorkbook

saveWorkbook(new_workbook, output_file, overwrite = TRUE)

saveWorkbook is the function at the very bottom of the excel_tables.R file. This is the function that actually saves all the work that has been done up to this point in the document and creates the Excel document which will be saved in the outputs folder.

new_workbook refers to the new workbook that has been created. output_file refers to the file name of the document, which was created at the top of the excel_tables.R file. overwrite = TRUE means that every time the Excel document is created it will overwrite any older version that is in the outputs folder.
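The commands in this section can be put together in one small end-to-end sketch, assuming openxlsx (4.2.5 or greater) is installed. The data frame, the styles h3 and tw, the value of suppressed and the output location are hypothetical stand-ins for objects the real code creates in data_prep and Styles.R; createWorkbook() and addWorksheet() are the openxlsx functions that create the workbook and sheet before anything is written.

```r
library(openxlsx)

# Hypothetical stand-ins for objects created elsewhere in the real code.
suppressed  <- "x"
tab2_df     <- data.frame(Group = c("A", "B"), Value = c(12.5, NA))
output_file <- file.path(tempdir(), "example_tables.xlsx")
h3 <- createStyle(textDecoration = "bold")   # assumed header style
tw <- createStyle(wrapText = TRUE)           # assumed text-wrapping style

# Create the workbook and the sheet to write on.
new_workbook <- createWorkbook()
addWorksheet(new_workbook, sheetName = "Table 2")

# Write the title text above the table.
writeData(new_workbook, sheet = "Table 2", x = "Table 2: Example table",
          startRow = 1, startCol = 1, colNames = FALSE)

# Write the table; NA values are written out as the suppression marker.
writeDataTable(new_workbook, sheet = "Table 2", x = tab2_df,
               startRow = 6, startCol = 1, tableStyle = "none",
               tableName = "Table_2", withFilter = FALSE,
               keepNA = TRUE, na.string = suppressed, headerStyle = h3)

# Format column widths and apply text wrapping to the header row.
setColWidths(new_workbook, sheet = "Table 2", cols = 1:2, widths = 16)
addStyle(new_workbook, sheet = "Table 2", style = tw, rows = 6, cols = 1:2)

# Save the finished workbook.
saveWorkbook(new_workbook, output_file, overwrite = TRUE)
```

The sketch writes to a temporary folder so it can be run without touching the outputs folder.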

RMarkdown

An HTML publication is produced using an RMarkdown script.

Functionally, an RMarkdown script has three types of text input:

  1. YAML - This is the code that sets the document properties
  2. Markdown - This is the plain text that appears in the html, between figures. It can include section headings, paragraphs of text, lists and images.
  3. R Chunk - This is a block of R code that will be evaluated within the document.

YAML

The opening section of any RMarkdown document is called the YAML header (YAML originally stood for Yet Another Markup Language and now officially stands for YAML Ain't Markup Language). Document settings are declared here.

The YAML is enclosed between two lines of three dashes (---), as shown below:

---
title: "Northern Ireland Report"
lang: "en"
output:
  html_document:
    fig_caption: false
    toc: true
    toc_float: true
    toc_depth: 3
    css: "style.css"
    self_contained: true
params:
  pre_release: false
---

Note that some properties are indented from the left. This indentation is important and must not be removed. YAML indentation must use spaces rather than tab characters.

The properties we have set help with adhering to accessibility guidelines.

  • title: The title to give the document. This will appear in both the metadata and in the page banner at the top of the page.
  • lang: For metadata. Will also tell screen readers what language to read the document in.
  • html_document: We wish to produce an html document (instead of a PDF).
  • fig_caption: false: We are telling R Markdown not to display the alt text on images directly below the images.
  • toc: true: We want an auto-generated table of contents based on the headings inside the document.
  • toc_float: true: We want the table of contents to stay on the left hand side of the page as we scroll down the document.
  • toc_depth: 3: The number of heading levels to display in the table of contents.
  • css: "style.css": A link to a style sheet containing various different text formats etc.
  • self_contained: true: Set to true in order to embed files and images in the html document itself. This means publishing only requires a single .html file to be uploaded to the DataViz server.
  • params: Extra custom parameters that can be set on the report.
  • pre_release: When set to TRUE/true the pre-release version of the report is knit. When set to FALSE/false the publication version is knit.
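Inside the report, parameters declared under params are available to R code as a list called params. The sketch below builds that list by hand so it can be run outside a knit; the banner text is hypothetical and the real report may use the parameter differently.

```r
# When a report is knit, rmarkdown creates `params` from the YAML header.
# Here it is created by hand so the sketch runs on its own.
params <- list(pre_release = FALSE)

# Hypothetical use of the parameter: choose text depending on the version being knit.
banner <- if (isTRUE(params$pre_release)) {
  "Pre-release version"
} else {
  "Published version"
}
banner

# A parameter can also be overridden at render time without editing the YAML, e.g.
# rmarkdown::render("report.Rmd", params = list(pre_release = TRUE))
# ("report.Rmd" is a placeholder file name.)
```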

Markdown

This is the plain text that appears in the html, between figures. It can include section headings, paragraphs of text, lists and images.

Headings

Headings are set in R Markdown by placing a number of #s at the start of a line. The number of #s preceding a line is the level of nesting we want that particular heading to appear at, with level 1 being the highest level.

The overall title of the document is a level 1 heading. It is best practice and an accessibility requirement that the document title be the only level 1 heading in a document. Each chapter should therefore be marked as a level two heading (ie, with two #s).

Any headings generated this way will automatically be added to the Table of Contents on the left hand side of the document and nested accordingly.

Italic and bold text

Text can be made italic by including a * or a _ around the word (or words) you wish to emphasise. For example:

...% of young people think that relations between Protestants and Catholics are *better now* than they were five years ago.

or

...% of young people think that relations between Protestants and Catholics are _better now_ than they were five years ago.

Will both produce:

…% of young people think that relations between Protestants and Catholics are better now than they were five years ago.

For bold text we need to include two * or two _ around the text. For example, if we write:

The **vision** of the strategy is...

or

The __vision__ of the strategy is...

we will get:

The vision of the strategy is…

There is no method for underlining text in RMarkdown, and although this can be achieved using other HTML methods, accessibility guidance advises that underlining not be used.

Adding Footnotes

These footnotes are interactive and click-able. For more information see R Markdown Footnotes

Inserting images

The method for inserting images is the same as for hyperlinks, with the only difference being the inclusion of a ! at the beginning, e.g. ![alt text for image](url-to-image)

![Picture of a book](images/book.png)

Picture of a book

Lists

There are two types of list that can be created using R Markdown: unordered and ordered lists.

Unordered lists are produced by placing “* ” (an asterisk followed by a space) at the beginning of each line:

* Our Children and Young People
* Our Shared Community
* Our Safe Community
* Our Cultural Expression

will produce:

  • Our Children and Young People
  • Our Shared Community
  • Our Safe Community
  • Our Cultural Expression

Ordered lists are produced by placing “1.” at the start of each line. The numbering is automatically applied.

1. Our Children and Young People
1. Our Shared Community
1. Our Safe Community
1. Our Cultural Expression

will give:

  1. Our Children and Young People
  2. Our Shared Community
  3. Our Safe Community
  4. Our Cultural Expression

R Chunk

An R chunk is opened with the command ```{r} and is closed off by typing ```.

Inside the {r} curly brackets of the command we can specify some settings. A typical R chunk command in our report looks like this:

```{r figure1, echo = FALSE, out.width = "100%", warning = FALSE, message = FALSE}
```

Where:

  • figure1: Any text that appears after r but before the first comma (,) is a name to give the chunk. This is only used for navigation within R Studio. It is not necessary to declare a name for a chunk, however if it is declared it must be unique.
  • echo = FALSE: By default the code inside the R chunk itself will be displayed in the final document. Setting it to FALSE here will turn that off. We only wish to see the output of the code.
  • out.width = "100%": This will ensure that the graphics being output fill the whole width of the document.
  • warning = FALSE and message = FALSE: Like with echo = FALSE messages and warnings that would normally be printed to the console also get output in the document. Setting these to FALSE here means the messages/warnings will only be reported back in R and not in the final document.

subreports

Subreports can be called into the main Rmd file using the code below and a combined report is created.

There may be minor formatting differences that only come through after the main .Rmd is knitted. This is because the style.css file is only called in the main report.

messages, warnings & errors

  • A message in R is a diagnostic message for information and does not stop the code from completing. Messages will appear in the Console while code is running.
  • A warning is an alert that something may not be as expected in the code or data but does not stop the code from completing. Warning messages are generally switched off in the 00_universal.R script.
  • An error is an indication that something has not completed - an explanation will be written to the Console. If you have highlighted more than one block of code, R will attempt to continue on through the rest of the code.

DataCamp course on RMarkdown including a section on messages warnings and errors
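The difference between the three can be seen in a short base R sketch. The handlers used here (withCallingHandlers and tryCatch) are one way of capturing conditions for illustration; they are not necessarily how the report code deals with them.

```r
# A message is informational; the code carries on.
message("Reading data...")

# A warning: "x" cannot be converted to a number, so R warns and returns NA,
# but the remaining values are still converted.
result <- withCallingHandlers(
  as.numeric(c("1", "2", "x")),
  warning = function(w) {
    message("Caught a warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
result

# An error stops the expression; tryCatch() lets us capture it and continue.
outcome <- tryCatch(
  stop("Something has not completed"),
  error = function(e) paste("Caught an error:", conditionMessage(e))
)
outcome
```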

Inline R

As well as evaluating R code in its own chunks, we can also evaluate R commands in the Markdown sections by enclosing the command like: `r some_R_code`

# set the percent of a dataset
value = 23.0

We can then insert this value into commentary by typing:

According to source, `r value`% of adults think XXXXX.

Which will appear in the final document as:

According to source, 23% of adults think XXXXX.

RMarkdown Options

The code chunk below enables us to set some overall document options, so something can be done for every chunk in the document. You can set the same options on a per-chunk basis, by putting the options in the chunk header, like ```{r, eval=FALSE}

# this chunk sets chunk options for all chunks in this file
knitr::opts_chunk$set(
  message= FALSE, echo = FALSE, warning = FALSE, out.width = "100%"
)

Some important options to know about:
- message = FALSE stops packages from printing all their messages when they load.
- warning = FALSE stops packages from displaying warnings if there is a version conflict.
- error = TRUE can be used to make a document knit even if there is a problem in one chunk (that chunk will just run and print its error); with error = FALSE knitting stops at the first error.
- echo = TRUE shows the code, where echo=FALSE would hide the code.
- eval = TRUE means to evaluate (run) the code, where eval=FALSE will just show the code but not evaluate it.
- out.width = "100%" will constrain images and graphs to the page width.

Google Analytics

The code below appears directly below the YAML. It doesn’t produce anything on the page but will allow the page’s activity to be tracked using Google Analytics Tag Manager.

<!-- Google Tag Manager - Google Analytics -->
<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
})(window,document,'script','dataLayer','GTM-KF6WGSG');</script>
<!-- End Google Tag Manager -->

<!-- Google Tag Manager (noscript) -->
<noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-KF6WGSG"
height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>
<!-- End Google Tag Manager (noscript) -->

Knitting a report

To create a new version of the report in html after updating the data or the code, click the Knit button in the Rmd file and select the “Knit to HTML” option in RStudio, or press Ctrl + Shift + K.

Debugging the report

You may get an error similar to the below if there is a problem installing a specific package.

This may be resolved by manually installing that particular package. In this example, type install.packages("rgeos") in the console.

If that fails there is an additional script included in the zip files called ‘Package check’. Run this and try knitting the original file again.

The following code rmarkdown::render("T:/Projects/12 - LMR/development/lmr/lmr_master/code/labour-market-report.Rmd") will knit the labour-market-report and retain all of the dataframes, variables and functions. This can be useful for debugging [note the file path will need updated].

Usually when you knit the report all of the dataframes, variables and functions are removed when the html is completed - this is because when the report is knit using the button above R opens a new session, runs the code and closes the session again.

Further help

Resources

Date

Date last created/knitted 2024-03-20 09:05:52