We recommend you join the NISRA MS Teams Channel - R Software Discussion Forum (contact Marti Jefferies to get added).

The code has been developed in R version 4.2.0 or higher.

When closing RStudio you are asked whether to “Save workspace image”. The workspace is your current R working environment and includes any user-defined objects (vectors, matrices, data frames, lists, functions). At the end of an R session, the user can save an image of the current workspace that is automatically reloaded the next time R is started. There is no need to save the workspace, as you will clear the environment before you run any new code. For large R processes the workspace can also take time to save and reopen.

Packages

An R package is a collection of predefined functions, bundled as a library, that can be used in an R program. Packages are developed externally and are loaded into the R environment so that the functions they contain are available to use.

pacman is a package manager library for R. There are three steps required for reading a package into R:

  1. Check if the package is already installed on your computer. if (!require(pacman))
  2. Install it. install.packages("pacman")
  3. Call it into your environment. library(pacman)

Fortunately pacman has a command named p_load() which performs these three steps in one instance.

For example, running

p_load("plyr")

will perform the exact same operation as running

if(!require(plyr)) {install.packages("plyr")}
library(plyr)

p_load() can handle multiple packages at once, which we have used above to read in all our packages needed for analysis.
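As a rough illustration of what p_load() does for several packages at once, the three steps can be written as a loop. This is only a sketch: load_packages() is a hypothetical helper, not part of pacman, and the base packages stats and utils are used so that nothing needs to be installed.

```r
# Hypothetical helper (not part of pacman) that repeats the three steps
# for each package name in a character vector
load_packages <- function(pkgs) {
  for (pkg in pkgs) {
    # Step 1: check whether the package is already installed
    if (!requireNamespace(pkg, quietly = TRUE)) {
      # Step 2: install it if it is missing
      install.packages(pkg)
    }
    # Step 3: attach it to the current session
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# stats and utils ship with R, so this call installs nothing
loaded <- load_packages(c("stats", "utils"))
```

With pacman installed, the same result would come from a single call such as p_load(stats, utils).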

Package Name    What it is used for
base64enc       Encodes and embeds image files in HTML
docxtractr      Extracts tables from Word (.docx) documents
DT              A wrapper of the JavaScript library ‘DataTables’
english         Converts numbers to their English written form, e.g. 4 = four
foreign         Reads and writes data from other statistical packages, such as SPSS
furrr           Mapping functions that can run in parallel
here            Sets the root for relative paths to the location of the current .Rproj file
htmltools       Tools for HTML generation and output
httr            Tools for working with URLs and HTTP requests
janitor         Tools for examining and cleaning data
kableExtra      Produces formatted tables
lubridate       Working with dates and date formats
magrittr        Provides a pipe operator for chaining commands together
openxlsx        Writes xlsx files. Version 4.2.5 or greater needed for this report
plotly          Creates interactive web graphics
plyr            Tools for splitting, applying and combining data
readODS         Reads data from ODS sources
readr           Reads in rectangular text data such as csv files
readxl          Reads in Excel files
s2              Spherical geometry support
sf              Support for simple map features
shiny           Builds interactive web applications
stringr         String manipulation
tidyr           Helps manipulate data into tidy data frames
tidyverse       A collection of packages for importing, tidying, transforming and visualising data
tmap            Produces interactive maps
xfun            Used to create base64 values for images and datasets
zoo             Used for rolling totals

A full manual for each is available directly from the Help pane in RStudio.

Working directory

By using an R Project (.Rproj) file we ensure that the current directory is also the working directory.

For example, to refer to the file Table1.RDS located in C:/Users/1234567/My Documents/Excel Tables/DataTo2018 when the .Rproj file is located in C:/Users/1234567/My Documents/Excel Tables, the path need only start from the project folder: DataTo2018/Table1.RDS.

Please also note that in R folder paths are defined with forward slashes (/), where Windows tends to use backslashes (\). A backslash typed inside an R string has to be doubled (\\).
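A small sketch of both points, using only base R. The folder and file names are the ones from the example above; the variable names are illustrative.

```r
# Relative path, starting from the project folder (where the .Rproj file lives)
rel_path <- file.path("DataTo2018", "Table1.RDS")
rel_path

# A Windows-style path as it might be copied from File Explorer; each
# backslash is doubled so that R reads it as a single literal backslash
win_path <- "C:\\Users\\1234567\\My Documents\\Excel Tables"

# Swap every backslash for a forward slash
r_path <- chartr("\\", "/", win_path)
r_path
```

file.path() joins its arguments with forward slashes, so paths built this way do not need converting.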

Running R code

There are multiple ways to run R code in a script. To run a single line of code, do one of the following:

  • Place the cursor on the desired line, hold the CTRL key, and press enter.
  • Place the cursor on the desired line and click the Run button at the top of the script pane.

To run multiple lines of code, do one of the following:

  • Highlight all the code you’d like to run, hold the CTRL key, and press enter. On Mac OS X, hold the COMMAND key and press return instead.
  • Highlight all the code you’d like to run, and click the Run button.

renv

What is renv?

renv is a tool that supports reproducibility in R projects. With renv projects, the exact packages and versions of those packages are recorded in a lockfile, which helps ensure longevity (as packages are continually being developed and sometimes deprecated) and consistency between users (so that everyone will be using the same versions of the same packages to run the code). To find out more about renv, visit the renv website.

Is renv perfect?

No. Building packages from source, in particular, and working within the constraints of organisational networks mean that issues can occur with access to packages when using renv. Most commonly these issues relate to package build and installation. However, the problems introduced by renv are small compared with the potential for failures and version confusion when working across and within branches that are all using different R and package versions.

How are we minimising risk with renv?

Tech Lab have set up an internal package repository that we are calling TLCRAN. It is hosted on IT Assist servers and gives users access to already compiled binary (.zip) files of various versions of R packages. Currently the RAP Skeleton offered by Tech Lab is the only product that we have officially set up to use TLCRAN. When you open the RAP Skeleton code you should be connected to TLCRAN silently and automatically in the background of your project.

renv setup in the RAP Skeleton

Upon opening the rap-skeleton.Rproj file for the first time you should see a message in the console similar to:

# Bootstrapping renv 1.0.7 ---------------------------------------------------
- Downloading renv ... OK
- Installing renv  ... OK

- Project 'C:/.../34-rap-skeleton/rap-skeleton-dev' loaded. [renv 1.0.7]
- One or more packages recorded in the lockfile are not installed.
- Use `renv::status()` for more details.

Next, open the renv_setup.R script and follow the steps within it titled renv::restore() and renv::status(). If successful, renv should now be activated and all required packages should be available. For further information visit the renv website.

renv troubleshooting

Connection errors

If there are error messages relating to certain packages not being able to be found, or packages installing from source when running renv::restore(), it is possible that the automatic connection to TLCRAN has not worked. To test this, run options("repos") in the R Console. You should see a message similar to:

$repos
                                                             TLCRAN1 
           "file:////pr-clus-vfpdfp/DOF_NISRA_R_Packages/production" 
                                                             TLCRAN2 
"file:////pr-clus-vfpdfp/DOF_NISRA_R_Packages/production-2024-08-21" 
                                                             TLCRAN3 
"file:////pr-clus-vfpdfp/DOF_NISRA_R_Packages/production-2024-08-22" 
                                                             TLCRAN4 
"file:////pr-clus-vfpdfp/DOF_NISRA_R_Packages/production-2024-09-06" 
                                                             TLCRAN5 
"file:////pr-clus-vfpdfp/DOF_NISRA_R_Packages/production-2024-09-16" 
                                                                CRAN 
                                         "https://cran.rstudio.com/"

If not, go back to the latest Release of the RAP Skeleton and download the .zip file. Ensure the code in the .Rprofile file in your project folder matches the code in the .Rprofile of the download.

If you are still having connection issues, you may not have permission to access TLCRAN. Check that you can map \\pr-clus-vfpdfp\DOF_NISRA_R_Packages on your machine. If not, you will need to be added to the AD group called G_DOF NISRA Statisticians. IT Assist can help with this.
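The two checks described above can also be scripted. The following is a minimal sketch using base R only; the variable names are illustrative, and it assumes the TLCRAN repositories are named with a TLCRAN prefix as in the output shown earlier.

```r
# Read the repositories R is currently configured to use
repos <- getOption("repos")

# TRUE if at least one repository name starts with "TLCRAN"
has_tlcran <- any(grepl("^TLCRAN", names(repos)))

# TRUE if the shared folder named above is reachable from this machine
can_see_share <- dir.exists("//pr-clus-vfpdfp/DOF_NISRA_R_Packages")

if (!has_tlcran) {
  message("TLCRAN is not in options('repos'); check the project .Rprofile.")
} else if (!can_see_share) {
  message("TLCRAN is configured but the share is not reachable; check permissions.")
}
```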

Install errors

It is still possible to install, even without access to TLCRAN. Below are a few issues that can occur when setting up renv from an online repository or CRAN.

  • When running renv::restore() you may see some failures when trying to install a particular package. Usually this happens when renv tries to build a package from source, as this is more complex than a standard binary (.zip) install. If a particular package fails, one workaround is to find the most recent version of that package, which is usually available in binary form, and try to install it instead. Go to CRAN and find the page for the package you are interested in. The most recent version will be detailed there. We can instruct renv to use that version of the package instead by running renv::record(). For example, if the latest version of the sf package was 1.0-12, run renv::record("sf@1.0-12"). Then try renv::restore() again. If more than one package fails to install you will have to carry out this same process for each occurrence.

  • If you find there are multiple packages failing and no way to tell how many renv::record() calls you may have to make, a second option is to run renv::update(). This will, as you might expect, update all the packages in your lockfile to their most recent versions on CRAN. This should mainly set the packages required to be binary and not source, therefore increasing your chances of a successful install.

    • You may be asked to restart your R session following renv::update()
    • Following a restart, run renv::snapshot() to reflect the package changes you have just made back to your renv.lock file
    • Select option 2 when prompted by R
    • renv::status() should now return a message telling you that your project is in a consistent state

Be aware that the update() option above is only advisable if you are updating from packages that are fairly recent. If the packages in your lockfile are years old, you may cause breaking changes to your code, as there can be substantial changes to packages over time. These breaks, if they occur, can be managed by finding where the issues or deprecated functions are in your code and updating as required by the newer version. Most packages have documentation listing the functions available in that version for reference.

  • As a final workaround, if the above two options are unsuccessful and you can get hold of a version of the package you need that someone else has been able to install, you can place it in your renv global cache under the correct version number. More info on renv global cache here.
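The first two workarounds above can be sketched as small wrapper functions. The function names are hypothetical and are not part of renv; only renv::record(), renv::restore(), renv::update(), renv::snapshot() and renv::status() are real renv calls. Defining the functions does nothing by itself; they are meant to be called from inside a renv project.

```r
# Hypothetical helpers wrapping the renv calls named above.
# Nothing runs until one of them is called from inside a renv project.

# Workaround 1: pin one failing package to a given version, then retry
retry_with_version <- function(pkg_spec) {
  renv::record(pkg_spec)   # e.g. "sf@1.0-12"
  renv::restore()
}

# Workaround 2: update everything, then write the changes to renv.lock
update_and_snapshot <- function() {
  renv::update()
  # A restart of the R session may be requested at this point
  renv::snapshot()
  renv::status()
}
```

For example, retry_with_version("sf@1.0-12") would repeat the sf example given above.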

curl errors

renv uses curl to get and install packages. It is an .exe application, and use of these can be flagged up to IT Assist by BT. If you are contacted about this, explain that you are using it to install R packages from an approved package repository and, if necessary, quote reference i305810, as this issue has been dealt with by Tech Lab before.

Hints / tips

Keyboard shortcuts

Some useful keyboard shortcuts

Comments

When you want to comment out multiple lines of R code, the conventional way is to place a # character at the beginning of each line, since R does not support multiline comments.

Performing that task is easy if the number of lines of code to comment out is small. But if you need to comment out a really long block of code, a specialized code editor capable of adding a # character to each line in a selected block could be useful. In RStudio, you can do that by using the Ctrl+Shift+C key combination in Windows. The RStudio documentation offers more information on keyboard shortcuts.

Pipe %>% shortcut

Insert the pipe operator %>% with Ctrl + Shift + M

select()

The select() function from the dplyr package is used to select variables (columns) from a data frame. It can be used to create a subset of variables or to drop variables, e.g. select(-variablename) will remove the variable variablename from the dataframe/query.

An explanation of the select() function
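A short sketch of both uses, assuming the dplyr package is installed. The data frame and its column names are made up for illustration.

```r
library(dplyr)

# Made-up example data
df <- data.frame(id = 1:3, age = c(25, 40, 31), region = c("A", "B", "A"))

# Keep only the named columns
kept <- select(df, id, age)

# Drop a column by prefixing its name with a minus sign
dropped <- select(df, -region)

names(kept)
names(dropped)
```

Both calls leave df itself unchanged; the result has to be assigned to a name to be kept.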

Remove dataframes, vectors and text from the environment

The rm() function can be used to remove items from the environment.
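A minimal sketch with made-up object names:

```r
# Create a few throwaway objects
temp_vector <- 1:10
temp_df <- data.frame(a = 1:3)
temp_text <- "draft"

# Remove a single object
rm(temp_vector)

# Remove several objects in one call
rm(temp_df, temp_text)

# The objects are no longer in the environment
exists("temp_vector", inherits = FALSE)

# To clear everything in the environment at once: rm(list = ls())
```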

here::here

here::here() figures out the top-level of your current project.

pull()

The pull() function is used quite frequently in the dataset universals. It extracts a single column from a data frame as a vector, for example a text or numeric value.
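A short sketch of pulling a single value out of a data frame, assuming the dplyr package is installed. The data and names are made up for illustration.

```r
library(dplyr)

# Made-up example data
df <- data.frame(year = c(2022, 2023), rate = c(17.5, 16.9))

# Filter down to one row, then extract the rate column as a plain number
latest_rate <- df %>%
  filter(year == 2023) %>%
  pull(rate)

latest_rate
```

Because latest_rate is a plain numeric value rather than a data frame, it can be used directly in calculations or in commentary text.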

How to merge data in R

  • rbind - which stacks datasets on top of each other (a UNION)
  • merge - which allows merging two data frames by common columns or by row names
  • cbind - which binds columns together side by side
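The three functions can be compared on small made-up data frames. This sketch uses base R only.

```r
# Two made-up data frames with the same columns
q1 <- data.frame(id = 1:2, value = c(10, 20))
q2 <- data.frame(id = 3:4, value = c(30, 40))

# rbind: stack rows (columns must match)
stacked <- rbind(q1, q2)

# merge: join on a common column; by default only matching ids are kept
labels <- data.frame(id = 1:3, name = c("a", "b", "c"))
merged <- merge(stacked, labels, by = "id")

# cbind: add columns side by side (row counts must match)
side <- cbind(q1, flag = c(TRUE, FALSE))

nrow(stacked)
nrow(merged)
names(side)
```

merge() keeps only rows with a match in both data frames unless all, all.x or all.y is set to TRUE.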

t - transpose function

The t function in R allows a matrix (or dataframe) to be transposed.
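A minimal sketch with made-up values:

```r
# A 2 x 3 matrix filled column by column
m <- matrix(1:6, nrow = 2)

# Transposing swaps rows and columns, giving a 3 x 2 matrix
tm <- t(m)
dim(tm)

# t() also accepts a data frame, but the result is a matrix
df <- data.frame(a = 1:2, b = 3:4)
tdf <- t(df)
is.matrix(tdf)
```

If a data frame is needed afterwards, the result can be wrapped in as.data.frame().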

Excel table functions

The process of creating the Excel document involves writing out the information to the correct sheet and then formatting that information. The Styles.R file within the Functions folder of the code shows what different text styles are available to add to our tables or text. If more styles are needed in the future then these would be created in the Styles.R file. The process for writing and formatting is similar for both text and tables.

Here are some of the main commands needed for this process:

writeDataTable

writeDataTable(new_workbook, sheet = "Table 2",
               x = tab2_df,
               startRow = 6,
               startCol = 1,
               colNames = TRUE,
               tableStyle = "none",
               tableName = "Table_2",
               withFilter = FALSE,
               bandedRows = FALSE,
               keepNA = TRUE,
               na.string = suppressed,
               headerStyle = h3)

writeDataTable is the function used to write out the dataframes we created in the data_prep file.

The first two arguments of this function are the name of the workbook (new_workbook) and the name of the sheet (“Table 2”) that we want to write on.

x refers to the dataframe from the data_prep file that we want to insert e.g tab2_df.

startRow and startCol are the co-ordinates of where we want the dataframe to be placed e.g row 6 and column 1.

keepNA is required for tables that have NA values in them that were created for suppression. If this wasn’t selected then any NA values would just come out as blank cells.

na.string = suppressed causes any NA values in the dataframe to be written out as the value of the variable suppressed that was created earlier. This means that an x will be inserted in place of any NA values that were created for suppression.

headerStyle = h3 sets the header style to h3 which is available to view in the Styles.R file.

setColWidths

setColWidths(new_workbook, sheet = "Table 2", cols = 1, widths = 16)
setColWidths(new_workbook, sheet = "Table 2", cols = c(2:10),  widths = 18)

setColWidths is a function that sets the width of any Excel column you select. This has to be set for all the columns in the tables in order for column headers to fit correctly etc.

Select the required workbook (new_workbook), required sheet name (“Table 2”), required column (either a single number or a range) and the width (numeric value).

addStyle

addStyle(new_workbook, sheet = "Table 2",
         style = tw, rows = 6, cols = 1:10)

addStyle will be one of the most used functions in formatting your Excel document. This function allows you to add any of the styles in the Styles.R file to any of the rows or columns within the Excel document. These styles will be used for alignment, font size, boldness etc.

Select the required workbook (new_workbook), required sheet name (“Table 2”), required style (e.g. tw) and where you want the style to be implemented (rows = and cols =). In this example the style tw refers to a style that implements text wrapping, which would be useful for column headers.

Bear in mind that certain styles are needed depending on what type of number or text you are dealing with (decimal, percentage, text etc).

For example:

ns <- createStyle(numFmt = "#,##0",
                  halign = "right")

ns style is a numeric style that inserts a thousands separator and aligns the text to the right.

ns_percent2 <- createStyle(numFmt = "0.0%",
                          halign = "right",
                          textDecoration = "bold")

ns_percent2 is a numeric style that shows a percentage number to one decimal place and inserts a percentage sign. The number is aligned to the right and made bold.

bold <- createStyle(textDecoration = "bold",
                  fontSize = 10)

bold is a style that can be added to text and sets the font to size 10 and makes the text bold.

View the Styles.R file to see all the available styles or create your own.

writeData

tab2_text <- c("Table 2: Reoffending rate by age and gender [note 1] [note 2]",     
               "This worksheet contains one table. Some cells refer to notes which can be found on the Notes worksheet.",       
               "Return to table of contents",       
               "Some shorthand is used in this table, [x] not available. For more information please see note 1 in the Notes table.",       
               paste0("Source: Adult and Youth Reoffending in Northern Ireland (",currentyear -1,"/", currentyear - 2000," Cohort)"))

writeData(new_workbook, sheet = "Table 2",
          x = tab2_text,
          startRow = 1,
          startCol = 1,
          colNames = FALSE)

writeData is a function used to write out text or variables to Excel sheets.

The first two arguments of this function are the required workbook (new_workbook) and sheet name (“Table 2”).

x refers to the data that is to be written into the document. In this case x is set to tab2_text which is a variable that was created directly above. tab2_text is a variable that contains all the text that is to sit above the table on that sheet.

startRow and startCol are the co-ordinates of where we want the variable/text to be placed on the sheet e.g row 1 and column 1.
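The last element of tab2_text is built with paste0(), which glues text and calculated values together. As a sketch, assuming a hypothetical value of 2024 for currentyear:

```r
# Hypothetical value; in the report code currentyear is set elsewhere
currentyear <- 2024

# paste0() joins its arguments with no separator
source_line <- paste0("Source: Adult and Youth Reoffending in Northern Ireland (",
                      currentyear - 1, "/", currentyear - 2000, " Cohort)")
source_line
```

Because the years are calculated from currentyear, the text updates itself each time the report is run for a new year.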

saveWorkbook

saveWorkbook(new_workbook, output_file, overwrite = TRUE)

saveWorkbook is the function at the very bottom of the excel_tables.R file. This is the function that actually saves all the work that has been done up to this point in the document and creates the Excel document which will be saved in the outputs folder.

new_workbook refers to the new workbook that has been created. output_file refers to the title of the document that was created at the top of the excel_tables.R document. overwrite = TRUE means that every time the Excel document is created it will overwrite any older version that is in the outputs folder.
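The commands above can be put together in one small, self-contained sketch, assuming the openxlsx package is installed. The workbook, data frame, styles and file name here are made-up stand-ins, not the objects used in the report code, and the file is written to a temporary location rather than the outputs folder.

```r
library(openxlsx)

# Made-up stand-ins for objects the report code creates elsewhere
tab_df <- data.frame(Group = c("A", "B"), Count = c(1200, NA))
suppressed <- "x"
header_style <- createStyle(textDecoration = "bold", wrapText = TRUE)
number_style <- createStyle(numFmt = "#,##0", halign = "right")

# Create an empty workbook and add a sheet to write on
wb <- createWorkbook()
addWorksheet(wb, "Table 1")

# Title text above the table
writeData(wb, sheet = "Table 1", x = "Table 1: Example counts",
          startRow = 1, startCol = 1, colNames = FALSE)

# The table itself; NA values are written out as "x"
writeDataTable(wb, sheet = "Table 1", x = tab_df,
               startRow = 3, startCol = 1,
               tableStyle = "none", tableName = "Table_1",
               withFilter = FALSE, keepNA = TRUE,
               na.string = suppressed, headerStyle = header_style)

# Formatting: column widths and a number style on the data cells
setColWidths(wb, sheet = "Table 1", cols = 1:2, widths = 16)
addStyle(wb, sheet = "Table 1", style = number_style,
         rows = 4:5, cols = 2, gridExpand = TRUE)

# Save to a temporary file
out_file <- tempfile(fileext = ".xlsx")
saveWorkbook(wb, out_file, overwrite = TRUE)
```

Nothing is written to disk until saveWorkbook() is called, so the order of the writing and formatting steps before it is flexible.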

RMarkdown

An HTML publication is produced using an RMarkdown script.

Functionally, an RMarkdown script has three types of text input:

  1. YAML - This is the code that sets the document properties
  2. Markdown - This is the plain text that appears in the html, between figures. It can include section headings, paragraphs of text, lists and images.
  3. R Chunk - This is a block of R code that will be evaluated within the document.

YAML

The opening section of any RMarkdown document is called the YAML header (YAML originally stood for Yet Another Markup Language). Document settings are declared here.

The YAML is enclosed between two lines of three dashes (---) as shown below:

---
title: "Northern Ireland Report"
lang: "en"
output:
  html_document:
    fig_caption: false
    toc: true
    toc_float: true
    toc_depth: 3
    css: "style.css"
    self_contained: true
params:
  pre_release: false
---
Note that some properties have been indented from the left. This indentation is important and must not be deleted. YAML indentation must be made with spaces, not tab characters.

The properties we have set help with adhering to accessibility guidelines.

  • title: The title to give the document. This will appear in both the metadata and in the page banner at the top of the page.
  • lang: For metadata. Will also tell screen readers what language to read the document in.
  • html_document: We wish to produce an html document (instead of a PDF).
  • fig_caption: false: We are telling R Markdown not to display the alt text on images directly below the images.
  • toc: true: We want an auto-generated table of contents based on the headings inside the document.
  • toc_float: true: We want the table of contents to stay on the left hand side of the page as we scroll down the document.
  • toc_depth: 3: The number of heading levels to display in the table of contents.
  • css: "style.css": A link to a style sheet containing various different text formats etc.
  • self_contained: true: Set to true in order to embed files and images in the html document itself. This means that publishing only requires a single .html file to be uploaded to the DataViz server.
  • params: Extra custom parameters that can be set on the report.
  • pre_release: When set to TRUE/true the pre-release version of the report is knit. When set to FALSE/false the publication version is knit.

Markdown

This is the plain text that appears in the html, between figures. It can include section headings, paragraphs of text, lists and images.

Headings

Headings are set in R Markdown by placing a number of #s at the start of a line. The number of #s preceding a line is the level of nesting we want that particular heading to appear at, with level 1 being the highest level.

The overall title of the document is a level 1 heading. It is best practice and an accessibility requirement that the document title be the only level 1 heading in a document. Each chapter should therefore be marked as a level two heading (ie, with two #s).

Any headings generated this way will automatically be added to the Table of Contents on the left hand side of the document and nested accordingly.

Italic and bold text

Text can be made italic by including a * or a _ around the word (or words) you wish to emphasise. For example:

...% of young people think that relations between Protestants and Catholics are *better now* than they were five years ago.

or

...% of young people think that relations between Protestants and Catholics are _better now_ than they were five years ago.

will both produce:

…% of young people think that relations between Protestants and Catholics are better now than they were five years ago.

For bold text we need to include two * or two _ around the text. For example, if we write:

The **vision** of the strategy is...

or

The __vision__ of the strategy is...

we will get:

The vision of the strategy is…

There is no method for underlining text in RMarkdown, and although this can be achieved using other HTML methods, accessibility guidance advises that underlining not be used.

Adding Footnotes

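A footnote is added by placing a marker such as [^1] in the text and defining the note on its own line, e.g. [^1]: Text of the footnote. These footnotes are interactive and clickable. For more information see R Markdown Footnotes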

Inserting images

The method for inserting images is the same as hyperlinks, with the only difference being the inclusion of a ! at the beginning. Eg, ![alt text for image](url-to-image)

![Picture of a book](images/book.png)

[Image: Picture of a book]

Lists

There are two types of list that can be created using R Markdown: unordered and ordered lists.

Unordered lists are produced by placing “* ” (an asterisk followed by a space) at the beginning of a line:

* Our Children and Young People
* Our Shared Community
* Our Safe Community
* Our Cultural Expression

will produce:

  • Our Children and Young People
  • Our Shared Community
  • Our Safe Community
  • Our Cultural Expression

Ordered lists are produced by placing “1.” at the start of each line. The numbering is automatically applied.

1. Our Children and Young People
1. Our Shared Community
1. Our Safe Community
1. Our Cultural Expression

will give:

  1. Our Children and Young People
  2. Our Shared Community
  3. Our Safe Community
  4. Our Cultural Expression

R Chunk

An R chunk is opened with the command ```{r} and is closed off by typing ```.

Inside the {r} curly brackets of the command we can specify some settings. A typical R chunk command in our report looks like this:

```{r figure1, echo = FALSE, out.width = "100%", warning = FALSE, message = FALSE}

Where:

  • figure1: Any text that appears after r but before the first comma (,) is a name to give the chunk. This is only used for navigation within RStudio. It is not necessary to declare a name for a chunk, however if one is declared it must be unique.
  • echo = FALSE: By default the code inside the R chunk itself will be displayed in the final document. Setting echo to FALSE turns that off, so that we only see the output of the code.
  • out.width = "100%": This will ensure that the graphics being output fill the whole width of the document.
  • warning = FALSE and message = FALSE: As with echo, messages and warnings that would normally be printed to the console are also output in the document by default. Setting these to FALSE means the messages/warnings will only be reported back in R and not in the final document.

Subreports

Subreports can be called into the main .Rmd file, for example with the knitr chunk option child, so that a combined report is created.

There may be minor formatting differences that only come through after the main .Rmd is knitted. This is because the style.css file is only called in the main report.

Messages, warnings & errors

  • A message in R is a diagnostic message for information and does not stop the code from completing. Messages will appear in the Console while code is running.
  • A warning is an alert that something may not be as expected in the code or data, but it does not stop the code from completing. Warning messages are generally switched off in the 00_universal.R script.
  • An error is an indication that something has not completed - an explanation will be written to the Console. If you have highlighted more than one block of code, R will attempt to continue on through the rest of the code.

DataCamp course on RMarkdown including a section on messages warnings and errors
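The three kinds of condition can be seen side by side in a small base R sketch. check_rate() is a hypothetical function written for this illustration only.

```r
# Hypothetical function that raises each kind of condition
check_rate <- function(rate) {
  message("Checking rate...")                      # information only
  if (rate > 100) warning("Rate is above 100%")    # code carries on
  if (rate < 0) stop("Rate cannot be negative")    # code stops here
  rate
}

# A valid value: only the message is raised (hidden here)
ok <- suppressMessages(check_rate(17.5))

# A warning can be caught and handled with tryCatch()
warned <- tryCatch(
  suppressMessages(check_rate(150)),
  warning = function(w) "warning raised"
)

# An error can be caught in the same way, keeping its explanation
outcome <- tryCatch(
  suppressMessages(check_rate(-1)),
  error = function(e) conditionMessage(e)
)
```

Without tryCatch(), the call check_rate(-1) would stop with the error text written to the Console.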

Inline R

As well as evaluating R code in its own chunks, we can also evaluate R commands in the Markdown sections by enclosing the command like: `r some_R_code`

# set the percent of a dataset
value = 23.0

We can then insert this value into commentary by typing:

According to source, `r value`% of adults think XXXXX.

Which will appear in the final document as:

According to source, 23% of adults think XXXXX.
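In practice a calculated value often has more decimal places than should appear in commentary. One possible approach, sketched here with a made-up value and illustrative variable names, is to format the number as text in a chunk first and then reference that text inline.

```r
# Made-up calculated value with too many decimal places
value <- 23.456

# Round to one decimal place and keep it as text, so the
# decimal place is shown even when it is zero
value_text <- format(round(value, 1), nsmall = 1)

# What the finished sentence would look like
sentence <- paste0("According to source, ", value_text, "% of adults think XXXXX.")
sentence
```

In the Markdown text the formatted value would then be inserted with `r value_text` in place of `r value`.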

RMarkdown Options

The code chunk below enables us to set some overall document options, so something can be done for every chunk in the document. You can set the same options on a per-chunk basis, by putting the options in the chunk header, like ```{r, eval=FALSE}

# this chunk sets chunk options for all chunks in this file
knitr::opts_chunk$set(
  message= FALSE, echo = FALSE, warning = FALSE, out.width = "100%"
)

Some important options to know about:

  • message = FALSE stops packages from printing all their messages when they load.
  • warning = FALSE stops warnings, such as those about package version conflicts, from being displayed.
  • error = TRUE can be used to make a document knit even if there is a problem in one chunk (that chunk will just run and print its error). With error = FALSE knitting stops at the first error.
  • echo = TRUE shows the code, where echo = FALSE would hide the code.
  • eval = TRUE means to evaluate (run) the code, where eval = FALSE will just show the code but not evaluate it.
  • out.width = "100%" will constrain images and graphs to the page width.

Google Analytics

The code below appears directly below the YAML. It doesn’t produce anything on the page but will allow the page’s activity to be tracked using Google Analytics Tag Manager.

<!-- Google Tag Manager - Google Analytics -->
<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
})(window,document,'script','dataLayer','GTM-KF6WGSG');</script>
<!-- End Google Tag Manager -->

<!-- Google Tag Manager (noscript) -->
<noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-KF6WGSG"
height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>
<!-- End Google Tag Manager (noscript) -->

Knitting a report

To create a new version of the report in html after updating the data or the code, either click the Knit button in the Rmd file and select the “Knit to HTML” option in RStudio, or press Ctrl + Shift + K.

Debugging the report

You may get an error when knitting if there is a problem installing a specific package.

This may be resolved by manually installing that particular package. For example, if the error names the rgeos package, type install.packages("rgeos") in the console.

If that fails there is an additional script included in the zip files called ‘Package check’. Run this and try knitting the original file again.

Running rmarkdown::render("T:/Projects/12 - LMR/development/lmr/lmr_master/code/labour-market-report.Rmd") in the console will knit the labour-market-report and retain all of the dataframes, variables and functions. This can be useful for debugging (note that the file path will need to be updated).

Usually when you knit the report all of the dataframes, variables and functions are removed when the html is completed - this is because when the report is knit using the button above R opens a new session, runs the code and closes the session again.

Further help

Resources

Date

Date last created/knitted 2024-09-24 15:55:13.223514