What is RStudio Server?

The RStudio Server on DASH is a version of RStudio that you use in a web browser and that runs on a Databricks cluster.

Who is this guide for?

This guide is aimed at users who are familiar with R and RStudio, but new to the DASH Platform.

What is this guide for?

This user guide shows how to access RStudio from within the DASH Platform PROD environment, how to close RStudio, and how to work with data from within the DASH Platform.

Accessing RStudio on the DASH Platform

  • Within the workspace, click on Compute on the menu on the left.
  • In the table that appears, click on RStudio.
  • In the horizontal menu, click on Apps.
  • Click on Setup RStudio. This generates a one-time password, which you need to log in to RStudio.
  • An email address is shown above the password; use it as your username when logging in. It may be the same as your Defra or ALB email address, or it may be different.
  • To copy the password, click on show to reveal it, then copy it.
  • Click on Open RStudio.

RStudio login screen with username and password field

Use your email address as the username to sign in to RStudio, and paste the password. This will now bring up RStudio.
The password is recreated whenever you are logged out of RStudio and reopen the Databricks tab.

Quitting a session or signing out

RStudio on the DASH Platform with logging out buttons highlighted on the top right

If you close the RStudio window but keep other tabs in the browser open, and then reopen RStudio through Databricks, your username and password will be saved and it takes you straight back to RStudio.

To sign out of RStudio on the DASH Platform, click on the Sign out icon next to your username at the top right hand side of your screen. You will need to enter your username and password again to get into RStudio, but your console and environment won’t be cleared. Please sign out when you are finished working in RStudio on the platform.

If you click on the orange Quit current R session icon, also at the top right, you will close the current session, which clears your console and environment.

R version and packages

R version and preinstalled packages

The version of R that is available on the DASH Platform depends on the runtime of Databricks that is being used at a given time. At the time of writing, the runtime is 13.3 LTS ML, which gives an R version of 4.2.2. To check the current runtime, go back to Databricks to see the RStudio cluster, which tells you the runtime in the first tab, see image below.

View of RStudio cluster on the DASH Platform with databricks runtime highlighted on the bottom
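You can also check the R version from the RStudio console. The lines below are a minimal sketch using base R only; the 4.2.2 comparison simply reuses the version quoted above and will need changing once the runtime is updated.

```r
# Print the full version string of the running R session.
R.version.string

# Compare the running version against the one quoted in this guide.
expected_version <- "4.2.2"
if (getRversion() != expected_version) {
  message("R is ", as.character(getRversion()), ", not ", expected_version,
          "; the runtime may have been updated.")
}
```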

The pre-installed R packages also depend on the runtime. For a full list of packages, see the Databricks release notes for the runtime in use (13.3 LTS ML at the time of writing), which include many commonly used packages such as tidyverse and readxl.

Another way to see the packages that are installed is to go to the bottom right pane in RStudio and click on Packages, which will show both the User Library with packages you have installed yourself, and System Library with all pre-installed packages.

When the runtime of Databricks is updated on the DASH Platform, so will the list and versions of R packages.
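The same information is available from the console, which is handy after a runtime update. This is a small sketch using base R only; the three package names are just examples, and a package that is not installed shows up as NA.

```r
# Matrix with one row per installed package, across all libraries.
installed <- installed.packages()

# Libraries that R searches; a user library, if present, usually comes first.
.libPaths()

# Look up the installed version of a few packages (NA if not installed).
pkgs <- c("tidyverse", "readxl", "janitor")
versions <- installed[match(pkgs, rownames(installed)), "Version"]
names(versions) <- pkgs
versions
```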

Installing packages from CRAN

You can also install further packages from CRAN in the usual way, by typing the code below into the console and pressing Enter:

install.packages("janitor")

However, the package may not be installed permanently, and you might need to reinstall it at some point, usually after the RStudio cluster has been restarted. You can request that additional packages be installed permanently so that you don’t have to reinstall them. Use the issue tracker on Teams to make the request.
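Because packages you install yourself can disappear after a cluster restart, it can help to install only what is missing at the start of a script. The sketch below is one way to do this; missing_packages and install_if_missing are made-up helper names, not part of the platform.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
# Note that requireNamespace() loads the namespace of each package it finds.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# Hypothetical helper: install only the missing packages from CRAN and
# invisibly return the names of the packages it tried to install.
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

For example, putting install_if_missing("janitor") at the top of a script means the package is reinstalled only when it is no longer available.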

Installing packages from GitHub

If you want to install packages from GitHub, you can do so by changing the code below to the package you want to install. You need to change the URL so it contains the package author and package name:

install.packages("https://github.com/PACKAGE_AUTHOR/PACKAGE_NAME/archive/master.tar.gz",
                 repos = NULL,
                 type = "source",
                 method = 'wget',
                 extra = '--no-check-certificate')

You can also install using the renv package, again changing the code below to the actual package author and package name:

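## Set the download options (libcurl, with the --insecure flag) for this session:
libcurl_opts <- list(
  download.file.method = "libcurl",
  download.file.extra = " --insecure"
)
old_opt <- options(libcurl_opts)
## Install the package from GitHub:
renv::install("PACKAGE_AUTHOR/PACKAGE_NAME")
## Restore the previous download options:
options(old_opt)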

RStudio workspace on the DASH Platform

All of your projects, scripts and outputs are saved into your R workspace in the DASH Platform, see image below. When you start, this space should be empty; any files you create or upload will appear there. This space is separate from where data are hosted on the platform in general, and it can only be accessed by you.

Screenshot of RStudio workspace on the DASH Platform

It is important that you back up any files you keep in your DASH Platform R workspace. The recommended way is to use GitHub to host your code and outputs. You can also export files to your local machine. Otherwise you may lose your work if the RStudio cluster is restarted. Save any data in your folder in the lab zone, as detailed below.

Accessing the DASH Platform data from RStudio

You can access the details of the DASH Platform data catalogue on SharePoint. To view available datasets, select ‘Available in the DASH Platform (Data loaded in Public Beta/MVP2)’ = ‘Y’.

You can also find the file paths from within the DASH Platform. You need the file paths to load data into RStudio from within the platform, see video below.

  • To see what data are held within the DASH Platform, go out of RStudio back to Databricks, then click on Data.
  • This will show you the Databricks File System (DBFS), where the data on the platform is held, and the associated file paths.
  • Click on DBFS at the top, then mnt > migrated-landing > General Access.
  • You can see some of the files that are held on the DASH Platform and explore the other folders.
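The same folders can also be browsed from the RStudio console. The sketch below assumes that DBFS is visible to R under /dbfs, as in the file paths used elsewhere in this guide. The helper names list_dbfs and to_r_path are made up for illustration. The second helper covers the case where Databricks displays a path with a dbfs:/ prefix, which needs rewriting to the /dbfs/ form before R can use it.

```r
# Hypothetical helper: list the contents of a folder on DBFS from R.
# Returns an empty character vector if the folder is not visible.
list_dbfs <- function(path) {
  if (!dir.exists(path)) {
    message("Folder not found: ", path)
    return(character(0))
  }
  list.files(path, full.names = TRUE)
}

# Hypothetical helper: rewrite a dbfs:/... path into the /dbfs/... form.
to_r_path <- function(dbfs_path) {
  sub("^dbfs:/", "/dbfs/", dbfs_path)
}

general_access <- to_r_path("dbfs:/mnt/migrated-landing/General Access")
general_access
list_dbfs(general_access)
```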

Working with the DASH Platform’s data in RStudio

You can load data in the usual way, create an output, and save it into your personal files section. To load data from the DASH Platform, the file path must start with /dbfs/mnt/, followed by the rest of the path as shown in Databricks. The example below loads data and creates a simple figure:

## the tidyverse is pre-installed, so you can load it directly without installing it:
library(tidyverse) 

## Now load the data:
prices <- read_csv("/dbfs/mnt/migrated-landing/General Access/AgriPricing/API-csv-10dec20.csv")

## Investigate the data:
head(prices)

## Only retain rows where variable category begins with an f:
prices <- filter(prices, str_detect(category, "^f"))

## Create an output: 
ggplot(data = prices, aes(x = category, y = index)) +
  geom_boxplot() +
  coord_flip()

## Save the output:
ggsave("figure1.png") 

Output from previous code - boxplot of agricultural index and different categories

Since you haven’t specified a particular place for the figure to be saved to, it should appear on the right of your RStudio screen under Files, see image below.

RStudio workspace on the DASH Platform, with files section highlighted
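If you would rather send the figure to a specific folder with a fixed size, you can pass the plot and the path to ggsave explicitly. The sketch below uses the built-in mtcars data so that it runs without access to the platform data; the outputs folder name, the file name and the figure dimensions are arbitrary choices.

```r
library(ggplot2)

# Create a folder for outputs inside the current working directory.
dir.create("outputs", showWarnings = FALSE)

# Build the plot and keep it in an object rather than relying on the last plot.
p <- ggplot(data = mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  coord_flip()

# Save to an explicit path with explicit dimensions (inches by default).
out_file <- file.path("outputs", "figure_mtcars.png")
ggsave(filename = out_file, plot = p, width = 6, height = 4)
```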

You can also export files by ticking the box to the left of the file name, then selecting More > Export…; the file will be downloaded into your Downloads folder on your local machine.

Saving data from RStudio to the DASH Platform

To save your data into the lab zone of the platform, you first need to create your own folder in the unrestricted data folder of the lab zone. Anyone can access your data there, so do not use this folder for restricted data.

Use the following line of code to create your folder, but replace the email address with your own Defra or ALB email address:

dir.create("/dbfs/mnt/lab/unrestricted/firstname.lastname@defra.gov.uk")

You can test that this worked by saving the prices file into your folder:

write_csv(prices, "/dbfs/mnt/lab/unrestricted/firstname.lastname@defra.gov.uk/prices.csv")

To avoid cluttering your data folder, you can then delete the prices file:

unlink("/dbfs/mnt/lab/unrestricted/firstname.lastname@defra.gov.uk/prices.csv")

You can go back to the Data section in Databricks and check that your folder is there, and whether the csv file has been saved and removed successfully.
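The same check can be done from R. The sketch below wraps the create, write and delete steps in one function. check_lab_folder is a made-up name, and base R's write.csv is used instead of write_csv so that the block has no package dependencies.

```r
# Hypothetical helper: confirm that `folder` exists (creating it if needed)
# and is writable, by writing a small test file, reading it back, and
# deleting it again. Returns TRUE only if every step succeeded.
check_lab_folder <- function(folder) {
  if (!dir.exists(folder)) {
    dir.create(folder)
  }
  test_file <- file.path(folder, "write_test.csv")
  test_data <- data.frame(id = 1:3, value = c("a", "b", "c"))

  write.csv(test_data, test_file, row.names = FALSE)
  written <- file.exists(test_file)
  read_back <- if (written) read.csv(test_file) else NULL

  unlink(test_file)
  removed <- !file.exists(test_file)

  written && removed && identical(nrow(read_back), nrow(test_data))
}
```

Calling check_lab_folder("/dbfs/mnt/lab/unrestricted/firstname.lastname@defra.gov.uk"), with your own email address in the path, should return TRUE if your folder is set up correctly.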

Uploading files into your RStudio workspace

  • You can upload your own files directly into your R workspace on the DASH Platform
  • On the right hand side under Files, select Upload.
  • In the pop-up, select Choose file, and then find your local file to upload.

The file will appear under the R folder in Files. You can also create your own folders there and save the file to one of them. Uploading files in this way is useful, for example, for images that you want to insert into R Markdown files.

The files you load into your R workspace are not backed up, therefore it is recommended that you back them up in some way, for example via GitHub.

Please do not use this upload option for data that you need to work with regularly; save it into your folder in the lab zone instead. Uploading data that you only need for one session, for example for a training course, should be OK.

Backing up work from your RStudio workspace

Whenever the RStudio cluster you are using is restarted, all your files are wiped from your RStudio workspace. We will let you know ahead of scheduled cluster restarts, but it is also possible that the cluster needs to be restarted unexpectedly.

Therefore you need to back up your work on a regular basis. The recommended way is to do so via GitHub.

If you can’t back up your files via GitHub, it is also possible to download scripts and outputs such as figures to your local machine.

  • On the right hand side under Files, select the files you want to download to your local machine
  • Click on More > Export…
  • Rename the files if required, then click Download
  • The files will appear in your Downloads folder on your local machine
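Before backing up, it can be useful to see which files have changed recently. The lines below are a rough sketch in base R, assuming your working directory is the folder that holds your work; the seven-day cut-off is an arbitrary choice.

```r
# List every file under the current working directory with its size and
# last-modified time, most recently changed first.
files <- list.files(".", recursive = TRUE)
info <- file.info(files)[, c("size", "mtime")]
info <- info[order(info$mtime, decreasing = TRUE), ]
head(info, 10)

# Names of the files changed in the last 7 days.
recent <- rownames(info)[info$mtime > Sys.time() - 7 * 24 * 60 * 60]
recent
```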