The RStudio server on DASH is a version of RStudio that you use in a web browser and that runs on a Databricks cluster.
This guide is aimed at users who are familiar with R and RStudio, but new to the DASH Platform.
This user guide shows how to access RStudio from within the DASH Platform PROD environment, how to close RStudio, and how to work with data from within the DASH Platform.
Use your email address as the username to sign in to RStudio, and paste the password. This will now bring up RStudio.
The password is regenerated whenever you are logged out of RStudio and reopen the Databricks tab.
If you close the RStudio window but keep other tabs in the browser open, and then reopen RStudio through Databricks, your username and password will have been saved and you will be taken straight back to RStudio.
To sign out of RStudio on the DASH Platform, click on the Sign out icon next to your username at the top right-hand side of your screen. You will need to enter your username and password again to get back into RStudio, but your console and environment won’t be cleared. Please sign out when you have finished working in RStudio on the platform.
If you click on the orange Quit current R session icon, also at the top right, you will close the current session, which clears your console and environment.
The version of R that is available on the DASH Platform depends on the Databricks runtime that is being used at a given time. At the time of writing, the runtime is 13.3 LTS ML, which gives R version 4.2.2. To check the current runtime, go back to Databricks and open the RStudio cluster, which shows the runtime in the first tab (see image below).
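The R version can also be checked from the RStudio console itself. The lines below are a minimal sketch using base R only; the comparison against 4.2.2 simply reuses the version quoted above and is not a requirement of the platform.

```r
# Print the full version string of the running R session
print(R.version.string)

# getRversion() returns a version object that can be compared directly
r_version <- getRversion()
if (r_version < "4.2.2") {
  message("This session runs an older R than the 4.2.2 quoted in this guide.")
}
```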
The R packages that are pre-installed also depend on the runtime. For a full list of packages, see the release notes of runtime 10.4 LTS ML, which includes many commonly used packages such as tidyverse or readxl.
Another way to see the packages that are installed is to go to the bottom right pane in RStudio and click on Packages, which will show both the User Library with packages you have installed yourself, and System Library with all pre-installed packages.
When the Databricks runtime is updated on the DASH Platform, the list and versions of the pre-installed R packages will change as well.
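The same information shown in the Packages pane can be queried from the console. This is a small sketch using base R functions only; is_installed() is a helper name made up for this example, and readxl is used only because it is mentioned above.

```r
# Library locations searched by this session (typically user library first, then system)
print(.libPaths())

# Matrix of installed packages; keep name, version and library location
pkgs <- installed.packages()[, c("Package", "Version", "LibPath"), drop = FALSE]
print(head(pkgs))

# Hypothetical helper: TRUE if a package is available, without attaching it
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)
print(is_installed("readxl"))
```

Running this again after a runtime update is a quick way to see whether a package version you rely on has changed.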
You can also install further packages from CRAN in the usual ways, by typing the code below into the console and pressing Enter:
install.packages("janitor")
However, the package may not be installed permanently and you might need to reinstall it at some point, usually after the RStudio cluster has been restarted. You can formally request that additional packages be added to the platform, so that you don’t have to reinstall them. Use the issue tracker on Teams to make the request.
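Because packages you install yourself can disappear after a cluster restart, one option is to put a small check at the top of each script. The sketch below is one possible approach, not a platform feature: missing_packages() and ensure_packages() are hypothetical helper names, and janitor is just the example package used above.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install from CRAN only what is missing
ensure_packages <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Usage at the top of a script (not run here):
# ensure_packages(c("janitor"))
```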
If you want to install packages from GitHub, you can do so by changing the code below to the package you want to install. You need to change the URL so it contains the package author and package name:
install.packages("https://github.com/PACKAGE_AUTHOR/PACKAGE_NAME/archive/master.tar.gz",
                 repos = NULL,
                 type = "source",
                 method = "wget",
                 extra = "--no-check-certificate")
You can also install using the renv package, again changing the code below to the actual package author and package name:
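## Temporarily change the download options; --insecure disables certificate checks
libcurl_opts <- list(
  download.file.method = "libcurl",
  download.file.extra = " --insecure"
)
old_opt <- options(libcurl_opts)
renv::install("PACKAGE_AUTHOR/PACKAGE_NAME")
## Restore the previous download options
options(old_opt)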
All of your projects, scripts and outputs are saved into your R workspace in the DASH Platform (see image below). When you start, this space should be empty, and any files you create will appear there. This space is separate from where data are hosted on the platform in general, and it can only be accessed by you.
It is important that you back up any files you keep in your DASH Platform R workspace. The recommendation is to use GitHub to host your code and outputs. You can also export files to your local machine. Otherwise, you may lose your work if the RStudio cluster is restarted. Save any data in your folder in the lab zone, as detailed below.
You can access the details of the DASH Platform data catalogue on SharePoint. To view available datasets, select ‘Available in the DASH Platform (Data loaded in Public Beta/MVP2)’ = ‘Y’.
You can also find the file paths from within the DASH Platform. You need the file paths to load data into RStudio from within the platform (see video below).
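Once you know a folder path, its contents can also be listed from the R console. The sketch below assumes the folder is mounted under /dbfs/mnt/, as in the data-loading example in this guide; list_csv_files() is a hypothetical helper name.

```r
# Hypothetical helper: full paths of the CSV files in a folder,
# or an empty character vector if the folder does not exist
list_csv_files <- function(folder) {
  if (!dir.exists(folder)) {
    message("Folder not found: ", folder)
    return(character(0))
  }
  list.files(folder, pattern = "\\.csv$", full.names = TRUE)
}

# Folder taken from the data-loading example in this guide
csv_files <- list_csv_files("/dbfs/mnt/migrated-landing/General Access/AgriPricing")
print(csv_files)
```

The returned paths can be passed straight to a reading function such as read_csv().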
You can load data in the usual way, create an output, and save it into your personal files section. To load data from the DASH Platform, the file path needs to start with /dbfs/mnt/, followed by the rest of the file path. The following example loads data and creates a simple figure:
## the tidyverse is pre-installed, so you can load it directly without installing it:
library(tidyverse)
## Now load the data:
prices <- read_csv("/dbfs/mnt/migrated-landing/General Access/AgriPricing/API-csv-10dec20.csv")
## Investigate the data:
head(prices)
## Only retain rows where variable category begins with an f:
prices <- filter(prices, str_detect(category, "^f"))
## Create an output:
ggplot(data = prices, aes(x = category, y = index)) +
  geom_boxplot() +
  coord_flip()
## Save the output:
ggsave("figure1.png")
Since you haven’t specified a particular place for the figure to be saved to, it should appear on the right of your RStudio screen under Files, see image below.
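If the figure does not show up where you expect, the working directory can be checked from the console. This is a minimal base R sketch; figure1.png is the file name used in the example above.

```r
# Folder that files are saved to when no path is given
print(getwd())

# TRUE once ggsave("figure1.png") has been run in this working directory
print(file.exists("figure1.png"))

# Full path the figure has (or would have) in the current working directory
figure_path <- file.path(getwd(), "figure1.png")
print(figure_path)
```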
You can also export files by ticking the box to the left of the file name and then selecting More > Export…; the file will be downloaded into the Downloads folder on your local machine.
To save your data into the lab zone of the platform, you first need to create your own folder that you can save data to, which will be in the unrestricted data folder in the lab zone. This means anyone can access your data there, so do not use this for restricted data.
Use the following line of code to create your folder, but replace the email address with your own Defra or ALB email address:
dir.create("/dbfs/mnt/lab/unrestricted/firstname.lastname@defra.gov.uk")
You can test that this worked by saving the prices file into your folder:
write_csv(prices, "/dbfs/mnt/lab/unrestricted/firstname.lastname@defra.gov.uk/prices.csv")
To avoid clogging up your data folder, you can then delete the prices file:
unlink("/dbfs/mnt/lab/unrestricted/firstname.lastname@defra.gov.uk/prices.csv")
You can go back to the Data section in Databricks and check that your folder is there, and whether the csv file has been saved and removed successfully.
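The create, write and delete steps above can also be combined into one check from the console. The sketch below uses base R only; check_lab_folder() and the file name write_test.csv are made up for this example, and the folder path in the usage comment is the placeholder from above, which you would replace with your own.

```r
# Hypothetical helper: create the folder if needed, write a small test file,
# confirm it exists, delete it again, and report whether both steps worked
check_lab_folder <- function(folder) {
  if (!dir.exists(folder)) {
    dir.create(folder, recursive = TRUE)
  }
  test_file <- file.path(folder, "write_test.csv")
  write.csv(data.frame(x = 1:3), test_file, row.names = FALSE)
  written <- file.exists(test_file)
  unlink(test_file)
  removed <- !file.exists(test_file)
  written && removed
}

# Usage (not run here):
# check_lab_folder("/dbfs/mnt/lab/unrestricted/firstname.lastname@defra.gov.uk")
```

A result of TRUE suggests that you can write to and delete from the folder; FALSE or an error suggests the path or your permissions need checking.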
The file will appear under the R folder in Files, or you can create your own folders there and save the file to one of them. Uploading files in this way is useful, for example, for images that you want to insert into RMarkdown files.
The files you load into your R workspace are not backed up, so it is recommended that you back them up in some way, for example via GitHub.
Please do not use this uploading option for data that you need to work with regularly; save such data into your folder in the lab zone instead. Uploading data that you only need for one session, for example for a training course, is fine.
Whenever the RStudio cluster you are using is restarted, all your files are wiped from your RStudio workspace. We will let you know ahead of scheduled cluster restarts, but it is also possible that the cluster needs to be restarted unexpectedly.
Therefore, you need to back up your work on a regular basis. The recommended way to do so is via GitHub.
If you can’t back up your files via GitHub, it is also possible to download scripts and outputs such as figures to your local machine.
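If you download files regularly, it can be convenient to bundle them into a single archive first and export that one file. The following is a sketch under stated assumptions, not a platform feature: bundle_for_export() is a hypothetical helper, the file names in the usage comment are examples, and the archive is created with the tar() function from the utils package that ships with R.

```r
# Hypothetical helper: bundle existing files into one gzip-compressed tar archive
bundle_for_export <- function(files, archive = "backup.tar.gz") {
  # Silently skip files that do not exist
  files <- files[file.exists(files)]
  if (length(files) == 0) {
    stop("No existing files to bundle.")
  }
  # Use R's built-in tar implementation rather than an external program
  utils::tar(archive, files = files, compression = "gzip", tar = "internal")
  invisible(archive)
}

# Usage (not run here): bundle a script and a figure, then export the archive
# bundle_for_export(c("analysis.R", "figure1.png"))
```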