I want to specify a default custom CRAN mirror for R on Databricks, but the change I make in the Rprofile.site file does not seem to be recognized at all.

I have already read the official Microsoft documentation on how to customize the R session in Databricks:
https://learn.microsoft.com/en-us/azure/databricks/sparkr/#r-session-customization
The value of R_HOME is /usr/lib/R

So I adjusted my Databricks cluster-scoped init script to append the following lines to the /usr/lib/R/etc/Rprofile.site file:

local({
  options(repos = c(CRAN = "<my_custom_cran_url>"))
})
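
For readers who want to reproduce the setup, here is a minimal sketch of how such an init script step might look. The function name `append_cran_mirror` and its two arguments are illustrative assumptions and not part of any Databricks API; only the target path and the appended R lines come from the description above.

```shell
#!/bin/bash
# Sketch of a cluster-scoped init script step (helper name is hypothetical).
# append_cran_mirror FILE URL: append a repos override to an Rprofile.site file.
append_cran_mirror() {
  local rprofile="$1" url="$2"
  # The heredoc writes the same three R lines shown above, with URL filled in.
  cat >> "$rprofile" <<EOF
local({
  options(repos = c(CRAN = "$url"))
})
EOF
}

# On the cluster this would be invoked roughly as:
# append_cran_mirror /usr/lib/R/etc/Rprofile.site "<my_custom_cran_url>"
```

Taking the file path as a parameter lets the step be tried against a scratch file before it touches the real Rprofile.site.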

This works perfectly fine. However, if I run getOption("repos") in an R notebook, I get the following output:

                         Cloud                           MRAN 
"https://cloud.r-project.org/"  "https://cran.microsoft.com/" 

These are still the initial default CRAN settings, which means they weren't overwritten by the custom CRAN URL in the Rprofile.site file.

If I run the lines mentioned above (local({...repos...})) directly in an R notebook, getOption("repos") outputs the desired entry:

                CRAN 
"<my_custom_cran_url>"

Maybe the file /usr/lib/R/etc/Rprofile.site is not executed at all, although Microsoft says it is? Does anyone have a suggestion?

The Databricks Runtime version: 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12)

1 Answer

TL;DR: Use the undocumented DATABRICKS_DEFAULT_R_REPOS environment variable and set its value to a ':'-delimited list of repo URLs.

For example:

DATABRICKS_DEFAULT_R_REPOS=file:///dbfs/FileStore/miniCRAN:https://cloud.r-project.org/

I've hit the same issue, and I can confirm that /usr/lib/R/etc/Rprofile.site is being executed: an option set under a different name in Rprofile.site does show up in the notebook.
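
One way to run that check yourself is sketched below. It is only an illustration: the helper name `add_rprofile_probe` and the option name `rprofile.site.probe` are made up for this example, and the helper is meant to be called from an init script.

```shell
#!/bin/bash
# add_rprofile_probe FILE: append a harmless, uniquely named option once.
# (Helper and option names are hypothetical, chosen for this sketch.)
add_rprofile_probe() {
  local rprofile="$1"
  local line='options(rprofile.site.probe = TRUE)'
  # Append only if the exact line is not already present (safe to re-run).
  grep -qxF "$line" "$rprofile" 2>/dev/null || echo "$line" >> "$rprofile"
}

# On the cluster this would be invoked roughly as:
# add_rprofile_probe /usr/lib/R/etc/Rprofile.site
# Then, in an R notebook: getOption("rprofile.site.probe")
```

If Rprofile.site is executed, the getOption() call in the notebook returns TRUE; if it is not, it returns NULL.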

The issue is that there's another R profile script (/local_disk0/tmp/_CleanRShell.*.r) that is executed after Rprofile.site and overwrites any repos options. Luckily, this code is controlled by the DATABRICKS_DEFAULT_R_REPOS environment variable.

anth0ny-x
Thank you so much anth0ny-x! The only problem I'm still facing is that my repo URL looks like this: `https://:@`. In the script under `/local_disk0/tmp/_CleanRShell.*.r` I can see that the colon ":" acts as a URL separator, so it will split my URL after the username. The only fix I can see is to edit the `_CleanRShell.*.r` script to use a different separator (e.g. a semicolon ";") – Niklas Letz Aug 22 '23 at 12:41