I am trying to install the prophet package on Databricks. I want to install it on my cluster rather than from within a notebook. This is the code I currently run in the notebook to install it:

Sys.setenv(DOWNLOAD_STATIC_LIBV8 = 1)
remotes::install_github("jeroen/V8")
devtools::install_version("rstantools", version = "2.0.0")
install.packages('prophet')

However, I want to install it directly on my cluster. How can I use this snippet to install the prophet package on my Databricks cluster?

Here are the options I see when attempting to install a package to a cluster:

[screenshot: cluster library installation options]

Attempt at installing directly on the cluster:

Command 1

%python
dbutils.fs.mkdirs("dbfs:/databricks/scripts/")

Command 2

%python
dbutils.fs.put("/databricks/scripts/prophet_install_script.R","""
Sys.setenv(DOWNLOAD_STATIC_LIBV8 = 1)
remotes::install_github(\"jeroen/V8\")
devtools::install_version(\"rstantools\", version = \"2.0.0\")
install.packages('prophet')
""", True)

Command 3

%python
dbutils.fs.put("/databricks/scripts/stock_cluster_init_script_v1.sh","""
#!/bin/bash
R CMD BATCH /dbfs/databricks/scripts/prophet_install_script.R
""", True)

Then I went to my new cluster and ran it with this init script:

[two screenshots: cluster init script configuration]

It then gave me the following error:

{
  "reason": {
    "code": "INIT_SCRIPT_FAILURE",
    "type": "CLIENT_ERROR",
    "parameters": {
      "instance_id": "i-0c71b23287fb81530",
      "databricks_error_message": "Cluster scoped init script dbfs:/databricks/scripts/stock_cluster_init_script_v1.sh failed: Script exit status is non-zero"
    }
  }
}
nak5120

1 Answer

If you aren't on the Community Edition, you can use a cluster init script to perform this installation (you can install other libraries there as well).

Put the R commands into a file on DBFS (see the linked docs for how to use dbutils.fs.put for that). You also need to set the CRAN mirror explicitly:

# Select a CRAN mirror explicitly; a non-interactive R session cannot prompt for one
local({r <- getOption("repos")
       r["CRAN"] <- "http://cran.r-project.org"
       options(repos = r)
})
# Same installation steps as in the notebook
Sys.setenv(DOWNLOAD_STATIC_LIBV8 = 1)
remotes::install_github("jeroen/V8")
devtools::install_version("rstantools", version = "2.0.0")
install.packages('prophet')

Then create an init script with the following content:

#!/bin/bash

Rscript --verbose  /dbfs/<path-to-file>

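Please note that <path-to-file> should be given without the dbfs: prefix, because the init script reads the file through the /dbfs mount on the node. For example, dbfs:/databricks/scripts/prophet_install_script.R becomes /dbfs/databricks/scripts/prophet_install_script.R.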
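
The two files can also be written from a notebook cell. The following is only a sketch of one way to do it, not a tested recipe: the paths are the ones from the question, while `to_local_path`, `build_init_script` and `write_scripts` are hypothetical helper names introduced here for illustration. The writer is passed in as an argument so that the helpers can be checked outside Databricks; in a notebook you would pass `dbutils.fs.put`.

```python
# R commands from the answer above, with plain (unescaped) quotes.
R_INSTALL_SCRIPT = """\
local({r <- getOption("repos")
       r["CRAN"] <- "http://cran.r-project.org"
       options(repos = r)
})
Sys.setenv(DOWNLOAD_STATIC_LIBV8 = 1)
remotes::install_github("jeroen/V8")
devtools::install_version("rstantools", version = "2.0.0")
install.packages('prophet')
"""

# Paths taken from the question; change them to suit your workspace.
R_SCRIPT_URI = "dbfs:/databricks/scripts/prophet_install_script.R"
INIT_SCRIPT_URI = "dbfs:/databricks/scripts/stock_cluster_init_script_v1.sh"


def to_local_path(dbfs_uri):
    """Map a dbfs:/ URI to the /dbfs path that the init script reads from."""
    prefix = "dbfs:/"
    if not dbfs_uri.startswith(prefix):
        raise ValueError("expected a dbfs:/ URI, got: " + dbfs_uri)
    return "/dbfs/" + dbfs_uri[len(prefix):]


def build_init_script(r_script_uri):
    """Return the init script text, with the shebang on the very first line."""
    return "#!/bin/bash\n\nRscript --verbose " + to_local_path(r_script_uri) + "\n"


def write_scripts(put, r_uri=R_SCRIPT_URI, sh_uri=INIT_SCRIPT_URI):
    """Write both files with a put(path, contents, overwrite) style function."""
    put(r_uri, R_INSTALL_SCRIPT, True)
    put(sh_uri, build_init_script(r_uri), True)
```

In a Databricks notebook this would be called as `write_scripts(dbutils.fs.put)`, after which the cluster's init script setting points at the `.sh` path. Building the script text this way keeps `#!/bin/bash` on the first line; a triple-quoted string that opens with a line break, as in the question, puts an empty line before it.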

Alex Ott
  • Thanks! Should I save the R commands to notepad as an R file/Upload it directly to Databricks and then call dbutils.fs.put function on the new file path - `#!/bin/bash R CMD BATCH /dbfs/`? – nak5120 Nov 09 '21 at 19:37
  • You can use `dbutils.fs.put` for R commands as well. Or you can use editor to create the init script - both things are the same at end effect – Alex Ott Nov 09 '21 at 19:39
  • I provided my attempt in the original question and it failed to run. Any idea what I may be doing wrong? Appreciate the help – nak5120 Nov 10 '21 at 14:33
  • use `/databricks/scripts/prophet_install_script.r` instead of `/databricks/scripts/prophet_install_script/` and `/databricks/scripts/stock_cluster_init_script.sh` instead of `/databricks/scripts/stock_cluster_init_script/` – Alex Ott Nov 10 '21 at 15:06
  • Thanks, I updated the script and still got an error. Any idea why this may still be happening? – nak5120 Nov 10 '21 at 15:23
  • I updated the attempt in the question. – nak5120 Nov 10 '21 at 15:23
  • @nak5120 - I've updated answer. The difference was to use `Rscript`, plus explicitly select CRAN site – Alex Ott Nov 10 '21 at 16:25
  • it worked, thank you! – nak5120 Nov 10 '21 at 17:11