File paths with drake on a shared drive

Question

I am encountering some odd drake behaviour which I just can't figure out. I am trying to add a .rmd to my drake plan. I am working on a remote machine AND on a network drive on that machine. If I try to add an .rmd file to my plan like this:

> library(drake)
> library(rmarkdown)
> 
> list.files()
[1] "drake_testing.Rproj"        "foo.png"             "report.Rmd"                    
> 
> plan <- drake_plan(
+   png("foo.png"),
+   plot(iris$Sepal.Length ~ iris$Sepal.Width),
+   dev.off(),
+   report = render(
+     input = knitr_in("report.Rmd"),
+     output_file = "report.html",
+     quiet = TRUE
+   )
+   
+ )
> 
> plan
# A tibble: 4 x 2
  target         command                                                                               
  <chr>          <expr>                                                                                
1 drake_target_1 png("foo.png")                                                                        
2 drake_target_2 plot(iris$Sepal.Length ~ iris$Sepal.Width)                                            
3 drake_target_3 dev.off()                                                                             
4 report         render(input = knitr_in("report.Rmd"), output_file = "report.html",      quiet = TRUE)
> 
> ## Turn your plan into a set of instructions
> config <- drake_config(plan)
Error: The specified file is not readable: report.Rmd
> 
> traceback()
13: stop(txt, obj, call. = FALSE)
12: .errorhandler("The specified file is not readable: ", object, 
        mode = errormode)
11: digest::digest(object = file, algo = config$hash_algorithm, file = TRUE, 
        serialize = FALSE)
10: rehash_file(file, config)
9: rehash_storage(target = target, file = file, config = config)
8: FUN(X[[i]], ...)
7: lapply(X = X, FUN = FUN, ...)
6: weak_mclapply(X = keys, FUN = FUN, mc.cores = jobs, ...)
5: lightly_parallelize_atomic(X = X, FUN = FUN, jobs = jobs, ...)
4: lightly_parallelize(X = knitr_files, FUN = storage_hash, jobs = config$jobs, 
       config = config)
3: cdl_get_knitr_hash(config)
2: create_drake_layout(plan = plan, envir = envir, verbose = verbose, 
       jobs = jobs_preprocess, console_log_file = console_log_file, 
       trigger = trigger, cache = cache)
1: drake_config(plan)

I have tried the following permutations to make this work:

Move the .rmd to the local drive and call it with the full path to there
Add in file.path inside and outside of knitr_in to complete a full path.
Try using file_in for each of the scenarios above.

I have also tried debugging but I get a little lost when drake turns the file name into a hash then turns it back into the basename of the file (i.e. report.Rmd). The error ultimately happens when digest::digest is called.

Does anyone have experience attempting to figure out something like this?

If you write knitr_in("report.Rmd") in your plan, drake expects report.Rmd to exist in the working directory where you call make() or r_make() or drake_config(). If the report is on a remote drive, knitr_in() needs a literal file path accessible from your working directory. — landau, Aug 20 '19 at 21:17
If that does not help, would you elaborate on where report.Rmd lives and where you are calling drake_config() from? — landau, Aug 20 '19 at 21:19
So everything happens on a remote drive (including the working directory). I set up a very simple drake project locally on my machine then replicated it exactly on the remote machine. That is, the report.Rmd lives in the same place in both configurations but in the remote one I get the error outlined above. So it is definitely in the working directory. — boshek, Aug 20 '19 at 22:57
In case I'm not clear, the `report.Rmd` lives in the exact same directory as where I call `make.R`. — boshek, Aug 20 '19 at 23:17
Good to know. Would you be willing to post a traceback and a full [`reprex`](https://github.com/tidyverse/reprex)? Even if the bug is specific to your configuration, this would help me follow the full story. — landau, Aug 21 '19 at 01:42
Another thing: can you reproduce the problem with `file_in()` instead of `knitr_in()`? This will help narrow down the list of possibilities. — landau, Aug 21 '19 at 01:44
And before I forget, `drake_config()` takes a plan rather than a script. Perhaps you meant `drake_config(plan)` instead of `drake_config("plan.R")`? — landau, Aug 21 '19 at 01:44
@landau I've posted a better reprex and tried your suggestions. Each results in the same error. And yes definitively I mean `drake_config(plan)`. Does this help at all? — boshek, Aug 22 '19 at 15:40
Thanks, this helps. In that same `reprex`, what happens if you call `digest::digest(object = "report.Rmd", algo = "xxhash64", file = TRUE, serialize = FALSE)` directly? Trying to figure out whether this is `drake`'s responsibility. — landau, Aug 22 '19 at 15:53

score 2 · Accepted Answer · answered Aug 22 '19 at 16:03

I think the answer depends on whether you get the same error when you call digest("report.Rmd", file = TRUE) on its own outside drake_config(plan). If it errors (which I am betting it does) there may be something strange about your file system that clashes with R. If that is the case, then there is unfortunately nothing drake can do.
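
To narrow this down, each check can be run on its own so it is clear which one fails. The following is only a minimal sketch: `check_file_access()` is a hypothetical helper (not part of drake or digest), and including `file.access()` is an assumption on my part that a permission probe might disagree with an actual read. It assumes the digest package is installed.

```r
# Hypothetical helper (not part of drake or digest): run each file check
# separately and report which ones succeed.
check_file_access <- function(path) {
  # Return TRUE if the expression evaluates without raising an error.
  succeeds <- function(expr) {
    tryCatch({
      force(expr)
      TRUE
    }, error = function(e) FALSE)
  }
  c(
    exists = file.exists(path),
    # mode = 4 asks whether the file is readable; 0 means "yes".
    readable = unname(file.access(path, mode = 4) == 0),
    read_lines = succeeds(readLines(path, warn = FALSE)),
    digest_default = succeeds(digest::digest(path, file = TRUE)),
    digest_xxhash64 = succeeds(digest::digest(
      object = path, algo = "xxhash64", file = TRUE, serialize = FALSE
    ))
  )
}

# Demo on a temporary file; on the affected drive, pass "report.Rmd" instead.
demo_file <- tempfile(fileext = ".Rmd")
writeLines("# Report", demo_file)
result <- check_file_access(demo_file)
print(result)
```

If the `digest_*` entries are FALSE while `exists` and `read_lines` are TRUE, the problem would seem to sit between digest and the file system rather than in drake.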

I also suggest some changes to your plan:

plan <- drake_plan(
  plot_step = {
    png(file_out("foo.png"))
    plot(iris$Sepal.Length ~ iris$Sepal.Width)
    dev.off()
  },
  report = render(
    input = knitr_in("report.Rmd"),
    output_file = "report.html",
    quiet = TRUE
  )  
)

Or better yet, compartmentalize your work in reusable functions:

plot_foo <- function(filename) {
  png(filename)
  plot(iris$Sepal.Length ~ iris$Sepal.Width)
  dev.off()
}

plan <- drake_plan(
  foo = plot_foo(file_out("foo.png")),
  report = render(
    input = knitr_in("report.Rmd"),
    output_file = "report.html",
    quiet = TRUE
  )  
)

A target is a skippable workflow step with a meaningful return value and/or output file(s). png() and dev.off() are part of the plotting step, and file_out() tells drake to watch foo.png for changes. Also, it is good practice to name your targets. Usually, the return values of targets are meaningful, just like variables in R.

landau


Thank you for this help. Re: "something strange about your file system that clashes with R". Are there any tips there? It is just weird because I am able to read in `.csv`s and I haven't encountered any issues before. Moreover I am able to run the exact code outside of the plan. – boshek Aug 22 '19 at 16:31
As predicted `digest::digest(object = "report.Rmd", algo = "xxhash64", file = TRUE, serialize = FALSE)` failed in the same way. – boshek Aug 22 '19 at 16:32
I think we made progress. Maybe a new Stack Overflow question with a new set of tags? Unfortunately, I do not know why digest is throwing that error. – landau Aug 22 '19 at 16:35
I agree - just so I am clear, you are proposing a question that just tackles the `digest::digest` issue? – boshek Aug 22 '19 at 16:44
Yes... maybe try and see if `readLines("report.Rmd")` works first. – landau Aug 22 '19 at 17:23
`readLines("report.Rmd")` DOES work outside of the plan but I then tried it inside the plan and it resulted in the same error as before. – boshek Aug 22 '19 at 21:49
So to confirm: digest() fails both inside and outside the plan, and readLines() fails inside the plan but succeeds outside the plan? If so, does the traceback say the failure is at readLines() itself? – landau Aug 22 '19 at 23:52
This is correct and the traceback with `readLines` when called within the plan matches that posted above exactly. – boshek Aug 23 '19 at 04:26
Okay good, then I think a new post is the next step: readLines() works, file.exists() is TRUE, digest() errors out. Also, for what it is worth, when the file does not exist at all, it says "Error: The file does not exist: report.Rmd". So in your case, it looks like digest() is still finding the file. And it does not seem like a permissions issue since readLines() works. – landau Aug 23 '19 at 09:44
Thanks @landau! I definitely think the issue is something to do with the algo argument as leaving it with the default results in a successful hash creation. – boshek Aug 23 '19 at 15:45
Ok, at the last pass I seem to have figured out a workaround, which seems to contradict what I stated above. I created a new function which takes the file in the working directory as an argument, copies it to the temp directory, then returns that temp path, which digest/render then reads from. I imagine that this is a bit of an issue, as I think that hash will change each time the file has to be recopied, but at least I can get this to run now. – boshek Aug 23 '19 at 16:24
For what it is worth, this issue was discussed over here for digest with some implications for drake: https://github.com/eddelbuettel/digest/issues/49 – boshek Feb 28 '20 at 23:47
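
The copy-to-temp workaround described in the comments above could look roughly like this. It is a sketch only, not boshek's actual function: `stage_in_tempdir()` is a hypothetical name, and a stand-in file replaces the real `report.Rmd`. It uses base R only and does not show how to wire the result into a drake plan.

```r
# Hypothetical helper: copy a file into the session's temporary directory
# and return the path of the copy.
stage_in_tempdir <- function(path) {
  staged <- file.path(tempdir(), basename(path))
  copied <- file.copy(from = path, to = staged, overwrite = TRUE)
  if (!copied) stop("Could not copy ", path, " to ", staged)
  staged
}

# Demo with a stand-in file; on the affected drive, pass "report.Rmd".
source_dir <- tempfile("shared_drive_")
dir.create(source_dir)
original <- file.path(source_dir, "report.Rmd")
writeLines(c("---", "title: report", "---"), original)

staged <- stage_in_tempdir(original)

# A content hash depends only on the bytes, so the copy matches the original.
same_contents <- unname(tools::md5sum(original) == tools::md5sum(staged))
print(same_contents)
```

On the worry about the hash changing: a hash of the file's contents should be the same for every copy as long as the bytes are identical. Whether drake's own change detection treats the temporary copy as stable is not something this sketch checks.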
