0

I'm trying to speed up my R code with the future package, using the multicore plan on Linux. I create a Java object and then pass it to .jcall() inside a future, but inside the future the Java object is null. Could anyone please help me resolve this? Below is sample code:

    library("future")
    plan(multicore)

    library(rJava)
    .jinit()

    # preprocess is a user-defined function
    preprocess <- function(a) {
      # some preprocessing task here
      # time-consuming statistical analysis here
      return(lreturn) # return a list of 3 components
    }
    my_value <- preprocess(a = value)
    obj <- .jnew("java.custom.class")

    f <- future({
      .jcall(obj, "V", "CustomJavaMethod", my_value)
    })

Basically, I'm dealing with large streaming data. In the code above I send a string of streaming data to a user-defined function for statistical analysis, which returns a list of 3 components. I then want to send this list to a custom Java class [ java.custom.class ] for further processing using a custom Java method [ CustomJavaMethod ]. Without future my code runs fine, but I receive 12 streaming records per minute, and at that rate the code slows down and I observe a delay in processing.

Currently I'm using Unix with 16 cores. With the future package the processing is fast, but tracing through my code shows that something goes wrong in .jcall().

Hope this clarifies my pain.

HenrikB
Krishna Nevase

2 Answers

3

(Author of the future package here:)

Unfortunately, there are certain types of objects in R that cannot be sent to another R process for further processing. To clarify, this is a limitation of those types of objects, not of the parallel framework used (here the future framework). The simplest example of such an object may be a file connection, e.g. con <- file("my-local-file.txt", open = "wb"). I've documented some examples in Section 'Non-exportable objects' of the 'Common Issues with Solutions' vignette (https://cran.r-project.org/web/packages/future/vignettes/future-4-issues.html).

As mentioned in the vignette, you can set an option (*) such that the future framework looks for these types of objects and gives an informative error before attempting to launch the future ("early stopping"). Here is your example with this check activated:

    library("future")
    plan(multisession)

    ## Assert that global objects can be sent back and forth between
    ## the main R process and background R processes ("workers")
    options(future.globals.onReference = "error")

    library("rJava")
    .jinit()

    end <- .jnew("java/lang/String", " World!")

    f <- future({
      start <- .jnew("java/lang/String", "Hello")
      .jcall(start, "Ljava/lang/String;", "concat", end)
    })

    # Error in FALSE : 
    #  Detected a non-exportable reference ('externalptr') in one of the
    #  globals ('end' of class 'jobjRef') used in the future expression

So, yes, your example actually works when using plan(multicore). The reason is that 'multicore' uses forked processes (available on Unix and macOS but not Windows). However, I would try my best not to limit your software to parallelizing only on "forkable" systems; if you can find an alternative approach, I would aim for that. That way your code will also work on, say, a huge cloud cluster.
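
One possible alternative approach, given only as a sketch: create every Java object inside the future, so that only plain R values (here a character string) are exported to the worker. This assumes that Java and rJava are installed wherever the workers run and that a JVM can be started in each worker with .jinit(). The variable name end_chr is made up for this illustration.

```r
library("future")
plan(multisession)

## A plain character vector can be exported to any worker
## ('end_chr' is a hypothetical name used only in this sketch)
end_chr <- " World!"

f <- future({
  ## Assumption: rJava and a JVM are available on the worker
  library("rJava")
  .jinit()

  ## Both Java objects are created in the worker's own JVM,
  ## so no 'jobjRef' has to cross a process boundary
  start <- .jnew("java/lang/String", "Hello")
  end <- .jnew("java/lang/String", end_chr)
  .jcall(start, "Ljava/lang/String;", "concat", end)
})

res <- value(f)
print(res)
```

Because no external pointer is among the globals, this version should not depend on forked processes, at the cost of starting a JVM in each worker.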

(*) The reason for these checks not being enabled by default is (a) it's still in beta testing, and (b) it comes with overhead because we basically need to scan for non-supported objects among all the globals. Whether these checks will be enabled by default in the future or not, will be discussed over at https://github.com/HenrikBengtsson/future.

HenrikB
  • Thanks for the help. Still not clear how to export such non-exportable objects in parallel-processing. – Krishna Nevase Apr 25 '18 at 03:55
  • Unfortunately, they are not exportable by default. The authors/developers of those structures/classes need to write serialize/unserialize functions. It's the same problem that they won't work if you save them (e.g. `saveRDS()`), restart R, and then read them back in (e.g. `readRDS()`). Until they support that, there's not much you can do. – HenrikB Apr 25 '18 at 16:58
  • Thanks! I have updated my question with more details. Could you please help in that. – Krishna Nevase Apr 25 '18 at 17:03
  • In latest version of R 3.5.0, they resolved this problem using custom serialization of ALTREP framework objects. – Krishna Nevase Apr 25 '18 at 17:16
  • I don't think the serialization done in R 3.5.0 related to ALTREP is relevant here, but I might be wrong. – HenrikB Apr 26 '18 at 00:01
0

The code in the question calls an unknown Method1 method, my_value is undefined, ... so it's hard to know what you are really trying to achieve.

Take a look at the following example, maybe you can get inspiration from it:

    library(future)
    plan(multicore)

    library(rJava)
    .jinit()

    end = .jnew("java/lang/String", " World!")

    f <- future({
      start = .jnew("java/lang/String", "Hello")
      .jcall(start, "Ljava/lang/String;", "concat", end)
    })

    value(f)
    [1] "Hello World!"

Juan Mellado