I am using sparklyr (0.8.4) in RStudio to connect to my remote Spark environment through Livy, and I noticed that it takes about 3 to 5 minutes for sparklyr to establish a session:
sc <- sparklyr::spark_connect(master="https://myremotelivy", method="livy")
However, when I connect to the same cluster via sparkmagic (in a Jupyter notebook) through the same Livy endpoint, a SparkR session context is returned in less than a minute.
I understand that sparklyr is very different from SparkR in how it works with the remote system (i.e. sparklyr leverages Spark SQL), so maybe this is not a fair comparison.
Can anyone share any insight into why it takes so much longer to establish the session through RStudio? Are there configuration parameters (Livy, Spark, or RStudio) that would make interactions through sparklyr faster? Even executing a simple x <- tbl(sc, "mytable") takes about 15 seconds.
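In case it helps anyone reproduce the numbers, here is a minimal sketch of how timings like these could be measured. The time_step helper and the run_against_cluster flag are hypothetical names made up for illustration (they are not part of sparklyr), and the endpoint and table name are the same placeholders as above.

```r
# Hypothetical helper (not part of sparklyr): evaluates an expression,
# prints how long it took, and returns both the value and the elapsed seconds.
time_step <- function(label, expr) {
  elapsed <- system.time(result <- expr)[["elapsed"]]
  message(sprintf("%s: %.1f s", label, elapsed))
  invisible(list(result = result, elapsed = elapsed))
}

# Assumption: flip this to TRUE only when the Livy endpoint is reachable.
run_against_cluster <- FALSE

if (run_against_cluster) {
  # Time session creation through Livy (placeholder URL from the question).
  sc <- time_step(
    "spark_connect",
    sparklyr::spark_connect(master = "https://myremotelivy", method = "livy")
  )$result

  # Time creating a table reference (placeholder table name from the question).
  x <- time_step("tbl", dplyr::tbl(sc, "mytable"))$result

  sparklyr::spark_disconnect(sc)
}
```

Timing each step separately should at least show whether the delay is in session creation or in the individual requests that follow.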
Thanks much.