In the sparklyr tutorial I'm following, it says I can use compute() to store the result of the preceding dplyr statement in a new Spark data frame.
The code in 'code 1' creates a new Spark data frame called "NewSparkDataframe", and a spark_tbl is returned, which I assigned to "NewTbl". I can see the Spark data frame using src_tbls(). This is all as expected.
If I instead run 'code 2' without using compute(), it still creates a spark_tbl, which I again assign to "NewTbl". This time, though, I'm unable to see the new Spark data frame using src_tbls().
I'm wondering how "NewTbl" is able to run the query in code 2 if there's apparently no "NewSparkDataframe" in Spark. Also, what is the point of using compute() if I can still access the same newly created spark_tbl through "NewTbl"?
code 1:

NewTbl <- mySparkTbl %>%
  # ...some dplyr statements... %>%
  compute("NewSparkDataframe")

src_tbls(spark_conn)
[1] "NewSparkDataframe"
code 2:

NewTbl <- mySparkTbl %>%
  # ...some dplyr statements...

src_tbls(spark_conn)
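For reference, here is a minimal reproducible version of the two cases. It assumes a local Spark install; the data (mtcars), the table name "mtcars_spark", and the filter() step are stand-ins I made up for the tutorial's "some dplyr statements":

```r
library(sparklyr)
library(dplyr)

# connect to a local Spark instance
spark_conn <- spark_connect(master = "local")

# copy some example data into Spark so there is a tbl to pipe from
mySparkTbl <- copy_to(spark_conn, mtcars, "mtcars_spark")

# code 1: compute() runs the query and materialises the result
# as a cached Spark table named "NewSparkDataframe"
NewTbl <- mySparkTbl %>%
  filter(cyl > 4) %>%          # stand-in for "some dplyr statements"
  compute("NewSparkDataframe")

src_tbls(spark_conn)           # "NewSparkDataframe" is listed

# code 2: without compute(), NewTbl2 is only an unevaluated query;
# nothing is materialised in Spark until the result is actually used
NewTbl2 <- mySparkTbl %>%
  filter(cyl > 4)

src_tbls(spark_conn)           # no table was created by this step

spark_disconnect(spark_conn)
```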