I am relatively new to SPARKR. I downloaded SPARK 1.4 and setup RStudio to use SPARKR library. However I want to know how I can apply a function to each value in a column of a distributed DataFrame, can someone please help? For example,
This works perfectly
myFunc <- function(x) { paste(x , "_hello")}
c <- c("a", "b", "c")
d <- lapply(c, myFunc)
How to make this work for a Distributed DataFrame. The intention is to append "_hello" to each value of column Name of DF
DF <- read.df(sqlContext, "TV_Flattened_2.csv", source = "com.databricks.spark.csv", header="true")
SparkR:::lapply(DF$Name, myFunc)
In the alpha version of SPARKR before SPARK 1.4 release there seems to have been this ability, why is this now missing in SPARK 1.4 official release?