I've been looking to aggregate values present in different chunks in the xdf file, but I'm unable to get it to work.
Would any of you have a code snippet where you've used any apply function inside of a transform in an rxDataStep?
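You can apply a transform function through the transformFunc argument. Any packages it needs must be installed on the worker nodes and listed in transformPackages. To make other functions or objects available inside the transform function, pass them through transformObjects.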
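xformFunction <- function(data) {
  # rxDataStep passes each chunk in as a list of columns
  df <- as.data.frame(data)
  # One row per distinct z in this chunk; add summary expressions
  # inside summarise() as needed, e.g. n = dplyr::n()
  df <- dplyr::summarise(dplyr::group_by(df, z))
  return(df)
}
# transformPackages loads dplyr before the transform function runs
rxDataStep(inData = input_xdf, outFile = t_xdf,
           transformFunc = xformFunction,
           transformPackages = c("dplyr"), overwrite = TRUE)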
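Note that the aggregation runs separately on each node, so with a Spark compute context the output can contain the same z value more than once. You would need a second aggregation over the output to get one row per z.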
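To illustrate where the duplicate z values come from, here is a minimal sketch in base R only, so it runs without RevoScaleR. The two data frames stand in for two chunks of the xdf file, and the column x and its values are made up for the example. It only simulates the behaviour; it is not how rxDataStep is implemented.

```r
# Two hypothetical chunks standing in for blocks of an xdf file.
chunks <- list(
  data.frame(z = c("a", "a", "b"), x = c(1, 2, 3)),
  data.frame(z = c("a", "b", "b"), x = c(4, 5, 6))
)

# Aggregate each chunk on its own, as a transform function would,
# since it only ever sees one chunk at a time.
partial_list <- lapply(chunks, function(chunk) {
  aggregate(x ~ z, data = chunk, FUN = sum)
})
partial <- do.call(rbind, partial_list)
# 'partial' has one row per z per chunk, so "a" and "b" each appear twice.

# Second pass: combine the partial sums into one row per z.
combined <- aggregate(x ~ z, data = partial, FUN = sum)
print(combined)
```

Summing the partial results works for sums and counts. For a mean you would have to carry the sum and the count through the first pass and divide only after the second.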