I am working with sparklyr and am having trouble changing column classes and using dplyr to aggregate the data. This is my current code:
.libPaths(c(.libPaths(), '/usr/lib/spark/R/lib'))
Sys.setenv(SPARK_HOME = "/usr/lib/spark")
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
library(sparklyr)
library(dplyr)
library(magrittr)
sc <- sparkR.session(master = "xxxxx")
df <- read.df("path", "csv", header = "true", inferSchema = "true", na.strings = "NA")
df1 <- select(df, df$DATE, df$Subject, df$Source, df$Cost, df$Test)
The first rows of df1 look like this:
       DATE      Subject               Source Cost Test
1 11/8/2016 07gjAAAAAAAq    AAAA_MOAAAGRAAAAA    2    2
2 11/8/2016 07gjAAAAAAAq      BBBB_MOBBB4BBB2    7    7
3 11/8/2016 07gjAAAAAAAq BBBB_MOBICCCCCCCCC14    2    2
4 11/8/2016 07gjAAAAAAAq SCCT_MOBIDDDDDDDDD14    1    1
5 11/8/2016 07gjAAAAAAAq    REET_MOBBBBBBBB01    2    1
6 11/8/2016 07gjAAAAAAAq      SCCT_MRRRF4RR22   11   11
Two questions based on this:
1) How do I convert the DATE column to the Date class? In the past I did it like this:
df1$DATE <- as.Date(df1$DATE, '%m/%d/%Y')
This is the error I get:
Error in as.Date.default(df1$DATE, "%m/%d/%Y") :
do not know how to convert 'df1$DATE' to class “Date”
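For comparison, the same as.Date() call behaves as expected on an ordinary local character vector, which suggests the failure comes from df1$DATE being a Spark column rather than an R vector. A minimal sketch in plain R (no Spark session needed; local_dates is a hypothetical name, with values copied from the DATE column above):

```r
# Hypothetical local vector; values copied from the DATE column above
local_dates <- c("11/8/2016", "11/8/2016")

# as.Date() with an explicit format parses month/day/year strings
parsed <- as.Date(local_dates, format = "%m/%d/%Y")

class(parsed)   # "Date"
format(parsed)  # "2016-11-08" "2016-11-08"
```

This only shows the result I am after on local data; it does not convert the column inside Spark.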
Any help would be great, thanks!