I have a DataFrame 'res' in SparkR. 'res' contains IDs and dates, so the first entries look like this: 'ID' = 1 2 3 ... and 'date' = "2012-6-5", "2013-5-5", "2015-10-11" ...

I want to create a new dataset in which "2010-01-01" is subtracted from every value in 'date'. How can this be done? I have exactly the same problem if I just want to subtract an integer from all elements of a DataFrame.

In SparkR I tried this:

newres <- withColumn(res, "subtract", res$date - as.Date("2010-01-01") )

This runs, but when I type head(newres) I get an error message: "returnstatus==0 is not True."

Ole Petersen

1 Answer

In your previous question (Convert string to date in sparkR) I read that doing the type conversion in R rather than in SparkR was not an issue for you. In my setup I can convert everything to a number in R and do the subtraction in SparkR as follows:

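# Example data as a local R data.frame
df <- data.frame(user_id=c(1,1,2,2),
                time=c("2015-7-10","2015-8-04","2015-8-8","2015-7-10"))

# Convert the strings to Date, then to the number of days since 1970-01-01
df$time <- as.Date(df$time)
df$time <- as.numeric(df$time)

# The reference date, also as days since 1970-01-01
date <- as.numeric(as.Date("2010-01-01"))

# Move the data to SparkR and subtract there
res <- createDataFrame(sqlContext, df)

newRes <- withColumn(res, "subtract",res$time - date)

collect(newRes)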

which gives me

  user_id  time subtract
1       1 16626     2016
2       1 16651     2041
3       2 16655     2045
4       2 16626     2016

I hope this works, since you said you had problems with integer subtraction as well. The only drawback of this solution is that the date conversion happens in R, so you are restricted to data that fits entirely in the memory of your R session.
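
As a small variation, the reference date can also be subtracted before converting to a number, so that the column sent to SparkR already holds the day count. This is only a minimal sketch in base R, assuming the same example data as above; 'ref' and 'days_since_ref' are names I made up for illustration.

```r
# Same example data as in the answer above
df <- data.frame(user_id = c(1, 1, 2, 2),
                 time = c("2015-7-10", "2015-8-04", "2015-8-8", "2015-7-10"),
                 stringsAsFactors = FALSE)

# Hypothetical name: the reference date to subtract
ref <- as.Date("2010-01-01")

# Date - Date gives a difftime in days; as.numeric() keeps only the number
df$days_since_ref <- as.numeric(as.Date(df$time) - ref)

print(df$days_since_ref)
```

The values should match the 'subtract' column shown above. The memory restriction still applies, because the conversion still runs in the local R session.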

Wannes Rosiers