
I have a Spark DataFrame tbl_pred with the following factor column:

**Value**    
13,3
11
5,3

I'd like to convert those 'strings' to numeric values. I could use the as.numeric function, but that doesn't work because my decimal separator is a comma.

tbl_pred <- tbl_bun %>% mutate(value = as.numeric(value))

Normally I would use the sub function to replace the , with a ., but sub does not work on my Spark DataFrame object:

Error: org.apache.spark.sql.AnalysisException: Undefined function: 'SUB'. This function is neither a registered temporary function nor a permanent function registered in the database 'xxx'.; line 1 pos 417

Does someone have a solution for converting the values to a numeric?

Thanks in advance,

J.

user3331966

1 Answer

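regexp_replace is the function you need here. Functions that sparklyr (via dbplyr) cannot translate are passed through to Spark SQL unchanged, and Spark SQL has regexp_replace but no sub, which is why the call above fails: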

tbl_bun %>% mutate(value=as.numeric(regexp_replace(value, ",", "\\.")))
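To sanity-check the expected result without a Spark connection, here is a local base R analogue applied to the sample values from the question. This is only a sketch for comparison: sub runs in R itself, so it works on an ordinary character vector but not inside mutate() on a Spark table.

```r
# Sample values from the question, held locally as character strings
value <- c("13,3", "11", "5,3")

# Replace the decimal comma with a dot (fixed = TRUE: literal match, not a regex),
# then convert the resulting strings to numbers
converted <- as.numeric(sub(",", ".", value, fixed = TRUE))

print(converted)
```

The Spark version above should produce the same numbers in the value column.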

When in doubt, see the Hive Language Manual UDF page. Pretty much every function listed there either has a native Spark implementation or is exposed as a Hive UDF.

zero323