
I am using Spark 1.6.

The UDF below is used to clean address data.

sqlContext.udf.register("cleanaddress", (AD1:String,AD2: String, AD3:String)=>Boolean = _.matches("^[a-zA-Z0-9]*$"))

UDF name: cleanaddress. Its three input parameters come from DataFrame columns (AD1, AD2 and AD3).

Could someone please help me fix the error below?

I am trying to write a UDF that accepts three parameters (the three address columns of the DataFrame), evaluates them, and returns only the filtered records.

Error:
Error:(38, 91) reassignment to val
    sqlContext.udf.register("cleanaddress", (AD1:String, AD2: String, AD3:String)=>Boolean = _.matches("^[a-zA-Z0-9]*$"))
Sophie Dinka
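The `reassignment to val` error most likely comes from how the lambda is parsed: in `(AD1:String, AD2: String, AD3:String)=>Boolean = _.matches(...)`, everything after `=>` is the lambda body, so the compiler sees `Boolean = _.matches(...)` as an attempt to assign to `Boolean` (the `scala.Boolean` companion object), which is not a variable. The return type belongs on a `val` holding the function, and the body must use the three named parameters. The sketch below assumes "clean" means all three columns are non-null and alphanumeric; the names `CleanAddress`, `AlphaNum` and `cleanAddress` are hypothetical. It is plain Scala, so it can be checked without a cluster.

```scala
object CleanAddress {
  // Pattern from the question: zero or more ASCII letters/digits, nothing else.
  // Note that `*` also accepts the empty string.
  private val AlphaNum = "^[a-zA-Z0-9]*$"

  // Assumption: a row is "clean" when all three address columns are
  // non-null and alphanumeric. The function type is declared on the val;
  // the lambda names its three parameters instead of using `_`.
  val cleanAddress: (String, String, String) => Boolean =
    (ad1, ad2, ad3) =>
      Seq(ad1, ad2, ad3).forall(s => s != null && s.matches(AlphaNum))
}

// In Spark 1.6 this could then be registered and used as a row filter, e.g.
//   sqlContext.udf.register("cleanaddress", CleanAddress.cleanAddress)
//   df.filter("cleanaddress(AD1, AD2, AD3)")
```

Because the UDF returns `Boolean`, it can serve directly as a filter predicate, which matches the goal of keeping only the valid records.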

1 Answer


Your logic is not quite clear from the code you've given. What you can do is return an array of the valid addresses, like this:

sqlContext.udf.register("cleanaddress", (AD1:String, AD2: String, AD3:String)=> Seq(AD1,AD2,AD3).filter(_.matches("^[a-zA-Z0-9]*$")))

Note that this returns a complex column (i.e. an array).
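As written, the answer's lambda would throw a `NullPointerException` if any address column is null, since `matches` is called on the value directly. A minimal null-safe sketch of the same array-returning idea follows; skipping nulls is an added assumption, and `ValidAddresses` / `validAddresses` are hypothetical names.

```scala
object ValidAddresses {
  // Same pattern as in the question and answer.
  private val AlphaNum = "^[a-zA-Z0-9]*$"

  // Keep only the address parts that are alphanumeric. Null parts are
  // dropped instead of causing a NullPointerException (an assumption,
  // not part of the original answer).
  val validAddresses: (String, String, String) => Seq[String] =
    (ad1, ad2, ad3) =>
      Seq(ad1, ad2, ad3).filter(s => s != null && s.matches(AlphaNum))
}

// Registered the same way as in the answer, e.g.
//   sqlContext.udf.register("cleanaddress", ValidAddresses.validAddresses)
// Spark maps the Seq[String] result to an array<string> column.
```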

Raphael Roth
  • Hello Sir, I am trying to do something like this, passing the function to each column: sqlContext.udf.register("cleanaddress", (AD:String)=> Seq(AD).filter(_.matches("^[a-zA-Z0-9]*$"))). But if a value is non-alphanumeric, I want it replaced by NULL or an empty string. Could you help me modify the function? Many thanks – Sophie Dinka Sep 20 '19 at 05:37
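For the per-column case in the comment, the UDF can take one `String` and return either the value or a replacement, rather than wrapping it in a `Seq`. A minimal sketch, assuming the same alphanumeric pattern; `CleanColumn`, `cleanAddressOrNull` and `cleanAddressOrEmpty` are hypothetical names. Returning `null` from a `String`-typed Spark UDF should produce a SQL NULL in the result column.

```scala
object CleanColumn {
  private val AlphaNum = "^[a-zA-Z0-9]*$"

  // Returns the value unchanged when it is alphanumeric, otherwise null.
  val cleanAddressOrNull: String => String =
    ad => if (ad != null && ad.matches(AlphaNum)) ad else null

  // Variant for the "empty string" option mentioned in the comment.
  val cleanAddressOrEmpty: String => String =
    ad => if (ad != null && ad.matches(AlphaNum)) ad else ""
}

// Possible use in Spark 1.6, applying the UDF to each address column:
//   sqlContext.udf.register("cleanaddress", CleanColumn.cleanAddressOrNull)
//   df.selectExpr("cleanaddress(AD1) AS AD1",
//                 "cleanaddress(AD2) AS AD2",
//                 "cleanaddress(AD3) AS AD3")
```

Keeping the function single-argument means it can be reused for any number of address columns instead of hard-coding three parameters.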