I receive an error when calling a UDF from within withColumn in Spark using Scala. The error occurs while building with SBT.

val hiveRDD = sqlContext.sql("select * from iac_trinity.ctg_us_clickstream")
hiveRDD.persist()

val trnEventDf = hiveRDD
  .withColumn("system_generated_id", getAuthId(hiveRDD("session_user_id")))
  .withColumn("application_assigned_event_id", hiveRDD("event_event_id"))


val getAuthId = udf((session_user_id: String) => {
  if (session_user_id != None) {
    if (session_user_id != "NULL") {
      if (session_user_id != "null") {
        session_user_id
      } else "-1"
    } else "-1"
  } else "-1"
})

The error I receive is:

scala:58: No TypeTag available for String
val getAuthId = udf((session_user_id:String) => {

It compiles when I use (session_user_id:Any) instead of (session_user_id:String), but it then fails at runtime because Any is not recognized by Spark. How should I handle this?
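
A side note on the UDF body, separate from the TypeTag error: in Scala, comparing a String with None is always true, so the outer check does not catch a JVM null. A minimal sketch of a null-safe version follows. The names AuthId and normalize are hypothetical and not from the original code, and the function is plain Scala so it can be checked without Spark.

```scala
// Plain Scala, no Spark dependency, so the logic can be checked on its own.
object AuthId {
  // Returns "-1" for a JVM null or for the literal strings "NULL" / "null";
  // any other value is returned unchanged.
  def normalize(sessionUserId: String): String =
    Option(sessionUserId)                          // Option(null) evaluates to None
      .filter(id => id != "NULL" && id != "null")  // drop the textual null markers
      .getOrElse("-1")
}

// With Spark on the classpath, the same function could be wrapped as a UDF:
//   import org.apache.spark.sql.functions.udf
//   val getAuthId = udf(AuthId.normalize _)
```

Defining getAuthId before the withColumn call that uses it also avoids a forward reference to a val that has not been initialized yet.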

preitam ojha

1 Answer

Have you tried being explicit with your types?

udf[String, String]((session_user_id:String)...
Justin Pihony
  • Yes, I have tried being explicit: val getAuthId = udf[String,String]((session_user_id:String) => if (session_user_id == None) .... The error is the same: scala:57: No TypeTag available for String [error] val getAuthId = udf[String,String]((session_user_id:String) => if (session_user_id == None)"-1" – preitam ojha Jul 04 '16 at 03:18
  • @preitamojha are you sure you are executing the same code you are giving us? It seems unlikely that this doesn't work. I can't reproduce the error. – eliasah Jul 04 '16 at 06:17
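
This thread does not establish the cause of the error. Since it appears only when building with SBT and could not be reproduced elsewhere, one thing that may be worth ruling out is a mismatch between the project's Scala version and the Scala version the Spark artifacts were compiled for. A minimal build.sbt sketch follows; every version number in it is an assumption and is not taken from the question.

```scala
// build.sbt (sketch only; all version numbers below are assumptions)
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  // %% appends the Scala binary version (here _2.10) to the artifact name,
  // so the Spark artifacts resolved by SBT match scalaVersion above
  "org.apache.spark" %% "spark-sql"  % "1.6.2" % "provided",
  "org.apache.spark" %% "spark-hive" % "1.6.2" % "provided"
)
```

If the dependencies are declared with a single % and a hard-coded suffix such as spark-sql_2.11, the suffix would need to agree with scalaVersion.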