
I want to programmatically pass in a list of field names and, for each of those fields, select both the original column and the result of passing that column to another function that returns a case class of two strings. So far I have

import spark.implicits._

val myList = Seq(("a", "b", "c", "d"), ("aa", "bb", "cc", "dd"))

val df = myList.toDF("col1","col2","col3","col4")

val fields = "col1,col2"

val myDF = df.select(df.columns.map(c =>
  if (fields.contains(c)) {
    df.col(c) && someUDFThatReturnsAStructTypeOfStringAndString(df.col(c)).alias(s"${c}_processed")
  } else {
    df.col(c)
  }): _*)

Right now this gives me the following exception, presumably because the && is being treated as a boolean AND between the string column and the struct returned by the UDF:

org.apache.spark.sql.AnalysisException: cannot resolve '(col1 AND UDF(col1))' due to data type mismatch: differing types in '(col1 AND UDF(col1))' (string and struct<STRING1:string,STRING2:string>)

What I want the select to produce is one extra struct column per chosen field:

col1 | <col1.String1, col1.String2> | col2 | <col2.String1, col2.String2> | col3 | col4

"a"  | <"a1", "a2">                 | "b"  | <"b1", "b2">                 | "c"  | "d"


1 Answer


I ended up using df.selectExpr and stringing together a bunch of expressions.

    import spark.implicits._
    import org.apache.spark.sql.functions.expr

    val fields = "col1,col2".split(",")

    // Add one "<col>_parsed" struct column per selected field, keeping all original columns.
    val exprToSelect = df.columns.filter(c => fields.contains(c))
      .map(c => s"someUDFThatReturnsAStructTypeOfStringAndString($c) as ${c}_parsed") ++ df.columns

    // Error rows: any parsed field has a non-empty String1.
    val exprToFilter = df.columns.filter(c => fields.contains(c))
      .map(c => s"length(${c}_parsed.String1) > 1").reduce(_ + " OR " + _)

    // Valid rows: every parsed field has an empty String1.
    val exprToFilter2 = df.columns.filter(c => fields.contains(c))
      .map(c => s"(length(${c}_parsed.String1) < 1)").reduce(_ + " AND " + _)

    // For valid rows, replace each selected field with its parsed String2.
    val exprToSelectValid = df.columns.filter(c => fields.contains(c))
      .map(c => s"${c}_parsed.String2 as $c") ++ df.columns.filterNot(c => fields.contains(c))

    // For error rows, concatenate the String1 messages and keep the original columns.
    val exprToSelectInValid = Array("concat(" + df.columns.filter(c => fields.contains(c))
      .map(c => s"${c}_parsed.String1").mkString(", ") + ") as String1") ++ df.columns

    val parsedDF = df.select(exprToSelect.map(c => expr(c)): _*)

    val validDF = parsedDF.filter(exprToFilter2)
                          .select(exprToSelectValid.map(c => expr(c)): _*)

    val errorDF = parsedDF.filter(exprToFilter)
                          .select(exprToSelectInValid.map(c => expr(c)): _*)
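
One caveat: for expr/selectExpr to resolve the UDF by name, it has to be registered as a SQL function first. With the placeholder parser from the question, that would look something like:

    // Register the function so expr()/selectExpr() can reference it by name.
    spark.udf.register(
      "someUDFThatReturnsAStructTypeOfStringAndString",
      (s: String) => Parsed(s + "1", s + "2")
    )

After that, parsedDF carries the original columns plus one <col>_parsed struct per selected field, and the two filter expressions split it into validDF and errorDF.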