1

My requirement is as below

Joined two data frames as below:

     var c = a.join(b,keys,"fullouter")

c.printSchema() as below:

     |-- add: string (nullable = true)
     |-- sub: string (nullable = true)
     |-- delete: string (nullable = true)
     |-- mul: long (nullable = true)
     |-- ADD: string (nullable = true)
     |-- SUB: string (nullable = true)
     |-- DELETE: string (nullable = true)
     |-- MUL: long (nullable = true)
      It's good until here.

Now i am doing a withcolumn when condition as below

     val d = c.withColumn("column", when(c("a.add") === c("b.ADD"), 
   "Neardata"))

error as below:

    Exception in thread "main" org.apache.spark.sql.AnalysisException: 
    Cannot resolve column name "a.add"

I tried as below also

     val d = c.withColumn("column", when(col("a.add") === col("b.ADD"), "Neardata"))

    Again error.

   Please suggest.
sparkscala
  • 91
  • 1
  • 11
  • could you add `c.printSchema()` output into the question? – notNull Apr 22 '20 at 20:37
  • Here you go :|-- add: string (nullable = true) |-- sub: string (nullable = true) |-- delete: string (nullable = true) |-- mul: long (nullable = true) |-- ADD: string (nullable = true) |-- SUB: string (nullable = true) |-- DELETE: string (nullable = true) |-- MUL: long (nullable = true) – sparkscala Apr 22 '20 at 20:50
  • Dear @sparkscala have you defined aliases to 2 dataframe before you refer them in when ? – Ram Ghadiyaram Apr 22 '20 at 21:05

1 Answers1

0

You have to define alias with datframe.as("a") and dataframe1.as("b")

Example :


  import spark.sqlContext.implicits._
  val data = List(("James","","Smith","36636","M",60000),
    ("Michael","Rose","","40288","M",70000),
    ("Robert","","Williams","42114","",400000),
    ("Maria","Anne","Jones","39192","F",500000),
    ("Jen","Mary","Brown","","F",0))

  val cols = Seq("first_name","middle_name","last_name","dob","gender","salary")
  val df = spark.createDataFrame(data).toDF(cols:_*).as("a")
  val df2 = df.withColumn("a.new_gender", when(col("a.gender") === "M","Male")
    .when(col("a.gender") === "F","Female")
    .otherwise("Unknown")).show

Output :

+----------+-----------+---------+-----+------+------+------------+
|first_name|middle_name|last_name|  dob|gender|salary|a.new_gender|
+----------+-----------+---------+-----+------+------+------------+
|     James|           |    Smith|36636|     M| 60000|        Male|
|   Michael|       Rose|         |40288|     M| 70000|        Male|
|    Robert|           | Williams|42114|      |400000|     Unknown|
|     Maria|       Anne|    Jones|39192|     F|500000|      Female|
|       Jen|       Mary|    Brown|     |     F|     0|      Female|
+----------+-----------+---------+-----+------+------+------------+

I think with out alias you are trying to access like this... that might be the cause.

  val df2 = df.withColumn("df.new_gender", when(col("df.gender") === "M","Male")
    .when(col("df.gender") === "F","Female")
    .otherwise("Unknown")).show

Ram Ghadiyaram
  • 28,239
  • 13
  • 95
  • 121
  • Tried above and i am getting the below error Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class scala.runtime.BoxedUnit () – sparkscala Apr 22 '20 at 23:35
  • what is the version of spark and scala – Ram Ghadiyaram Apr 22 '20 at 23:52
  • basically of you are using string in when otherwise you have to enclose with lit("") for ex : `lit("Neardata")` for that you need to import org.apache.sql.functions._ – Ram Ghadiyaram Apr 22 '20 at 23:56
  • can you print the code here, where you are getting error ? – Ram Ghadiyaram Apr 22 '20 at 23:56
  • I have defined alias for a & b and the code is as below val d = c.withColumn("column", when(c("a.add") === c("b.ADD"), "Neardata")) spark 2.3 scala 2.11.8 – sparkscala Apr 23 '20 at 00:08
  • try lit("Neardata") this is another issue not reated to aliases – Ram Ghadiyaram Apr 23 '20 at 00:10
  • `df.withColumn("column", when(df("a.first_name") === df("b.first_name"), "Neardata")) }` is right.... you are getting this error some where else what is complete stacktrace ? – Ram Ghadiyaram Apr 23 '20 at 00:15
  • Sorry about that. Yes c is a dataframe. The actual code is as below val d =c.withColumn("column", when(c("a.add") === c("b.ADD"), "Neardata") .when(c("a.add").isNull(), "INSERT")) . Your response is very much appreciated. – sparkscala Apr 23 '20 at 00:23
  • `df.withColumn("column", when(df("a.first_name") === df("a.first_name"), "Neardata") .when(df("a.middle_name").isNull, "INSERT")).show ` will work no error for me... – Ram Ghadiyaram Apr 23 '20 at 00:33
  • `+----------+-----------+---------+-----+------+------+--------+ |first_name|middle_name|last_name| dob|gender|salary| column| +----------+-----------+---------+-----+------+------+--------+ | James| | Smith|36636| M| 60000|Neardata| | Michael| Rose| |40288| M| 70000|Neardata| | Robert| | Williams|42114| |400000|Neardata| | Maria| Anne| Jones|39192| F|500000|Neardata| | Jen| Mary| Brown| | F| 0|Neardata| +----------+-----------+---------+-----+------+------+--------+ ` – Ram Ghadiyaram Apr 23 '20 at 00:33
  • try to follow the example given above. – Ram Ghadiyaram Apr 23 '20 at 00:34
  • it's working fine till df.withColumn("column", when(df("a.first_name") === df("a.first_name"), "Neardata") after that if i add the other statement .when(df("a.middle_name").isNull, "INSERT")) it's failing with the same error Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class scala.runtime.BoxedUnit () – sparkscala Apr 23 '20 at 00:40
  • its not possible as per above it should work. do one thing make it 2 steps 1) val df1 = df.withColumn("column", when(df("a.first_name") === df("a.first_name"), "Neardata")) 2) ` df1.withColumn("column",when(df("a.middle_name").isNull, "INSERT"))` – Ram Ghadiyaram Apr 23 '20 at 01:01
  • Splitted into 2 dataframes and both are failing . It's really strange. If i have only df1 then it's working. if i add df2 both are failing.Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class scala.runtime.BoxedUnit () – sparkscala Apr 23 '20 at 01:29
  • https://stackoverflow.com/questions/53374838/unsupported-literal-type-class-scala-runtime-boxedunit I think u have same isue check this – Ram Ghadiyaram Apr 23 '20 at 03:03
  • if you are using isNull() then remove () it will be isNull – Ram Ghadiyaram Apr 23 '20 at 03:08
  • please cultivate the habbit of accepting the answer if you find useful. – Ram Ghadiyaram Apr 24 '20 at 19:47