column is not a member of org.apache.spark.sql.DataFrame

Question

I am new to spark and I am trying to join two tables present in hive from Scala code:

import org.apache.spark.sql._
import sqlContext.implicits._

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

val csp = hiveContext.sql("select * from csp")
val ref = hiveContext.sql("select * from ref_file")

val csp_ref_join = csp.join(ref, csp.model_id == ref.imodel_id , "LEFT_OUTER")

however for the above join I got error :

<console>:54: error: value model_id is not a member of org.apache.spark.sql.DataFrame
         val csp_ref_join = csp.join(ref, csp.model_id == ref.imodel_id , "LEFT_OUTER")

Is it a right way to join the hive tables if not what went wrong?

one more question ... joins on hive tables in Scala vs same joins in hive which one is better approach considering performance? is it right way to do it in Scala with hiveContext?

thanks in advance!!

oh im sorry i had changed my requirement bcz i wanted to resolve it asap i will un del it thanks for help :) — ashwini, Aug 10 '18 at 04:23

score 4 · Accepted Answer · answered Jun 01 '18 at 19:04

Since you use Scala you cannot use dot syntax. Also it is === not ==

csp.join(ref_file, csp("model_id") === ref_file("icmv_model_id"), "leftouter")

or (if there are no name conflicts):

csp.join(ref_file, $"model_id" === $"icmv_model_id", "leftouter")

or (under the same conditions as above):

import org.apache.spark.sql.functions.col

csp.join(ref_file, col("model_id") === col("icmv_model_id"), "leftouter")

column is not a member of org.apache.spark.sql.DataFrame

1 Answers1