I am new to Spark and I am trying to join two Hive tables from Scala code:
import org.apache.spark.sql._
import sqlContext.implicits._
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val csp = hiveContext.sql("select * from csp")
val ref = hiveContext.sql("select * from ref_file")
val csp_ref_join = csp.join(ref, csp.model_id == ref.imodel_id , "LEFT_OUTER")
However, the above join fails with this error:
<console>:54: error: value model_id is not a member of org.apache.spark.sql.DataFrame
val csp_ref_join = csp.join(ref, csp.model_id == ref.imodel_id , "LEFT_OUTER")
Is this the right way to join Hive tables? If not, what went wrong?
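From reading the DataFrame API docs, I suspect columns need to be referenced with the apply syntax and compared with === rather than ==, something like the sketch below (column names as in my tables above), but I'd like to confirm:

// What I suspect the join should look like, using the DataFrame API's
// column syntax; column names as in my tables above.
val csp_ref_join = csp.join(ref, csp("model_id") === ref("imodel_id"), "left_outer")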
One more question: performance-wise, is it better to join Hive tables from Scala like this, or to run the same joins directly in Hive? Is using hiveContext from Scala the right approach for this?
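For context, the alternative I'm comparing against would be running the same join as HiveQL through the same hiveContext, something like this (table and column names as above):

// The same left outer join expressed in HiveQL and run through hiveContext.
val csp_ref_join_sql = hiveContext.sql(
  "select * from csp c left outer join ref_file r on c.model_id = r.imodel_id")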
Thanks in advance!