val temp = sqlContext.sql(s"SELECT A, B, C, (CASE WHEN (D) in (1,2,3) THEN ((E)+0.000)/60 ELSE 0 END) AS Z from TEST.TEST_TABLE")
val temp1 = temp.map { temp => ((temp.getShort(0), temp.getString(1)), (temp.getDouble(2), temp.getDouble(3))) }
  .reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))

Instead of the above code, which does the computation (the CASE evaluation) in the Hive layer, I would like to have the transformation done in Scala. How would I do it?

Is it possible to do the same while filling the data inside the map?

sandip

2 Answers

val temp = sqlContext.sql(s"SELECT A, B, C, D, E from TEST.TEST_TABLE")

import org.apache.spark.sql.Row // needed to build the new Row

val tempTransform = temp.map(row => {
  // CASE WHEN D IN (1, 2, 3) THEN E/60 ELSE 0 END, evaluated in Scala
  val z = if (List[Double](1, 2, 3).contains(row.getDouble(3))) row.getDouble(4) / 60 else 0.0
  Row(row.getShort(0), row.getString(1), row.getDouble(2), z)
})

val temp1 = tempTransform.map { row => ((row.getShort(0), row.getString(1)), (row.getDouble(2), row.getDouble(3))) }
  .reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))
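If you want the case evaluation and the key/value pairing done in a single pass (the "while filling the data inside the map" part of the question), here is a minimal sketch, assuming the same column order A, B, C, D, E and types as above, and Spark 1.x where mapping a DataFrame yields an RDD of Row:

// Sketch only: assumes A: Short, B: String, C/D/E: Double
val temp1 = temp.map { row =>
  // CASE WHEN D IN (1, 2, 3) THEN E/60 ELSE 0 END, computed while building the pair
  val z = if (Set(1.0, 2.0, 3.0).contains(row.getDouble(3))) row.getDouble(4) / 60 else 0.0
  ((row.getShort(0), row.getString(1)), (row.getDouble(2), z))
}.reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))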
sarveshseri

You can use this syntax as well:

new_df = old_df.withColumn("target_column", udf(old_df("name")))

as illustrated by this example:

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._ // for `toDF` and $""
import org.apache.spark.sql.functions._ // for `when`

val df = sc.parallelize(Seq((4, "blah", 2), (2, "", 3), (56, "foo", 3), (100, null, 5)))
    .toDF("A", "B", "C")

val newDf = df.withColumn("D", when($"B".isNull or $"B" === "", 0).otherwise(1))

In your case, execute the SQL, which gives you a DataFrame, like below:

val temp = sqlContext.sql(s"SELECT A, B, C, D, E from TEST.TEST_TABLE")

and then apply withColumn with when/otherwise (or, if needed, a Spark UDF that calls your Scala function logic instead of a Hive UDF).
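For instance, a minimal sketch of both options for the Z column from the question, assuming D and E are numeric, Spark 1.5+ for Column.isin, and that sqlContext.implicits._ is imported as in the example above:

import org.apache.spark.sql.functions._ // for when, udf

// Option 1: CASE WHEN D IN (1, 2, 3) THEN E/60 ELSE 0 END as a column expression
val withWhen = temp.withColumn("Z", when($"D".isin(1, 2, 3), $"E" / 60).otherwise(0))

// Option 2: the same logic as plain Scala wrapped in a Spark UDF
val zUdf = udf { (d: Double, e: Double) =>
  if (Set(1.0, 2.0, 3.0).contains(d)) e / 60 else 0.0
}
val withUdf = temp.withColumn("Z", zUdf($"D", $"E"))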

Ram Ghadiyaram