1

I'm trying to achieve this query using scala in spark sql

SELECT * FROM  employees
 WHERE emp_id IN (SELECT emp_id 
              FROM employees
              WHERE SALARY > 200) ;

I have mapped oracle table to spark data frame

val employees = sqlContext.load("jdbc", Map("url" -> "jdbc:oracle:thin:client/password@localhost:1521:orcldb1", "dbtable" -> "client.EMPLOYEE"));

Output of subquery is following

scala> employees.where($"salary" > 100).select($"emp_id").collect().map{ row=>row.get(0)}

res3: Array[Any] = Array(6, 7, 8, 9, 10, 4, 2, 3, 5)

I'm getting following error while performing whole query

 employees.where($"emp_id" in (employees.where($"salary" > 100).select($"emp_id").collect())).show

warning: there were 1 deprecation warning(s); re-run with -deprecation for details java.lang.RuntimeException: Unsupported literal type class [Ljava.lang.Object; [Ljava.lang.Object;@129df247 at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:49) at org.apache.spark.sql.functions$.lit(functions.scala:89) at org.apache.spark.sql.Column$$anonfun$isin$1.apply(Column.scala:642) at org.apache.spark.sql.Column$$anonfun$isin$1.apply(Column.scala:642) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)

Ishan
  • 680
  • 1
  • 8
  • 19
  • 1
    No that wasn't my expected answer [this](http://stackoverflow.com/questions/32551919/spark-filter-isin-doesnt-work-as-expected) is the one. – Ishan Nov 23 '15 at 13:08
  • Some other options: http://stackoverflow.com/a/33825337/1560062 – zero323 Nov 23 '15 at 13:19

1 Answers1

1

i got the answer

val items = employees.where($"salary" > 100).select($"emp_id").collect().map{ row=>row.get(0)}

the only thing i changed is, i have used isin instead of in

employees.filter($"emp_id".isin(items:_*)).show
Ishan
  • 680
  • 1
  • 8
  • 19