How to pass column values in where condition

Question

I am failing to achieve simple SQL query in Spark.

I would like to write the below query in Scala Spark:

select * from emp where emp_id in (select distinct manager_id from emp ;

Below is what I tried:

empdf.where(col("emp_id").isin(empdf.select("manager_id").collect().map(_(0)).toList)).show()

I got the below error:

java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.$colon$colon List(null, 68319, 68319, 68319, 65646, 65646, 69062, 66928, 66928, 66928, 66928, 67858, 66928, 67832)

mck · Accepted Answer · 2021-02-23T12:59:57.817

It's better to do a semi join to avoid collecting as list:

empdf.alias("t1").join(empdf.alias("t2"), expr("t1.emp_id = t2.manager_id"), "left_semi")

If you want to use isin, you can expand the List using : _* (see this post):

empdf.where(col("emp_id").isin(empdf.select("manager_id").collect().map(_(0)).toList: _*)).show()

Or use isInCollection:

empdf.where(col("emp_id").isInCollection(empdf.select("manager_id").collect().map(_(0)).toList)).show()

score 0 · Answer 2 · answered Feb 23 '21 at 18:29

0

You can also write as a SQL,

empdf.createorreplaceview("emp")

spark.sql("select * from emp where emp_id in (select distinct manager_id from emp ")

answered Feb 23 '21 at 18:29

Murali Krishna

41
4

How to pass column values in where condition

2 Answers2