I am using SparkSQL. I have the following table:

Department:

dep_id | emp_id | age
0 | 0 | 21
0 | 1 | 53
0 | 3 | 42
0 | 5 | 58
1 | 4 | 36
1 | 8 | 48
2 | 2 | 54
2 | 6 | 31
2 | 7 | 30
2 | 9 | 54

However, I am not able to successfully execute the following query in SparkSQL:

SELECT   A.dep_id,
         A.emp_id,
         A.age,
         (SELECT  MAX(age) FROM department B WHERE A.dep_id = B.dep_id) max_age
FROM     department A
ORDER BY 1,2

Can anyone please help me see where I may be going wrong?

Thanks

marie20
    Are you getting an error? Which one? – GMB Jan 13 '20 at 09:43
    You should store year of birth instead of age, because the age values tend to change all the time. – jarlh Jan 13 '20 at 09:45
  • @GMB - the error is `org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: max(age)#19163` – marie20 Jan 14 '20 at 04:39

2 Answers


There is no need for a subquery, you can do this with window functions:

SELECT   dep_id,
         emp_id,
         age,
         MAX(age) OVER(PARTITION BY dep_id) max_age
FROM     department 
ORDER BY 1,2
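
As a sanity check (not part of the original answer), the per-department maxima this window query produces on the sample data can be reproduced with a plain-Python sketch, no Spark required; the dictionary grouping simply mirrors `PARTITION BY dep_id`:

```python
# Sample rows from the question's department table: (dep_id, emp_id, age)
rows = [
    (0, 0, 21), (0, 1, 53), (0, 3, 42), (0, 5, 58),
    (1, 4, 36), (1, 8, 48),
    (2, 2, 54), (2, 6, 31), (2, 7, 30), (2, 9, 54),
]

# MAX(age) OVER (PARTITION BY dep_id): the maximum age per department,
# repeated on every row belonging to that department.
max_age = {}
for dep_id, _emp_id, age in rows:
    max_age[dep_id] = max(max_age.get(dep_id, age), age)

# Attach max_age to each row and sort, mirroring ORDER BY 1, 2.
result = sorted((d, e, a, max_age[d]) for d, e, a in rows)
for row in result:
    print(row)
```

On this data the maxima are 58 for dep_id 0, 48 for dep_id 1, and 54 for dep_id 2, so every row in a department carries that department's maximum.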
GMB
  • I understand that this can be done with window functions. However, I need to be able to do the same with scalar correlated subqueries only. – marie20 Jan 14 '20 at 04:39

You should qualify the aggregated column with its table alias:

SELECT A.dep_id, A.emp_id, A.age,
      (SELECT MAX(b.age) FROM department B WHERE A.dep_id = B.dep_id) max_age
FROM department A;
Yogesh Sharma
  • the query fails with the error - `org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: max(age)#19163` – marie20 Jan 14 '20 at 04:37