I am using SparkSQL. I have the following table:

Department:

dep_id | emp_id | age
0 | 0 | 21
0 | 1 | 53
0 | 3 | 42
0 | 5 | 58
1 | 4 | 36
1 | 8 | 48
2 | 2 | 54
2 | 6 | 31
2 | 7 | 30
2 | 9 | 54

However, I am not able to successfully execute the following query in SparkSQL:

SELECT   A.dep_id,
         A.emp_id,
         A.age,
         (SELECT  MAX(age) FROM department B WHERE A.dep_id = B.dep_id) max_age
FROM     department A
ORDER BY 1,2

Can anyone please help me see where I may be going wrong?

Thanks

marie20
    Are you getting an error? Which one? – GMB Jan 13 '20 at 09:43
    You should store year of birth instead of age, because the age values tend to change all the time. – jarlh Jan 13 '20 at 09:45
  • @GMB - the error is `org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: max(age)#19163` – marie20 Jan 14 '20 at 04:39

2 Answers


There is no need for a subquery, you can do this with window functions:

SELECT   dep_id,
         emp_id,
         age,
         MAX(age) OVER(PARTITION BY dep_id) max_age
FROM     department 
ORDER BY 1,2
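
As a sanity check (not part of the original answer), the per-department maxima this window query produces on the sample data can be reproduced with a plain-Python sketch, no Spark required; the dictionary grouping simply mirrors `PARTITION BY dep_id`:

```python
# Sample rows from the question's department table: (dep_id, emp_id, age)
rows = [
    (0, 0, 21), (0, 1, 53), (0, 3, 42), (0, 5, 58),
    (1, 4, 36), (1, 8, 48),
    (2, 2, 54), (2, 6, 31), (2, 7, 30), (2, 9, 54),
]

# MAX(age) OVER (PARTITION BY dep_id): the maximum age per department,
# repeated on every row belonging to that department.
max_age = {}
for dep_id, _emp_id, age in rows:
    max_age[dep_id] = max(max_age.get(dep_id, age), age)

# Attach max_age to each row and sort, mirroring ORDER BY 1, 2.
result = sorted((d, e, a, max_age[d]) for d, e, a in rows)
for row in result:
    print(row)
```

On this data the maxima are 58 for dep_id 0, 48 for dep_id 1, and 54 for dep_id 2, so every row in a department carries that department's maximum.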
GMB
  • I understand that this can be done with window functions. However, I need to be able to do the same with scalar correlated subqueries only. – marie20 Jan 14 '20 at 04:39

You should qualify the aggregated column with its table alias:

SELECT A.dep_id, A.emp_id, A.age,
      (SELECT MAX(b.age) FROM department B WHERE A.dep_id = B.dep_id) max_age
FROM department A;
Yogesh Sharma
  • the query fails with the error - `org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: max(age)#19163` – marie20 Jan 14 '20 at 04:37