33

I'm trying to retrieve top 2 tables from my employee list based on salary in hive (version 0.11). Since it doesn't support TOP function, is there any alternatives? Or do we have define a UDF?

mck
  • 40,932
  • 13
  • 35
  • 50
Holmes
  • 1,059
  • 2
  • 17
  • 25

3 Answers3

73

Yes, here you can use LIMIT.

You can try it by the below query:

SELECT * FROM employee_list SORT BY salary DESC LIMIT 2
Forseth11
  • 1,418
  • 1
  • 12
  • 21
4

Here I think it's worth mentioning SORT BY and ORDER BY both clauses and why they different,

SELECT * FROM <table_name> SORT BY <column_name> DESC LIMIT 2

If you are using SORT BY clause it sort data per reducer which means if you have more than one MapReduce task it will result partially ordered data. On the other hand, the ORDER BY clause will result in ordered data for the final Reduce task. To understand more please refer to this link.

SELECT * FROM <table_name> ORDER BY <column_name> DESC LIMIT 2

Note: Finally, Even though the accepted answer contains SORT BY clause, I mostly prefer to use ORDER BY clause for the general use case to avoid any data loss.

INDRAJITH EKANAYAKE
  • 3,894
  • 11
  • 41
  • 63
3
select * from employee_list order by salary desc limit 2;
Ashish Doneriya
  • 1,683
  • 1
  • 13
  • 18