Row number functionality in Hive

Question

How can I generate row numbers for an existing table while running a select query?
For example:

select row_number(), * from emp;

I am using Hive 0.13. I can't access external JARs or UDFs in my environment. The underlying files are in Parquet format.

Thanks in advance!

If you run this kind of analytic function on large data sets (e.g. over 50 million rows), be careful to test your data consistency. I have seen subtle **data corruption** occur in a **deterministic way** with V0.13 and V0.14: the row numbers were in sequence, but some thousands of rows had been dropped and replaced by copies of other rows. That may be specific to Hive-on-Tez, though. – Samson Scharfrichter, May 28 '16 at 11:32
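The consistency test suggested above can be sketched in Python on in-memory rows. It checks two things: the row numbers run exactly 1..n, and the numbered rows (ignoring row_num) match the source table as a multiset, which would catch rows dropped and replaced by duplicates of other rows. The function name and row layout are hypothetical. On a table with tens of millions of rows you would express the same comparison as Hive aggregate queries rather than collect the rows locally.

```python
from collections import Counter


def numbering_is_consistent(source_rows, numbered_rows):
    """Check a ROW_NUMBER() result against its source (hypothetical helper).

    source_rows:   rows of the original table, as tuples.
    numbered_rows: (row_num, *original_columns) tuples from the numbering query.
    """
    row_nums = sorted(r[0] for r in numbered_rows)
    # Row numbers must be exactly 1..n: no gaps, no repeats, no missing rows.
    if row_nums != list(range(1, len(source_rows) + 1)):
        return False
    # Compare the rows themselves as multisets, so a row that was dropped and
    # replaced by a copy of another row is detected even if numbering looks fine.
    numbered = Counter(tuple(r[1:]) for r in numbered_rows)
    return numbered == Counter(map(tuple, source_rows))


if __name__ == "__main__":
    src = [(10, "alice"), (20, "bob"), (30, "carol")]
    print(numbering_is_consistent(src, [(1, 10, "alice"), (2, 20, "bob"), (3, 30, "carol")]))
    print(numbering_is_consistent(src, [(1, 10, "alice"), (2, 20, "bob"), (3, 20, "bob")]))
```

The second call models the reported corruption: the sequence 1, 2, 3 is intact, but "carol" has been replaced by a second "bob", so the multiset comparison fails.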

score 36 · Answer 1 · answered May 27 '16 at 18:04


ROW_NUMBER() is a windowing function, so it must be used with an OVER clause. To number the whole table as a single group, just don't specify a PARTITION BY.

SELECT *, ROW_NUMBER() OVER () AS row_num
FROM emp
--- other stuff
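To see what such a query returns, here is a minimal sketch that runs the same window function against an in-memory SQLite database from Python. It assumes Python's bundled SQLite is 3.25 or newer (the release that added window functions). SQLite is only a stand-in for Hive here, and the emp columns emp_id and name are hypothetical. It puts an explicit ORDER BY inside OVER, because with an empty OVER () the order in which rows get numbered is not guaranteed.

```python
import sqlite3


def number_rows(rows):
    """Return (row_num, emp_id, name) tuples numbered by ROW_NUMBER() over emp_id."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical stand-in for the Hive `emp` table.
        conn.execute("CREATE TABLE emp (emp_id INTEGER, name TEXT)")
        conn.executemany("INSERT INTO emp VALUES (?, ?)", rows)
        # ORDER BY inside OVER makes the numbering deterministic; the outer
        # ORDER BY only controls the order in which results are returned.
        query = """
            SELECT ROW_NUMBER() OVER (ORDER BY emp_id) AS row_num, emp_id, name
            FROM emp
            ORDER BY row_num
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in number_rows([(30, "carol"), (10, "alice"), (20, "bob")]):
        print(row)
```

Rows are inserted out of order, yet the numbering follows emp_id, so "alice" gets 1, "bob" 2 and "carol" 3.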


o-90



Appears to need an explicit ordering, at least in my version of HiveQL, e.g. `SELECT *, ROW_NUMBER() OVER (ORDER BY some_emp_field) AS row_num FROM emp` – patricksurry Nov 02 '17 at 12:27

score 16 · Answer 2

row_number() can be used, for example, to find the most recent visit of each user on your site.

-- keep only the latest visit per user
SELECT user_id, user_name, `timestamp`
FROM (
    SELECT user_id, user_name, `timestamp`,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY `timestamp` DESC) AS visit_number
    FROM `user`
) user_table
WHERE visit_number = 1;
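A minimal sketch of the same latest-visit pattern, run from Python against an in-memory SQLite database (assumes SQLite 3.25+ for window functions). SQLite stands in for Hive, and the table name visits, the column ts (replacing the answer's timestamp column) and the sample data are hypothetical.

```python
import sqlite3


def latest_visits(visits):
    """Return the most recent (user_id, user_name, ts) row per user via ROW_NUMBER()."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table; ts holds ISO-8601 strings, which sort chronologically.
        conn.execute("CREATE TABLE visits (user_id INTEGER, user_name TEXT, ts TEXT)")
        conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", visits)
        query = """
            SELECT user_id, user_name, ts
            FROM (
                SELECT user_id, user_name, ts,
                       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts DESC) AS visit_number
                FROM visits
            ) AS user_table
            WHERE visit_number = 1
            ORDER BY user_id
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        (1, "ann", "2016-05-01"),
        (1, "ann", "2016-05-03"),
        (2, "bo", "2016-05-02"),
        (1, "ann", "2016-05-02"),
    ]
    print(latest_visits(sample))
```

Because the subquery numbers each user's visits newest-first, filtering on visit_number = 1 keeps exactly one row per user_id.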

edited Sep 10 '21 at 11:24

answered May 27 '16 at 19:52

sumitya


How does this relate to the OP's question? – o-90 May 27 '16 at 22:32
@GoBrewers14 - I have added one more layer of explanation to my answer: generate row_number and then make some sense out of it. Hope this helps :) – sumitya May 27 '16 at 22:43

Please consider adding a comment on how this answer can be improved. – sumitya Jun 18 '16 at 05:45

At least useful to me. – Zhongzhi Feb 11 '17 at 21:44
Subqueries and row_number should never be used together – MikeKulls Feb 05 '21 at 01:29
