8

I want to get summary data of the first quartile for a table in Hive. Below is a query to get the maximum number of views in each quartile:

SELECT NTILE(4) OVER (ORDER BY total_views) AS quartile, MAX(total_views)
FROM view_data
GROUP BY quartile
ORDER BY quartile;

And this query is to get the names of all the people that are in the first quartile:

SELECT name, NTILE(4) OVER (ORDER BY total_views) AS quartile
FROM view_data
WHERE quartile = 1

I get this error for both queries:

Invalid table alias or column reference 'quartile'

How can I reference the ntile results in the where clause or group by clause?

Nadine
  • 1,620
  • 2
  • 15
  • 27

2 Answers2

7

You can't put a windowing function in a where clause because it would create ambiguity if there are compound predicates. So use a subquery.

select quartile, max(total_views) from
(SELECT total_views, NTILE(4) OVER (ORDER BY total_views) AS quartile,
FROM view_data) t
GROUP BY quartile
ORDER BY quartile
;

and

select * from 
(SELECT name, NTILE(4) OVER (ORDER BY total_views) AS quartile
FROM view_data) t
WHERE quartile = 1
;
invoketheshell
  • 3,819
  • 2
  • 20
  • 35
  • I don't really understand what you mean by ambiguity caused by compound predicates, could you give an example please? – Nadine Jul 21 '15 at 15:03
  • 1
    SELECT col1 FROM T1 WHERE ROW_NUMBER() OVER (ORDER BY col1) <= 3 AND col1 > '100' What would the order of operation be here? Should the col1 > '100' part of the predicate be run first or should the row numbering. – invoketheshell Jul 21 '15 at 15:04
-1

The WHERE statement in SQL can only select on an existing column in a table schema. In order to perform that functionality on a calculated column, use HAVING instead of WHERE.

SELECT name, NTILE(4) OVER (ORDER BY total_views) AS quartile
FROM view_data
HAVING quartile = 1