1

I have two tables

table1:

id
1
2
3

table 2:

id date
1   x1
4   x2
1   x3
3   x4
3   x5
1   x6
3   x5
6   x6
6   x5
3   x6

I want the count of each ids for table 2 that is present in table 1.

Result

id  count
1   3
2   0
3   4

I am using this query, but its giving me error:

SELECT tab2.id, count(tab2.id)
 FROM <mytable2> tab2
 GROUP BY tab2.id
 WHERE tab2.id IN (select id from <mytable1>)
 ;

Error is:

missing EOF at 'WHERE' near 'di_device_id'

ekad
  • 14,436
  • 26
  • 44
  • 46
user2200660
  • 1,261
  • 3
  • 18
  • 23

2 Answers2

3

There are two possible issues. Sub queries in the WHERE clause are only supported from Hive 0.13 and up. If you are using such a version, then your problem is just that you have WHERE and GROUP BY the wrong way round:

SELECT tab2.id, count(tab2.id)
 FROM <mytable2> tab2
 WHERE tab2.id IN (select id from <mytable1>)
 GROUP BY tab2.id
 ;

If you are using an older version of Hive then you need to use a JOIN:

SELECT tab2.id, count(tab2.id)
 FROM <mytable2> tab2 INNER JOIN <mytable1> tab1 ON (tab2.id = tab1.id)
 GROUP BY tab2.id
 ;
mattinbits
  • 10,370
  • 1
  • 26
  • 35
3

You have two issues :-

  1. Where comes before group by. In SQL syntax you use having to filter after grouping by!
  2. Hive doesn't support all types of nested queries in Where clause. See here: Hive Subqueries

However yours type of sub query will be ok. Try this:-

SELECT tab2.id, count(tab2.id)
 FROM <mytable2> tab2
 WHERE tab2.id IN (select id from <mytable1>)
 GROUP BY tab2.id;

It will do exactly same thing what you meant.

Edit: I Just checked @MattinBit's answer. I didn't intended to duplicate the answer. His answer is more complete!

Mangat Rai Modi
  • 5,397
  • 8
  • 45
  • 75