75

I have a table with 1000+ partitions.

"Show partitions" command only lists a small number of partitions.

How can i show all partitions?

Update:

  1. I found "show partitions" command only lists exactly 500 partitions.

  2. "select ... where ..." only processes the 500 partitions!

Nishu Tayal
  • 20,106
  • 8
  • 49
  • 101
Kevin Leo
  • 850
  • 1
  • 7
  • 9

5 Answers5

96

CLI has some limit when ouput is displayed. I suggest to export output into local file:

$hive -e 'show partitions table;' > partitions
www
  • 4,365
  • 1
  • 23
  • 24
  • 1
    Same result as CLI. And it only shows 500 partitions. I don't known where is the magic number 500 from. – Kevin Leo Mar 25 '13 at 14:03
  • 1
    No idea then. It's very weird. Keep us posted if you solve it or find source of it. GL! Maybe naive but are you sure that there is more than 500 partitions? – www Mar 25 '13 at 14:57
  • 11
    Problem solved by "set cassandra.connection.sliceSize=10000;". Maybe it's a datastax's externsion to Hive. – Kevin Leo Mar 27 '13 at 15:20
  • is it feasible to show the created time of partition? – colintobing Jul 20 '17 at 04:18
  • 1
    @colintobing its feasible when you query from the metastore_db – Sridhar Sep 12 '17 at 11:37
  • 1
    Why is this the most voted answer when, as the comments show, it still shows only 500 partitions? – GuSuku Jan 25 '18 at 17:15
26

hive> show partitions table_name;

Sri
  • 491
  • 5
  • 9
4

Okay, I'm writing this answer by extending wmky's answer above & also, assuming that you've configured mysql for your metastore instead of derby.

select PART_NAME FROM PARTITIONS WHERE TBL_ID=(SELECT TBL_ID FROM TBLS WHERE TBL_NAME='<table_name>');

The above query gives you all possible values of the partition columns.

Example:

hive> desc clicks_fact;
OK
time                    timestamp                                   
..                              
day                     date                                        
file_date               varchar(8)                                  

# Partition Information      
# col_name              data_type               comment             

day                     date                                        
file_date               varchar(8)                                  
Time taken: 1.075 seconds, Fetched: 28 row(s)

I'm going to fetch the values of partition columns.

mysql> select PART_NAME FROM PARTITIONS WHERE TBL_ID=(SELECT TBL_ID FROM TBLS WHERE TBL_NAME='clicks_fact');
+-----------------------------------+
| PART_NAME                         |
+-----------------------------------+
| day=2016-08-16/file_date=20160816 |
| day=2016-08-17/file_date=20160816 |
....
....
| day=2017-09-09/file_date=20170909 |
| day=2017-09-08/file_date=20170909 |
| day=2017-09-09/file_date=20170910 |
| day=2017-09-10/file_date=20170910 |
+-----------------------------------+

1216 rows in set (0.00 sec)

Returns all partition columns.

Note: JOIN table DBS ON DB_ID when there is a DB involved (i.e, when, multiple DB's have same table_name)

Sridhar
  • 1,518
  • 14
  • 27
3

You can see Hive MetaStore tables,Partitions information in table of "PARTITIONS". You could use "TBLS" join "Partition" to query special table partitions.

youchuikai
  • 49
  • 3
0

Yet another option is to communicate with Hive Metastore via Thrift protocol.
If you write code in python, you may benefit from hmsclient library:

Hive cli:

hive> create table test_table_with_partitions(f1 string, f2 int) partitioned by (dt string);
OK
Time taken: 0.127 seconds

hive> alter table test_table_with_partitions add partition(dt=20210504) partition(dt=20210505);
OK
Time taken: 0.152 seconds

Python cli:

>>> from hmsclient import hmsclient
>>> client = hmsclient.HMSClient(host='hive.metastore.location', port=9083)
>>> with client as c:
...    all_partitions = c.get_partitions(db_name='default',
...                                      tbl_name='test_table_with_partitions', 
...                                      max_parts=24 * 365 * 3)
...
>>> print([{'dt': part.values[0]} for part in all_partitions])
[{'dt': '20210504'}, {'dt': '20210505'}]

NB: max_parts is a parameter than cannot be greater than 32767 (java short max value).

If you have Airflow installed together with apache.hive extra, you create hmsclient quite easy:

hive_hook = HiveMetastoreHook()
with hive_hook.metastore as hive_client:
    ... your code goes here ...

This seems a more valid way of communication with Hive Metastore than accessing DB directly (and database engine agnostic BTW).

GoodDok
  • 1,770
  • 13
  • 28