If I write a Hive SQL statement like

ALTER TABLE tbl_name ADD PARTITION (dt=20131023) LOCATION 'hdfs://path/to/tbl_name/dt=20131023';

How can I find this partition's location later? I ask because there is data at that location, but I can't query it with a statement like

SELECT data FROM tbl_name where dt=20131023;
MoreFreeze
  • how can one list locations of all the partitions not just one? – morpheus Jun 12 '17 at 17:27
    @morpheus I use a for loop with `show partitions table;` to show locations of all partitions. I don't find a one-line command to implement this. – MoreFreeze Jun 14 '17 at 02:37
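The loop described in the comment above can be sketched in Python. This is an illustration, not code from the original comment: it takes the raw output of `SHOW PARTITIONS tbl_name` (one partition spec per line) and builds one `DESCRIBE FORMATTED ... PARTITION (...)` statement per partition, whose output includes that partition's location. The table name and function name are hypothetical.

```python
def describe_statements(table, show_partitions_output):
    """Build one DESCRIBE FORMATTED ... PARTITION (...) per partition line.

    `SHOW PARTITIONS` prints one spec per line, e.g. 'dt=20131023', or
    'day=123/hour=2' when the table has multiple partition columns.
    """
    statements = []
    for line in show_partitions_output.strip().splitlines():
        # Each line is key=value pairs separated by '/'.
        pairs = [kv.split("=", 1) for kv in line.strip().split("/")]
        spec = ", ".join(f"{k}='{v}'" for k, v in pairs)
        statements.append(f"DESCRIBE FORMATTED {table} PARTITION ({spec})")
    return statements

for stmt in describe_statements("tbl_name", "dt=20131022\ndt=20131023"):
    print(stmt)
```

Each generated statement would then be fed back to the Hive CLI, one per partition.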

8 Answers


Do a describe on the partition instead of the full table.
This will show the linked location if it's an external table.

describe formatted tbl_name partition (dt='20131023')
David דודו Markovitz
SamirK
show table extended like 'tbl_name' partition (dt='20131023');

Show Tables/Partitions Extended

SHOW TABLE EXTENDED will list information for all tables matching the given regular expression. Users cannot use regular expression for table name if a partition specification is present. This command's output includes basic table information and file system information like totalNumberFiles, totalFileSize, maxFileSize, minFileSize, lastAccessTime, and lastUpdateTime. If partition is present, it will output the given partition's file system information instead of table's file system information.

David דודו Markovitz
Remus Rusanu

If the table has multiple partition columns, the syntax is:

describe formatted table_name partition (day=123,hour=2);
chaqke

If you want to know the location of files you're reading, use

SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE FROM <table> WHERE <part_name> = '<part_key>'

Then you get

hdfs:///user/hive/warehouse/<db>/<table>/<part_name>=<part_key>/000000_0.snappy, 0
hdfs:///user/hive/warehouse/<db>/<table>/<part_name>=<part_key>/000000_1.snappy, 0
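Since `INPUT__FILE__NAME` returns one path per file, a little post-processing is needed to reduce the result to distinct partition directories. A minimal sketch (the paths and function name are illustrative, not from the answer above):

```python
import posixpath

def partition_dirs(file_names):
    """Map HDFS file paths to their unique parent (partition) directories."""
    # posixpath.dirname strips the trailing file name; a set deduplicates
    # the many files that live in the same partition directory.
    return sorted({posixpath.dirname(p) for p in file_names})

files = [
    "hdfs:///user/hive/warehouse/db/tbl/dt=20131023/000000_0.snappy",
    "hdfs:///user/hive/warehouse/db/tbl/dt=20131023/000000_1.snappy",
]
print(partition_dirs(files))
```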

This is the format of the command I use to get the exact HDFS location of a specific partition in a specific table:

show table extended like flight_context_fused_record partition(date_key='20181013', partition_id='P-DUK2nESsv', custom_partition_1='ZMP');

In the command above, the partition spec consists of three separate fields. Your table may have more or fewer.

See results below. Notice the "location:" field shows the HDFS folder location.

hive (nva_test)> show table extended like flight_context_fused_record partition(date_key='20181013', partition_id='P-DUK2nESsv', custom_partition_1='ZMP');
OK
tableName:flight_context_fused_record
owner:nva-prod
location:hdfs://hdp1-ha/tmp/vfisher/cms-context-acquisition-2019-06-13/FlightContextFusedRecord/2018/10/13/ZMP/P-DUK2nESsv
inputformat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
outputformat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
columns:struct columns { string primary_key, string facility, string position, i32 dalr_channel, i64 start_time_unix_millis, i64 end_time_unix_millis, string foreign_key_to_audio_segment, struct<on_frequency_flight_list:list<struct<acid:string,ac_type:string>>,transfer_list:list<struct<primary_key:string,acid:string,data_id:string,ac_type:string,from_facility:string,from_position:string,transition_time:i64,transition_time_start:i64,transtition_time_end:i64,to_facility:string,to_position:string,source:string,source_info:string,source_time:i64,confidence:double,confidence_description:string,uuid:string>>,source_list:list<string>,domain:string,domains:list<string>> flight_context}
partitioned:true
partitionColumns:struct partition_columns { i32 date_key, string partition_id, string custom_partition_1}
totalNumberFiles:1
totalFileSize:247075687
maxFileSize:247075687
minFileSize:247075687
lastAccessTime:1561122938361
lastUpdateTime:1561071155639

The generic form of the command (taking out my specific values and putting in argument specifiers) looks like this:

show table extended like <your table name here> partition(<your partition spec here>);
vfisher

You can simply do this:

DESC FORMATTED tablename PARTITION (yr_no='y2019');

OR

DESC EXTENDED tablename PARTITION (yr_no='y2019');
user3327034

You can get the location of the Hive partitions on HDFS by running any of the following Hive commands.

DESCRIBE FORMATTED tbl_name  PARTITION(dt=20131023);
SHOW TABLE EXTENDED LIKE tbl_name PARTITION(dt=20131023);

Alternatively, you can get it by running the HDFS list command:

hdfs dfs -ls <your Hive store location>/<tablename>

Link: Hive show or list all partitions

NNK

You can get this info via the Hive Metastore Thrift protocol, e.g. with the hmsclient library:

Hive cli:

hive> create table test_table_with_partitions(f1 string, f2 int) partitioned by (dt string);
OK
Time taken: 0.127 seconds

hive> alter table test_table_with_partitions add partition(dt=20210504) partition(dt=20210505);
OK
Time taken: 0.152 seconds

Python cli:

>>> with client as c:
...     partition = c.get_partition_by_name(db_name='default', 
                                            tbl_name='test_table_with_partitions',
                                            part_name='dt=20210504')
... 
>>> partition.sd.location
'hdfs://hdfs.master.host:8020/user/hive/warehouse/test_table_with_partitions/dt=20210504'
GoodDok