0

What is the fastest way to check if a table has any records in Hive?

I so far have come across these approaches:

  1. Do a SELECT count(*) FROM <table_name>, I find this to be slow.
  2. Do a show tblproperties <db.table_name>("numRows");, I find that these give -1 if ANALYZE TABLE isn't run on table before. Hence would require ANALYZE TABLE .. to be run before SHOW TBLPROPERTIES ..
  3. Do a SELECT * FROM <table_name> limit 1. I find this to be the most efficient way.

Are there better ways to do this? (I just want to check if Hive table has at least one record)

Ani Menon
  • 27,209
  • 16
  • 105
  • 126

1 Answers1

0

This is as far as I know:

  • Hive table is partitioned:

    1) find the location of the table

    desc formatted <tablename>
    

    2) compute the file size in hdfs

    hdfs dfs -du -h <location of table>
    
  • Hive is not partitioned:

    1) show tblproperties <db.table_name>

    2) find numRows

Ani Menon
  • 27,209
  • 16
  • 105
  • 126
jose_bacoy
  • 12,227
  • 1
  • 20
  • 38
  • First approach, I don't find this to be as fast as `SELECT * FROM limit 1` as this is a two step process and you have to fire two queries for the result. Second approach, as mentioned in the question this has down-sides. – Ani Menon Mar 06 '18 at 17:38
  • This is what it is. Thanks. – jose_bacoy Mar 06 '18 at 18:05
  • I was looking for alternatives where maybe I could somehow in a single command check if table has any records. Anyhow, thanks for your time. – Ani Menon Mar 06 '18 at 19:02
  • We are using Avro and even in case of no data, it writes the header to the file. So option 1 would not be very helpful. – pecey Jan 25 '19 at 11:18