3

I want to know how many records processed or % of records proccessed by a query to fetch result in hive.

I tried describe formatted for query, but unable to do.

describe formatted (select * from sample)

leftjoin
  • 36,950
  • 8
  • 57
  • 116
kiran
  • 445
  • 1
  • 5
  • 11

1 Answers1

1

Use explain command:

explain extended select * from sample

But the number of rows in the plan is taken from statistics because query was not actually executed yet. The number of processed rows will become known only after execution.

See manual here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain

Counters in the log after command finished look like this:

Counters=FileSystemCounters.FILE_BYTES_READ:165364556525,
FileSystemCounters.FILE_BYTES_WRITTEN:398475913171,
FileSystemCounters.FILE_READ_OPS:0,
FileSystemCounters.FILE_LARGE_READ_OPS:0,
FileSystemCounters.FILE_WRITE_OPS:0,
FileSystemCounters.HDFS_BYTES_READ:2403609087417,
FileSystemCounters.HDFS_BYTES_WRITTEN:2401487507859,
FileSystemCounters.HDFS_READ_OPS:185667,
FileSystemCounters.HDFS_LARGE_READ_OPS:0 HIVE.RECORDS_IN:204428194,
HIVE.RECORDS_OUT_0:63070586,
HIVE.RECORDS_OUT_1_schema.table_name:39980068,
HIVE.RECORDS_OUT_INTERMEDIATE:126141195,
HIVE.SKEWJOINFOLLOWUPJOBS:0,
Shuffle Errors.BAD_ID:0,Shuffle 
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Thanks for reply, can I get that count anywhere in logs after query executions – kiran Nov 28 '17 at 07:39
  • @kiran If you are doing select, it says: `Time taken: 0.302 seconds, Fetched: 1 row(s)` at the end. If you are inserting table, says `Loading data to table sample Table sample stats: [numFiles=1, numRows=0, totalSize=0, rawDataSize=0]` Try to switch on statistics gathering: `set hive.stats.autogather=true;` – leftjoin Nov 28 '17 at 07:49
  • I am looking for total row query processed to fetch that 1 row , In the below link i can see 2 different fields like 'MAP_INPUT_RECORD' and 'MAP_OUTPUT_RECORDS'. Here i guess above 1 refers --MAP_OUTPUT_RECORDS. How can i get input records one https://acadgild.com/blog/counters-in-mapreduce/ – kiran Nov 28 '17 at 08:30
  • @kiran AFAIK Counters are present in the job execution log and it is possible to parse them. It is not possible to get access to counters from Hive. – leftjoin Nov 29 '17 at 18:12
  • 1
    @kiran Counters in the logs look like: 'Counters=FileSystemCounters.FILE_BYTES_READ:165364556525,FileSystemCounters.FILE_BYTES_WRITTEN:398475913171,FileSystemCounters.FILE_READ_OPS:0,FileSystemCounters.FILE_LARGE_READ_OPS:0,FileSystemCounters.FILE_WRITE_OPS:0,FileSystemCounters.HDFS_BYTES_READ:2403609087417,FileSystemCounters.HDFS_BYTES_WRITTEN:2401487507859,FileSystemCounters.HDFS_READ_OPS:185667,FileSystemCounters.HDFS_LARGE_READ_OPS:0' and so on. – leftjoin Nov 30 '17 at 07:17
  • Thanks leftjoin, somewhat like this i am looking. Here the logs are referring is hive logs? – kiran Nov 30 '17 at 17:56