
There is a huge number of records in a large HBase transactional table.

From the HBase shell:

  • How can I get a sample record that was inserted/updated in the last 6 hours?

  • Is it possible to get the count of records inserted/updated in the last 6 hours?

Vijay Innamuri

1 Answer

  • How can I get a sample record that was inserted/updated in the last 6 hours?

    The following query returns one sample record from the HBase table that was inserted or updated in the last 6 hours (21600000 ms).

    now = (Time.now.to_f * 1000).to_i
    scan 'my.table', { LIMIT => 1, TIMERANGE => [now - 21600000, now] }

  • Is it possible to get the count of records inserted/updated in the last 6 hours?

Based on the Stack Overflow answer to "Count number of records in a column family in an HBase table", first define this helper in the shell:

# count_table 'test.table', { CACHE => 1000 }
# --- Count rows with caching.
#
def count_table(tablename, args = {})

    table = @shell.hbase_table(tablename)

    # Run the scanner
    scanner = table._get_scanner(args)

    count = 0
    iter = scanner.iterator

    # Iterate over the results, counting one per returned row
    while iter.hasNext
        iter.next
        count += 1
    end

    # Return the counter
    return count
end

The query is:

now = (Time.now.to_f * 1000).to_i
count_table 'my.table', { TIMERANGE => [now - 21600000, now], CACHE => 10000000 }

The query above returns the count of records inserted or updated in the last 6 hours.

It returns the desired result, but I have not yet tested its performance under load.
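Both queries need the same six-hour window expressed in epoch milliseconds, so the arithmetic could be pulled into a small helper. This is only a sketch: `last_hours_range` is a hypothetical name of my own, not something the HBase shell provides, and it assumes the shell accepts a plain two-element array for TIMERANGE, as the queries above do.

```ruby
# Hypothetical helper (not part of HBase): returns [start_ms, end_ms]
# covering the last `hours` hours, in epoch milliseconds.
# `now` is a parameter so the window can be pinned to a fixed instant.
def last_hours_range(hours, now = Time.now)
  end_ms = (now.to_f * 1000).to_i           # current time in milliseconds
  start_ms = end_ms - hours * 60 * 60 * 1000 # subtract the window length
  [start_ms, end_ms]
end

range = last_hours_range(6)

# In the HBase shell, the range could then be passed as:
#   scan 'my.table', { LIMIT => 1, TIMERANGE => range }
#   count_table 'my.table', { TIMERANGE => range, CACHE => 1000 }
```

Computing the timestamp once also ensures that both ends of the range come from the same instant.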

Note: Although I have answered my own question, I am keeping this thread open to get better answers from others.

Vijay Innamuri