
I am reading data from HBase through Spark. The code runs fine when I read with a prefix filter on a complete rowkey or with a GET, but it freezes when I use a partial rowkey prefix. The rowkey structure is md5OfAkey_Akey_txDate_someKey. I want to read all data matching the Akeys held in a data frame. The table has a single column family, 50 column qualifiers, and around 200 million records. When I read using md5OfAkey_Akey_txDate, the code gets stuck, whereas if I construct the whole key it runs fine. I do not want to pass the whole rowkey, because I want to read all data for a particular account (Akey) and transaction date (txDate). Any help would be appreciated.

Nikhil Suthar
  • Performing a scan by partial rowkey (i.e. using PrefixFilter) is expected to be slower than direct `get`. Can you quantify "stuck" or does it never return? – mazaneicha Jan 03 '20 at 19:49
  • Sorry for the late reply. I went ahead with the MultiRowRangeFilter in HBase, and the code runs much faster than with the prefix or the fuzzy filter. I am still not sure why the prefix filter was taking more than 10 minutes to get data for a single partial rowkey, whereas the MultiRowRangeFilter brings back the same data in seconds. – Shaggy1755 May 08 '20 at 16:40
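
One possible explanation for the gap described in the comments, assuming the slow scan used a PrefixFilter with no start row set: the filter is then evaluated against every row from the beginning of the table until the prefix has been passed, whereas a row range lets HBase seek straight to the first matching key. Because the keys begin with an MD5 hash, the wanted rows can sit anywhere in the table, so a large share of the 200 million rows may be read first. The sketch below (Python, standard library only) shows how a partial rowkey can be turned into a start/stop row pair. The names `row_prefix` and `prefix_scan_range`, the hex encoding of the MD5, and the trailing `_` separator are assumptions for illustration, not details taken from the question.

```python
import hashlib
from typing import Tuple


def row_prefix(a_key: str, tx_date: str) -> bytes:
    """Build the partial rowkey md5OfAkey_Akey_txDate_ as bytes.

    Assumption: the MD5 is stored as a lowercase hex string and the parts
    are joined with '_'. Adjust to match how the table was actually written.
    """
    md5_hex = hashlib.md5(a_key.encode("utf-8")).hexdigest()
    # The trailing '_' keeps a txDate from also matching a longer value
    # that merely starts with the same characters.
    return f"{md5_hex}_{a_key}_{tx_date}_".encode("utf-8")


def prefix_scan_range(prefix: bytes) -> Tuple[bytes, bytes]:
    """Return (start_row, stop_row) covering exactly the keys starting with prefix.

    start_row is inclusive and stop_row is exclusive. An empty stop_row
    means "scan to the end of the table".
    """
    if not prefix:
        return b"", b""
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return prefix, b""
    # Increment the last remaining byte to get the smallest key
    # that sorts after every key carrying the prefix.
    stop_row = trimmed[:-1] + bytes([trimmed[-1] + 1])
    return prefix, stop_row


start_row, stop_row = prefix_scan_range(row_prefix("ACC123", "20200103"))
```

The resulting pair could be passed to the scan as its start and stop rows (for example through `Scan.withStartRow` and `Scan.withStopRow` in the HBase Java client), and one such range per Akey is the kind of input a MultiRowRangeFilter works from.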
