HBase via Thrift connection timeout on Amazon EMR

Question

I am writing to HBase through a Thrift server running as a TThreadPoolServer. My HBase configuration sets the maximum number of Thrift worker threads as follows:

hbase-site.xml

<property>
   <name>hbase.thrift.maxWorkerThreads</name>
   <value>50000</value>
   <source>hbase-site.xml</source>
</property>

This is the script I use to do concurrent writes:

test.py

import happybase
from random import randint

# happybase takes the socket timeout in milliseconds, so 50000 is 50 seconds.
connection = happybase.Connection('ec2-xx-xx-xx-xx.compute-1.amazonaws.com', timeout=50000)

# Each instance of the script writes to its own randomly named table.
table_name = 'test' + str(randint(0, 1000000))

families = {
    'cf1': dict(max_versions=1),
}
connection.create_table(table_name, families)
table = connection.table(name=table_name)

# One put (one Thrift call) per row, one million rows in total.
for x in range(1000000):
    table.put('row-key' + str(x), {b'cf1:qual1': b'testtesttest', b'cf1:qual2': b'testestest'})
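
For comparison, here is a minimal sketch of the same write loop using happybase's batch API, which groups many puts into fewer Thrift calls. It is not a fix for the thread limit described in this question, only a way to reduce per-row round trips. The function name write_rows, the batch size of 1000, and the table name 'test' are placeholders chosen for illustration; happybase is imported inside main() so the helper can be exercised without a running cluster.

```python
def write_rows(table, total_rows, batch_size=1000):
    """Queue total_rows puts through a happybase batch and return the count.

    `table` is expected to behave like happybase.Table: table.batch() returns
    a context manager with a put(row, data) method. The batch sends its
    mutations every `batch_size` puts and once more when the block exits.
    """
    written = 0
    with table.batch(batch_size=batch_size) as batch:
        for i in range(total_rows):
            row_key = ('row-key' + str(i)).encode()
            batch.put(row_key, {b'cf1:qual1': b'testtesttest',
                                b'cf1:qual2': b'testestest'})
            written += 1
    return written


def main():
    # Imported here so write_rows() can be tested without happybase installed.
    import happybase

    connection = happybase.Connection('ec2-xx-xx-xx-xx.compute-1.amazonaws.com',
                                      timeout=50000)  # milliseconds
    try:
        # Placeholder: assumes a table named 'test' with family 'cf1' exists.
        table = connection.table('test')
        write_rows(table, 1000000)
    finally:
        connection.close()


if __name__ == '__main__':
    main()
```

Each script instance still holds one Thrift connection, so this changes how many calls a worker thread serves, not how many worker threads the server needs.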

When I run 25 instances of test.py concurrently, the first 18-20 connect successfully and every connection after that fails with a timeout error. I checked on the HBase server: Thrift is able to create only 300 threads, and once that limit is reached, new connections are not accepted and time out.
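
One way to watch the thread count while the test scripts run is to read it from /proc. The sketch below assumes a Linux host and that you have already looked up the process id of the Thrift server yourself; the function names are illustrative and not part of any HBase tooling.

```python
import re
import sys
from pathlib import Path


def parse_thread_count(status_text):
    """Return the integer in the 'Threads:' field of /proc/<pid>/status text."""
    match = re.search(r"^Threads:\s+(\d+)\s*$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Threads:' field found")
    return int(match.group(1))


def thread_count(pid):
    """Read the current thread count of a process (Linux only)."""
    return parse_thread_count(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__":
    # Usage: python thread_count.py <pid-of-thrift-server>
    print(thread_count(int(sys.argv[1])))
```

Running this repeatedly while more copies of test.py start shows whether the count really stops growing at a fixed value.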

The system is not under stress even with 300 threads: CPU and memory consumption are very low. I therefore think the limit comes from some configuration setting.

Can somebody explain why Thrift is not creating more threads when the Thrift maximum thread count in my HBase configuration is much higher?

What's the stack trace of the timeout error you're seeing? You're setting a 50 second timeout on the connection so it's probably that. You said CPU and memory are fine, but what about network? — WattsInABox, Oct 19 '16 at 16:59
Also, if you can, I would skip Thrift and REST and try to write to and read from HBase directly using code. That will be the fastest way by far and will be the only one that can truly scale to monumental levels. Thrift is going to have its limits. I'm not an expert, but I think 50,000 threads will be pushing it, and if you really need that kind of throughput, run code from the local network. I guess the other option would be running many Thrift servers. — WattsInABox, Oct 19 '16 at 17:03
