I am trying to use HBase for building some real time API's. Hence my use case is to support ~10000 concurrent requests per second. I am trying to do some connection pooling so as to achieve multi thread access. I followed this documentation to create the connection: https://hbase.apache.org/1.1/apidocs/org/apache/hadoop/hbase/client/ConnectionFactory.html
But I keep getting this error when I make concurrent requests to my API:
WARN [http-nio-34000-exec-93-SendThread(d-3zjyk02.target.com:2181)]
19 Apr 2017 04:48:13:872 (ClientCnxn.java:1102) - Session 0x0 for
server d-3zjyk02.target.com/10.66.241.30:2181, unexpected error,
closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
Here is how I am creating the connection:
// Connection to the cluster. A single connection shared by all application threads
private Connection connection = null;
public Connection getHBaseConnection() throws Exception {
if (connection == null) {
try {
Configuration configuration = HBaseConfiguration.create();
configuration.addResource("core-site.xml");
configuration.addResource("hbase-site.xml");
configuration.addResource("hdfs-site.xml");
connection = ConnectionFactory.createConnection(configuration);
} catch (Exception ex) {
LOG.error("Exception in creating the HBase connection object: " + ex.getMessage());
throw new Exception("Exception in creating the HBase connection: " + ex.getMessage());
}
}
return connection;
}
And here is how I use the get HBase connection method to some scan operations:
try {
connection = getHBaseConnection();
afterConnectionStartTime = System.currentTimeMillis();
LOG.info("[" + (System.currentTimeMillis() - startTime) + "]ms" + " ...TIME TAKEN to get the HBase connection object");
if (connection != null) {
table = connection.getTable(TableName.valueOf(TABLE_NAME));
Scan scan = new Scan(Bytes.toBytes(rowKeyStartDate), Bytes.toBytes(rowKeyEndDate));
scan.addColumn(COLUMN_FAMILY, ITEM);
}
This code works fine for any number of sequential requests, but when I do concurrent requests, I keep getting this error.
Some of the observations from my research on this issue:
1) This error is related to zookeeper closing the socket after certain number of requests (which I assume when it exceeds the max client connections (40) mentioned in my zoo.cfg file). But what I don't understand is why the concurrent requests are going to zookeeper in the first place. The first request should open the connection object and all the subsequent requests should use that pre existing connection to directly talk to region servers.
2) I am assuming this is the right way to do the connection pooling (at least as per the official Hbase doc). If no, whats the right way to do it?
3) I don't want to increase the max client connections in the zookeeper cfg file, thought it might be a temporary hack that can do my work.
Any help / suggestions is much appreciated.
Thanks!