
I have recently set up a JDBC driver to connect to a Hadoop database using Apache Phoenix. Basic queries in SQuirreL have worked well (for example, "select * from datafile"), but as soon as I run a slightly more complicated query (e.g., "select column1 from datafile where column2 = 'filter1'"), I encounter this error:

org.apache.phoenix.exception.PhoenixIOException: Task 
org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask rejected from 
org.apache.phoenix.job.JobManager[Running, pool size = 128, active threads = 
128, queued tasks = 5000, completed tasks = 5132]

From some searching, it seems that I should increase the ThreadPoolSize in the Apache Phoenix hbase-site.xml configuration file to avoid this error. I have done so, increasing it from 128 to 512, but the driver does not seem to have picked up the change: the error persists, and the "pool size" reported in the error message is still 128.

In the Phoenix driver settings in SQuirreL, I have added the locations of the hbase and hdfs directories containing the .xml config files under "Extra Class Path" in the setup.

Is there any way to make the driver "notice" that the ThreadPoolSize has changed?

Thank you!

mmcclarty

2 Answers


I spent a lot of time on this issue...

The first step is to run an explain on the query and look for the number of chunks (e.g. CLIENT 4819-CHUNK):

explain select the_date, sum(row2) from my_table where the_date = to_date('2018-01-01') group by the_date;

+------------------------------------------------------------------------------+
|                                                PLAN                          |
+------------------------------------------------------------------------------+
| CLIENT 4819-CHUNK 2339029958 ROWS 1707237752908 BYTES PARALLEL 4819-WAY FULL |
|     SERVER FILTER BY "THE_DATE" = DATE '2018-01-01 01:00:00.000'           |
|     SERVER AGGREGATE INTO DISTINCT ROWS BY ["THE_DATE"]                    |
| CLIENT MERGE SORT                                                            |
+------------------------------------------------------------------------------+
4 rows selected (0.247 seconds)

  • Check the number of regions and/or guideposts in the table.
  • Set the phoenix.stats.guidepost.width property to a value larger than its default of 100 MB, and restart the HBase Region Servers to apply the change.
  • Update the table statistics by running the following command: jdbc:phoenix...> UPDATE STATISTICS my_table
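The last step above can be run from any Phoenix SQL client; a minimal sketch (the table name my_table comes from the answer, so substitute your own, and the follow-up query columns are illustrative):

```sql
-- Refresh the table's statistics so Phoenix recomputes its guideposts
UPDATE STATISTICS my_table;

-- Re-run the explain afterwards to confirm the chunk count has dropped
EXPLAIN SELECT column1 FROM my_table WHERE column2 = 'filter1';
```

With a larger guidepost width, the same table is split into fewer chunks, so fewer tasks are queued per query.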

Set these values in Ambari/hbase config:

phoenix.query.threadPoolSize: the number of concurrent threads to run for each query. It should be set to the number of vcores available on the client side / Region Servers in the cluster.

phoenix.query.queueSize: the maximum queue depth for the tasks to be run for any queue, beyond which attempts to queue additional work are rejected. Set this property to at least the number of 'chunks' for the table, as seen in the 'explain' command output.
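In hbase-site.xml form, the two properties above would look like the following sketch (the values are illustrative, taken from the numbers discussed in this question, not recommendations):

```xml
<!-- hbase-site.xml on the Phoenix client (and Region Servers, per the answer) -->
<property>
  <name>phoenix.query.threadPoolSize</name>
  <value>512</value> <!-- e.g. number of vcores available -->
</property>
<property>
  <name>phoenix.query.queueSize</name>
  <value>5000</value> <!-- at least the chunk count from EXPLAIN -->
</property>
```

Note that this file must be on the classpath actually used by the client for the change to take effect.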

REFERENCE https://phoenix.apache.org/update_statistics.html

Petro

A couple of things to check:

  1. Ensure your Phoenix client jar version is compatible with that of your Phoenix server.
  2. Get hbase-site.xml from your HBase master node (ensure the Phoenix thread pool size is set appropriately, in sync with the master), add it to the Phoenix client jar file (using 7-Zip), and try running the SQuirreL client again.
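The second step above can also be done from the command line instead of 7-Zip; a minimal sketch, assuming the JDK's jar tool is installed and both phoenix-client.jar and the updated hbase-site.xml sit in the working directory (file names are illustrative):

```shell
# Replace (or add) hbase-site.xml at the root of the client jar,
# which is where the HBase/Phoenix client looks for it on the classpath
jar uf phoenix-client.jar hbase-site.xml
```

Since the jar usually appears earlier on the classpath than an "Extra Class Path" directory, a stale hbase-site.xml baked into the jar can silently override the edited one, which would explain the pool size staying at 128.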
AnswerSeeker