13

I have a Spring Boot (v2.0.8) application which makes use of a HikariCP (v2.7.9) Pool (connecting to MariaDB) configured with:

minimumIdle: 1
maximumPoolSize: 10
leakDetectionThreshold: 30000
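For context, these settings are supplied through Spring Boot's externalized configuration; assuming the standard `spring.datasource.hikari.*` binding (property names are my assumption from the Spring Boot docs, not copied from our actual config), the equivalent application.yml fragment would be:

```yaml
spring:
  datasource:
    hikari:
      minimum-idle: 1
      maximum-pool-size: 10
      leak-detection-threshold: 30000   # ms; Hikari logs a warning if a connection is held longer
```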

The issue is that our production component, once every few weeks, repeatedly throws SQLTransientConnectionException: "Connection is not available, request timed out after 30000ms...". It never recovers from this and keeps throwing the exception, so a restart of the component is required.

From looking at the HikariPool source code, it seems this happens because every call to connectionBag.borrow(timeout, MILLISECONDS) returns a null poolEntry, which triggers the timeout exception. For it to be null, the connection pool must have no free entries, i.e. all PoolEntry objects in the sharedList are marked IN_USE.

I am not sure why the component does not recover from this, since I would expect a PoolEntry to eventually be marked NOT_IN_USE, which would break the cycle of repeated exceptions.

Possible scenarios I can think of:

  1. All entries are IN_USE and the DB goes down temporarily. I would expect exceptions to be thrown for the in-flight queries. Perhaps at this point the PoolEntry status is never reset and is therefore stuck at IN_USE. I would have thought that if an exception is thrown, the status is changed so that the connection can be cleared from the pool. Can anyone confirm whether this is the case?

  2. A flood of REST requests is made to the component, each of which requires DB queries to be executed. This fills the connection pool, so subsequent requests time out waiting for earlier requests to complete. This makes sense; however, I would expect the component to recover once the requests complete, which it does not.
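Scenario 2 only fails to recover if borrowed entries are never returned. As a toy model of the borrow/return cycle (plain java.util.concurrent, not Hikari's actual code — class and method names here are made up for illustration), each pool entry can be treated as a semaphore permit: once every permit is held and none is ever released, every subsequent borrow times out forever, which matches the endlessly repeating SQLTransientConnectionException:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy stand-in for HikariCP's ConcurrentBag: each permit models one PoolEntry.
// A "leaked" connection is a permit that is acquired but never released.
class ToyPool {
    private final Semaphore entries;

    ToyPool(int maximumPoolSize) {
        this.entries = new Semaphore(maximumPoolSize);
    }

    // Mirrors connectionBag.borrow(timeout, MILLISECONDS):
    // returns false when no entry becomes free within the timeout.
    boolean borrow(long timeoutMs) {
        try {
            return entries.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Mirrors closing a connection: the entry goes back to NOT_IN_USE.
    void release() {
        entries.release();
    }
}
```

In the real pool, the equivalent of release() is Connection.close(); if any code path returns from a request without closing its connection (for example, an exception path that skips close), that entry stays IN_USE permanently and the usable pool shrinks, which your leakDetectionThreshold of 30000 ms should surface as "Apparent connection leak detected" warnings in the logs.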

Does anyone have an idea of what might be the issue here? I have tried configuring the various timeouts described in the Hikari documentation but have had no luck diagnosing or resolving this issue. Any help would be appreciated.

Thanks!

sam

1 Answer

8

Scenario 2 is most likely what is happening. I ran into the same issue when using HikariCP with Cloud Dataflow while receiving a large number of connection requests. The only solution I found was to play with the config until I hit a combination that worked for my use case.

I'll leave you my configuration, which holds up at 50–100 requests per second, and wish you luck.

import javax.sql.DataSource;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

private static DataSource pool;

final HikariConfig config = new HikariConfig();
config.setMinimumIdle(5);            // keep a few connections warm
config.setMaximumPoolSize(50);       // large enough to absorb request bursts
config.setConnectionTimeout(10000);  // fail after 10s instead of the 30s default
config.setIdleTimeout(600000);       // 10 minutes
config.setMaxLifetime(1800000);      // 30 minutes
config.setJdbcUrl(JDBC_URL);
config.setUsername(JDBC_USER);
config.setPassword(JDBC_PASS);

pool = new HikariDataSource(config);
R Wri
  • Thanks, that was a good pointer to avoid the timeouts. Minor comment though: the idle timeout and max lifetime values seem to be the default values per https://github.com/brettwooldridge/HikariCP#configuration-knobs-baby – evandongen May 14 '20 at 13:09
  • But this question is still unanswered: why does the user need to restart the component? The system should start processing incoming requests once a few connections are free. – Abhishek Singh Jul 15 '21 at 08:10
  • It seems that I encountered the same error today, and I also think that scenario 2 is the root cause. However, I am still struggling with the hikaricp config. – hieunt89 Mar 05 '22 at 06:33
  • In the scenario experienced by sam, I would definitely try setting MaxLifetime to a smaller value. This is not a solution, but could result in a kind of slow self-healing: Hikari should destroy connections (stuck IN_USE) after that much time and create new, working connections in the pool (not tested, though). I might play with IdleTimeout as well (set smaller values). – booFar Jun 28 '22 at 11:29