
Due to some previous questions I've had answered about the synchronous nature of MySQL, I'm starting to question why people use connection pools, and whether in my scenario I should move to a pool.

Currently my application keeps a single connection active. There's only a single connection, statement, and result set being used in my application, and it's recycled. All of my database tasks are placed in a queue and executed back to back on a separate thread: one thread for database queries, one connection for database access. If the connection has an issue, the application disposes of it and creates a new one.
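The pattern described above can be sketched roughly like this (a minimal sketch; the class and method names are illustrative, and the actual JDBC work is reduced to a `Runnable` so the ordering behaviour stands out):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal sketch of a single-threaded, queued query executor: tasks run
// back to back, in submission order, on one dedicated thread. In the real
// application each task would use the single shared Connection.
public class SerialQueryQueue {
    private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();

    public SerialQueryQueue() {
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    // next task starts only after the previous one finishes
                    tasks.take().run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "db-worker");
        worker.setDaemon(true);
        worker.start();
    }

    public void submit(Runnable task) {
        tasks.add(task);
    }
}
```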

From my understanding, regardless of how many queries are sent to MySQL to be processed, they will all be processed synchronously in the order they are received. It does not matter whether these queries come from a single source or multiple; they will be executed in the order received.

With this being said, what's the point in having multiple connections and threads to smash queries into the database's processing queue when it's going to process them one by one anyway? A query is not going to execute until the query before it has completed, and likewise, in my scenario where I'm not using a pool, the next query is not going to be executed until the previous query has completed.

Now you may say:

The amount of time spent on processing the results provided by the MySQL query will increase the amount of time between queries being executed.

That's obviously correct, which is why I have a worker thread that handles the results of a query. When a query completes, I convert the results into Map<> format, release the statement/result set from memory, and start processing the next query. The Map<> is sent off to a separate worker thread for processing, so it doesn't congest the query-execution thread.
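That hand-off can be sketched like this (illustrative only: the rows are stand-ins for what would come out of a `ResultSet`, and the processor callback is whatever your worker actually does with the data):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Sketch of handing converted results to a separate worker thread so the
// query-execution thread can immediately start the next query.
public class ResultHandOff {
    private final ExecutorService resultWorker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "result-worker");
        t.setDaemon(true);
        return t;
    });

    // rows: each result row already converted to a Map; the ResultSet and
    // Statement have already been closed by the query thread at this point.
    public Future<?> handOff(List<Map<String, Object>> rows,
                             Consumer<List<Map<String, Object>>> processor) {
        return resultWorker.submit(() -> processor.accept(rows));
    }
}
```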

Can anyone tell me whether the way I'm doing things is all right, and whether I should take the time to move to a pool of connections rather than a persistent connection? The most important thing is why. I'm starting this thread strictly for informational purposes.

EDIT: 4/29/2016

I would like to add that I know what a connection pool is, however I'm more curious about the benefits of using a pool over a single persistent connection when the table locks out requests from all connections during query processing to begin with.

Hobbyist
  • You can perform multiple read operations in parallel. Are you interested in pipelining hazards (RAW, WAR and WAW)? Anyway, a connection pool keeps connections open/idle/waiting; opening a new one is a heavy task, so if you have multiple clients, it's better to have a pool. – We are Borg Apr 27 '16 at 13:03
  • @RC - Not related, also bad information. – Hobbyist Apr 29 '16 at 13:26

3 Answers


Just trying this StackOverflow thing out but,

Most of the time, a connection to a database sits idle. When you execute an INSERT or UPDATE query on the connection, it locks the table, preventing concurrent edits. While this is good, preventing overwrites and data corruption, it also means that no other connection can make edits while the first query is still running.

However, starting a new connection takes time, and in larger infrastructures that try to shave off every bit of wasted time, that cost matters. A connection pool is therefore a group of connections kept open in the idle state, ready for the next query.

Lastly, if you are running a small project, there's usually no reason for a connection pool but if you are running a large site with UPDATEs and INSERTs flying around every millisecond, a connection pool reduces overhead time.


Slightly related answer: a pool can do additional "connection health checks" (by examining SQL exception codes) and refresh connections to reduce memory usage (see the note on "maxLifeTime" in the answer). But all those things might not outweigh the simpler approach using one connection.

Another factor to consider is (blocking) network I/O times. Consider this (rough) scenario:

client prepares query --> client sends data over the network
--> server receives data from the network --> server executes query, prepares results
--> server sends data over the network --> client receives data from the network
--> client prepares resultset

If the database is local (on the same machine as the client) then network times are barely noticeable. But if the database is remote, network I/O times can become measurable and impact performance. Assuming the isolation level is at "read committed", running select-statements in parallel could become faster. In my experience, using 4 connections at the same time instead of 1 generally improves performance (or throughput). This does depend on your specific situation: if MySQL is indeed just mostly waiting on locks to get released, adding additional connections will not do much in terms of speed. And likewise, if the client is single-threaded, the client may not actually perceive any noticeable speed improvements.

This should be easy enough to test, though: compare execution times of one program with 1 thread using 1 connection executing an X amount of select-queries (i.e. re-use your current program) against another program using 4 threads, each thread with its own separate connection, executing the same X amount of select-queries divided across the 4 threads (or just run the first program 4 times in parallel).
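A rough harness for that comparison might look like the following (a sketch only: each "query" is simulated with a short sleep standing in for network round trip plus execution time; swap in real SELECTs over real connections for a meaningful measurement):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Compares wall-clock time for the same workload run on 1 thread vs N threads.
// fakeQuery() stands in for "send query, wait on network + execution, read results".
public class PoolBenchmark {

    static void fakeQuery() {
        try {
            Thread.sleep(5); // simulated network round trip + execution time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static long timeQueriesMillis(int queries, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < queries; i++) {
            pool.submit(PoolBenchmark::fakeQuery);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

If the "queries" are dominated by blocking I/O, the 4-thread run should finish in roughly a quarter of the time; if the server is mostly waiting on locks, the gap shrinks, which is exactly the point being made above.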

One note on the connection pool (like HikariCP): the pool must ensure no transaction remains open when a connection is returned to the pool, and this can mean a "rollback" is sent each time a connection is returned to the pool (closed) when auto-commit is off and no "commit" or "rollback" was sent previously. This in turn can increase network I/O times instead of reducing them. So make sure to test either with auto-commit on, or always send a commit or rollback after your query or set of queries is done.
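The effect can be illustrated with a simplified stand-in for a pool's connection wrapper (no real JDBC or HikariCP here; this just models the rule "roll back on return if auto-commit is off and the transaction was never ended"):

```java
// Simplified model of a pooled connection's close() behaviour. A real pool
// wraps java.sql.Connection; the counter here just makes the extra network
// round trip visible.
class PooledConnectionModel {
    boolean autoCommit = false;    // off: every statement opens a transaction
    private boolean inTransaction = false;
    int rollbacksSent = 0;         // each one is an extra network round trip

    void execute(String sql) {
        if (!autoCommit) {
            inTransaction = true;
        }
    }

    void commit() {
        inTransaction = false;
    }

    // "Closing" a pooled connection returns it to the pool; the pool must not
    // hand out a connection with a transaction still open, so it rolls back.
    void close() {
        if (!autoCommit && inTransaction) {
            rollbacksSent++;
            inTransaction = false;
        }
    }
}
```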

vanOekel
  • Thanks for your information. I purchased `Blitz.IO` in order to do some high-range concurrent user stress testing for my API. I'll post my findings using a pool / single connection. – Hobbyist Apr 30 '16 at 14:38

A connection pool and a persistent connection are not the same thing. One limits the number of open SQL connections; the other deals with the issues of a single pipe. The bottleneck is generally the time taken to transfer the SQL output over the network rather than the query execution time. So if you open two CLI SQL clients and fire two queries, one with a large output and one with a small output (in that sequence), the smaller one finishes first while the larger one is still scrolling its output. The point here is that multiple connections do solve problems in cases like this.

When you have multiple front-end requests issuing queries, you may prefer persistent connections because they give you the benefit of multiplexing over different connections (large versus small outputs) and avoid the overhead of session setup/teardown.

Connection pool APIs have built-in error checks and handling, but most APIs still expect you to declare manually whether you want a persistent connection or not.

So in effect there are three variables: pool, persistence, and config parameters via the API. One has to mix and match pool size, persistence, and the number of connections to suit one's environment.

Alok D