Here is my problem: Each night, I have to process around 50k Background Jobs, each taking an average of 60s. Those jobs are basically calling the Facebook, Instagram and Twitter APIs to collect users' posts and save them in my DB. The jobs are processed by sidekiq.
At first, my setup was:
:concurrency: 5
insidekiq.yml
pool: 5
in mydatabase.yml
RAILS_MAX_THREADS
set to 5 in my Web Server (puma) configuration.
My understanding is:
my web server (
rails s
) will use max 5 threads hence max 5 connections to my DB, which is OK as the connection pool is set to 5.my sidekiq process will use 5 threads (as the concurrency is set to 5), which is also OK as the connection pool is set to 5.
In order to process more jobs in the same time and reducing the global time to process all my jobs, I decided to increase the sidekiq concurrency to 25. In Production, I provisionned a Heroku Postgres Standard Database with a maximum connection of 120, to be sure I will be able to use Sidekiq concurrency.
Thus, now the setup is:
:concurrency: 25
insidekiq.yml
pool: 25
in mydatabase.yml
RAILS_MAX_THREADS
set to 5 in my Web Server (puma) configuration.
I can see that 25 sidekiq workers are working but each Job is taking way more time (sometimes more than 40 minutes instead of 1 minute) !?
Actually, I've been doing some tests and realize that processing 50 of my Jobs with a sidekiq concurrency of 5, 10 or 25 result in the same duration. As if somehow, there was a bottleneck of 5 connections somewhere.
I have checked Sidekiq Documentation and some other posts on SO (sidekiq - Is concurrency > 50 stable?, Scaling sidekiq network archetecture: concurrency vs processes) but I haven't been able to solve my problem.
So I am wondering:
is my understanding of the rails
database.yml
connectionpool
and sidekiqconcurrency
right ?What's the correct way to setup those parameters ?