1

I am using a second database with datasets within my API.

Every API request can have up to 3 queries on that Database so I am splitting them in three Threads. To keep it Thread safe I am using a connection pool.

But after the whole code is run the ConnectionPool thread is not terminated. So basically every time a request is made, we will have a new Thread on the server until basically there is no memory left.

Is there a way to close the connection pool thread? Or am I doing wrong on creating a connection pool per request?

I setup the Connection Pool this way:

begin
  full_db = YAML::load(ERB.new(File.read(Rails.root.join("config","full_datasets_database.yml"))).result)
  resolver = ActiveRecord::ConnectionAdapters::ConnectionSpecification::Resolver.new(full_db)
  spec = resolver.spec(Rails.env.to_sym)
  pool = ActiveRecord::ConnectionAdapters::ConnectionPool.new(spec)

Then I am running through the queries array and getting the results to the query

returned_responses = []
queries_array.each do |query|
  threads << Thread.new do
    pool.with_connection do |conn|
     returned_responses << conn.execute(query).to_a
    end
  end
end

threads.map(&:join)

returned_responses

Finally I close the connections inside the connection pool:

ensure
 pool.disconnect!
end
  • 1
    I'm assuming that you don't want this database tied to an ActiveRecord model and instead just want to make direct SQL queries against a database? – anothermh Jan 06 '23 at 17:54
  • Yes. I just want to make direct SQL queries. The config has `database_tasks: false`. I can use ActiveRecord to connect but I'm not going to use its syntax at all. – Miguel Torres Jan 07 '23 at 00:10
  • I suggest looking into [`ActiveRecord::Relation.load_async`](https://www.rubydoc.info/github/rails/rails/ActiveRecord%2FRelation:load_async) ([blog article about it](https://pawelurbanek.com/rails-load-async)). `load_async` is build in into Rails 7 and it seems like it does exactly what you try to do manually with splitting up the queries into different threads. – spickermann Jan 07 '23 at 08:01
  • Yes. But unfortunately he haven't yet upgraded to Rails 7. – Miguel Torres Jan 10 '23 at 15:32

3 Answers3

1

Since you want to make SQL queries directly without taking advantage of ActiveRecord as the ORM, but you do want to take advantage of ActiveRecord connection pooling, I suggest you create a new abstract class like ApplicationRecord:

# app/models/full_datasets.rb

class FullDatasets < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: {
    writing: :full_datasets_database,
    reading: :full_datasets_database
  }
end

You'll need to configure the database full_datasets_database in database.yml so that connects_to is able to connect to it.

Then you'll be able to connect directly to that database and make direct SQL queries against it by referencing that class instead of ActiveRecord::Base:

FullDatasets.connection.execute(query)

The connection pooling will happen transparently with different pools:

FullDatasets.connection_pool.object_id
=> 22620

ActiveRecord::Base.connection_pool.object_id
=> 9000

You may have to do additional configuration, like dumping the schema to db/full_datasets_schema.rb, but any additional troubleshooting or configuration you'll have to do will be in described in https://guides.rubyonrails.org/active_record_multiple_databases.html.

The short version of this explanation is that you should attempt to take advantage of ActiveRecord as much as possible so that your implementation is clean and straightforward while still allowing you to drop directly to raw SQL.

anothermh
  • 9,815
  • 3
  • 33
  • 52
  • This looks a good idea but when I run `FullDatasets.connection.execute(query)` inside `Thread.new` block it takes ages. Not sure why. I tried to pull ``FullDatasets.connection` out of that block but it then a request takes 9 seconds instead of the previous 2, so I don't think it is running concurrently. – Miguel Torres Jan 10 '23 at 12:32
  • @MiguelTorres Are you running `FullDatasets.establish_connection` before you start threading? Can you post an update in your question with more detail on what you're doing now that you have this setup? – anothermh Jan 10 '23 at 17:08
  • The solution wasn't entirely this one but this helped me a lot to find it. I will add the solution – Miguel Torres Jan 16 '23 at 15:46
1

After some time spent, I ended up finding an answer. The generic idea came from @anothermg but I had to do some changes in order to work in my version of rails (5.2).

I setup the database in config/full_datasets_database.yml I had the following initializer already:

#! config/initializers/db_full_datasets.rb
DB_FULL_DATASETS = YAML::load(ERB.new(File.read(Rails.root.join("config","full_datasets_database.yml"))).result)[Rails.env]

I created the following model to create a connection to the new database:

#! app/models/full_datasets.rb
class FullDatasets < ActiveRecord::Base
  self.abstract_class = true

  establish_connection DB_FULL_DATASETS
end

On the actual module I added the following code:

  def parallel_queries(queries_array)
    returned_responses = []
    threads = []

    conn = FullDatasets.connection_pool

    queries_array.each do |query|
      threads << Thread.new do
        returned_responses << conn.with_connection { |c| c.execute(query).to_a }
      end
    end

    threads.map(&:join)

    returned_responses
  end
0

Follow the official way of handling multiple databases in Rails:

https://guides.rubyonrails.org/active_record_multiple_databases.html

I can't give you an accurate answer as I do not have your source code to fully understand the whole context. If the setup that I sent above is not applicable to your use case, you might have missed some background clean up tasks. You can refer to this doc:

https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html

Rein Avila
  • 375
  • 1
  • 6