I am using Chewy, the Elasticsearch library for Rails, and am trying to figure out how to efficiently bulk import ~2 million records. These are Elasticsearch documents that are denormalized versions of a few different DB models that I have.
I am dividing the work into batches of 1000 and offloading them to a worker queue using Sidekiq. The problem is that when the `import!` call is made, a bunch of additional DB queries are issued to resolve fields, and I have no idea how to get rid of them.
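For context, the batching itself is just `each_slice` over the id list; a minimal self-contained sketch (`ImportWorker` is a placeholder name for my Sidekiq worker, and the enqueue call is commented out so this runs standalone):

```ruby
# Split ~2 million ids into batches of 1000 for the worker queue.
ids = (1..2_000_000).to_a

batches = ids.each_slice(1000).to_a
puts batches.size       # => 2000 jobs
puts batches.first.size # => 1000 ids per job

# Each batch is then enqueued to Sidekiq, roughly:
# batches.each { |batch| ImportWorker.perform_async(batch) }
```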
The naive approach was `ModelIndex::Type.import [<list of ids>]`. This looks up every document from the DB and then serializes/deserializes it, which is obviously inefficient, so instead I tried
`ModelIndex::Type.import Type.includes(:secondary_field, :other_field).all`
to use eager loading to my advantage and do one DB query with joins instead of 1000. Alas, it still looked up each object in the DB as before.
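For illustration, here is a stripped-down sketch of the kind of index definition involved (model and field names are placeholders, not my real schema):

```ruby
class ModelIndex < Chewy::Index
  define_type Type do
    field :name
    # Each of these value blocks touches an association, which is where
    # the extra per-document queries during import! seem to come from:
    field :secondary_field, value: -> { secondary_field.try(:title) }
    field :other_field, value: -> { other_field.map(&:name) }
  end
end
```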
Maybe I am missing something, but I would like to avoid as many DB queries as possible when indexing to cut down the time. Any help would be much appreciated!