
I am using Chewy, the Elasticsearch library for Rails. I am trying to figure out a way to efficiently bulk import ~2 million records. These are Elasticsearch documents that are denormalized versions of a few different DB models that I have.

I am dividing up the work into batches of 1000 and offloading them to a worker queue using Sidekiq. The problem is that when the `import!` call is made, there are a bunch of additional DB queries fired to resolve fields, and I have no idea how to get rid of them.
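
For context, the batching setup looks roughly like this (the worker name is a placeholder, not my real class):

```ruby
# Hypothetical worker name; the real class is structured the same way.
class TypeIndexerWorker
  include Sidekiq::Worker

  def perform(ids)
    # Chewy's import accepts an array of ids and loads the records itself.
    ModelIndex::Type.import(ids)
  end
end

# Enqueue one job per batch of 1000 ids.
Type.pluck(:id).each_slice(1000) do |ids|
  TypeIndexerWorker.perform_async(ids)
end
```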

The naive approach was `ModelIndex::Type.import [<list of ids>]`. This looks up every record in the DB and then serializes it, which is obviously inefficient, so instead I tried this:

ModelIndex::Type.import Type.includes(:secondary_field, :other_field).all

to use eager loading to my advantage and do one DB query (with joins) instead of 1000. Alas, it still looked up each object in the DB individually, as before.
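
For reference, my index definition looks roughly like this (field and association names are simplified placeholders); the fields that read from associations are what seem to trigger the extra queries:

```ruby
class ModelIndex < Chewy::Index
  define_type Type do
    field :name
    # These fields read from associations, which is where the extra
    # per-document queries show up during import.
    field :secondary_name, value: ->(type) { type.secondary_field.name }
    field :other_name,     value: ->(type) { type.other_field.name }
  end
end
```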

Maybe I am missing something, but I would like to avoid as many DB queries as possible when indexing, to cut down on time. Any help would be much appreciated!

pech0rin
  • I'm confused: `includes` joins tables, but from what you have here it looks like you are expecting it to only include those fields, not unlike a select statement. – Ken Stipek May 27 '16 at 21:33
  • Yeah, `includes` is probably overkill. I'm not really concerned about getting specific fields; I just want Chewy to use the objects I have already loaded instead of going out and loading them again. – pech0rin May 27 '16 at 21:41
  • @pech0rin This is exactly my problem. Did you eventually figure this out? – Mercado May 07 '21 at 04:56
  • @Mercado we eventually stopped using Chewy and used the Elasticsearch API directly – pech0rin May 12 '21 at 02:34

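For anyone landing here, a minimal sketch of the direct bulk-API route mentioned in the last comment, assuming the elasticsearch-ruby gem and an index named `types` (both assumptions, not from the question):

```ruby
require 'elasticsearch'

client = Elasticsearch::Client.new

# Load one batch with eager loading, then send a single bulk request,
# so serialization never goes back to the DB per document.
records = Type.includes(:secondary_field, :other_field).limit(1000)

body = records.map do |record|
  {
    index: {
      _index: 'types',          # assumed index name
      _id:    record.id,
      data: {
        name:           record.name,
        secondary_name: record.secondary_field&.name
      }
    }
  }
end

client.bulk(body: body)
```
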
0 Answers