
I am using Chewy, the Elasticsearch library for Rails. I am trying to figure out a way to efficiently bulk import ~2 million records. These are Elasticsearch documents that are denormalized versions of a few different DB models that I have.

I am dividing up the work into batches of 1000 and offloading them to a worker queue using Sidekiq. The problem is that when the `import!` call is made, there are a bunch of additional DB queries fired to resolve fields, and I have no idea how to get rid of them.
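
For context, the batching setup looks roughly like this (the worker name is a placeholder, not my real class):

```ruby
# Hypothetical worker name; the real class is structured the same way.
class TypeIndexerWorker
  include Sidekiq::Worker

  def perform(ids)
    # Chewy's import accepts an array of ids and loads the records itself.
    ModelIndex::Type.import(ids)
  end
end

# Enqueue one job per batch of 1000 ids.
Type.pluck(:id).each_slice(1000) do |ids|
  TypeIndexerWorker.perform_async(ids)
end
```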

The naive approach was `ModelIndex::Type.import [<list of ids>]`. This looks up every record in the DB and then serializes it, which is obviously inefficient, so instead I tried this:

ModelIndex::Type.import Type.includes(:secondary_field, :other_field).all

to use eager loading to my advantage and do one DB query (with joins) instead of 1000. Alas, it still looked up each object in the DB individually, as before.
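
For reference, my index definition looks roughly like this (field and association names are simplified placeholders); the fields that read from associations are what seem to trigger the extra queries:

```ruby
class ModelIndex < Chewy::Index
  define_type Type do
    field :name
    # These fields read from associations, which is where the extra
    # per-document queries show up during import.
    field :secondary_name, value: ->(type) { type.secondary_field.name }
    field :other_name,     value: ->(type) { type.other_field.name }
  end
end
```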

Maybe I am missing something, but I would like to avoid as many DB queries as possible when indexing, to cut down on time. Any help would be much appreciated!

pech0rin
  • I'm confused: `includes` joins tables, but from what you have here it looks like you are expecting it to only include those fields, not unlike a select statement. – Ken Stipek May 27 '16 at 21:33
  • Yeah, `includes` is probably overkill. I'm not really concerned about getting specific fields; I just want Chewy to use the objects I have already loaded instead of going out and loading them again. – pech0rin May 27 '16 at 21:41
  • @pech0rin This is exactly my problem. Did you eventually figure this out? – Mercado May 07 '21 at 04:56
  • @Mercado we eventually stopped using Chewy and used the Elasticsearch API directly – pech0rin May 12 '21 at 02:34

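For anyone landing here, a minimal sketch of the direct bulk-API route mentioned in the last comment, assuming the elasticsearch-ruby gem and an index named `types` (both assumptions, not from the question):

```ruby
require 'elasticsearch'

client = Elasticsearch::Client.new

# Load one batch with eager loading, then send a single bulk request,
# so serialization never goes back to the DB per document.
records = Type.includes(:secondary_field, :other_field).limit(1000)

body = records.map do |record|
  {
    index: {
      _index: 'types',          # assumed index name
      _id:    record.id,
      data: {
        name:           record.name,
        secondary_name: record.secondary_field&.name
      }
    }
  }
end

client.bulk(body: body)
```
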
0 Answers