I am using the 'chewy' gem for Elasticsearch in my Rails application, but I couldn't find any documentation on using the Elasticsearch scroll API with it. I get the error below when I jump to the last page of the records.

[500] {"error":{"root_cause":[{"type":"query_phase_execution_exception","reason":"Result window is too
large, from + size must be less than or equal to: [10000] but was [19450]. See the scroll api for a more
efficient way to request large data sets. This limit can be set by changing the [index.max_result_window]
index level parameter."}],"type":"search_phase_execution_exception","reason":"all shards failed",
"phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"recordings","node":"tgLqH_wwRUG6NmY0PCB0nA",
"reason":{"type":"query_phase_execution_exception","reason":"Result window is too large, from + size must
 be less than or equal to: [10000] but was [19450]. See the scroll api for a more efficient way to request
 large data sets. This limit can be set by changing the [index.max_result_window] index level
 parameter."}}]},"status":500}
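
To make the numbers in the error concrete: Elasticsearch rejects a from/size request when `from + size` exceeds `index.max_result_window` (10000 according to the message above). The sketch below checks that condition for a given page. The helper name `window_exceeded?` and the page size of 50 are assumptions for illustration only; the post does not state the actual page size.

```ruby
# Limit quoted in the error message above.
MAX_RESULT_WINDOW = 10_000

# Returns true when a from/size request for the given page would be rejected.
# `page` is 1-based; `per_page` is the `size` sent to Elasticsearch.
def window_exceeded?(page, per_page, max_window = MAX_RESULT_WINDOW)
  from = (page - 1) * per_page
  from + per_page > max_window
end

# Hypothetical page size of 50: page 389 gives from = 19400, so from + size = 19450.
puts window_exceeded?(389, 50)
# Page 200 gives from + size = 10000, which is exactly at the limit and still allowed.
puts window_exceeded?(200, 50)
```

Any page beyond the window has to be fetched another way, either with the scroll API or by raising `index.max_result_window` as the message suggests.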

Is there any way to use the Elasticsearch scroll API with the chewy gem, or is there any other option?

vitthal-gaikwad
  • It looks like, as of Aug 17, 2016, Chewy has an open bug specifically about using Chewy with the scroll API: https://github.com/toptal/chewy/issues/327 – Michael Wasser Aug 17 '16 at 15:48

1 Answer

Make the query size smaller and use the scroll API to fetch the results in batches:

    # Call the `scroll` API until all the documents are returned
    require 'elasticsearch'

    client = Elasticsearch::Client.new

    # Index 1,000 documents
    client.indices.delete index: 'test'
    1_000.times do |i|
      client.index index: 'test', type: 'test', id: i + 1, body: { title: "Test #{i}" }
    end
    client.indices.refresh index: 'test'

    # Open the "view" of the index by passing the `scroll` parameter
    # Sorting by `_doc` makes the operations faster
    r = client.search index: 'test', scroll: '1m',
                      body: { size: 100, sort: ['_doc'] }

    # Display the initial results
    puts "--- BATCH 0 -------------------------------------------------"
    puts r['hits']['hits'].map { |d| d['_source']['title'] }.inspect

    # Call the `scroll` API until empty results are returned
    batch = 0
    while (r = client.scroll(scroll_id: r['_scroll_id'], scroll: '5m')) && !r['hits']['hits'].empty?
      puts "--- BATCH #{batch += 1} -------------------------------------------------"
      puts r['hits']['hits'].map { |d| d['_source']['title'] }.inspect
      puts
    end

Example taken from here using the Elasticsearch DSL Gem
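
The loop above can be wrapped so that callers iterate over hits without handling `_scroll_id` themselves. This is only a minimal sketch and not part of Chewy's API: `scrolled_hits` is a hypothetical helper, and `FakeClient` is a stand-in that mimics the shape of the `search`/`scroll` responses so the example runs without a cluster. With the real gem you would pass an `Elasticsearch::Client` instead.

```ruby
# Hypothetical helper: returns an Enumerator over every hit, following the
# scroll cursor until an empty batch comes back.
# `client` must respond to `search` and `scroll` like the client used above.
def scrolled_hits(client, index:, size: 100, keep_alive: '1m')
  Enumerator.new do |yielder|
    r = client.search(index: index, scroll: keep_alive,
                      body: { size: size, sort: ['_doc'] })
    until r['hits']['hits'].empty?
      r['hits']['hits'].each { |hit| yielder << hit }
      r = client.scroll(scroll_id: r['_scroll_id'], scroll: keep_alive)
    end
  end
end

# Stand-in client (an assumption, for demonstration only): serves documents
# in batches and uses the next offset as the scroll id.
class FakeClient
  def initialize(docs)
    @docs = docs
  end

  def search(index:, scroll:, body:)
    @size = body[:size]
    page(0)
  end

  def scroll(scroll_id:, scroll:)
    page(scroll_id.to_i)
  end

  private

  def page(offset)
    hits = (@docs[offset, @size] || []).map { |d| { '_source' => d } }
    { '_scroll_id' => (offset + @size).to_s, 'hits' => { 'hits' => hits } }
  end
end

docs = Array.new(250) { |i| { 'title' => "Test #{i}" } }
titles = scrolled_hits(FakeClient.new(docs), index: 'test')
         .map { |hit| hit['_source']['title'] }
puts titles.size
```

Returning an `Enumerator` keeps the batching lazy, so a caller that stops early (for example with `first(10)`) does not request further batches.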

mrmbr007