I am trying to update the entries in a Solr index using the pysolr add() and commit() methods. The index is massive, and I need a way to change every entry one at a time. I know I could query the whole index and save the results as a list, but that requires a ton of memory. Is there built-in functionality that lets me read the entries one at a time (or in small batches) without holding the whole result set in memory?
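For reference, the memory-heavy approach described above would look roughly like this; the core URL, query, and transform() helper are placeholders, not details from the original question:

```python
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/mycore", timeout=60)

# Naive approach: pull every matching document into memory at once.
# rows must be set explicitly, since Solr's default page size is 10.
docs = list(solr.search("*:*", rows=10_000_000))

for doc in docs:
    doc.pop("_version_", None)                    # avoid version conflicts on re-add
    doc["my_field"] = transform(doc["my_field"])  # placeholder per-document update

solr.add(docs)
solr.commit()
```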

  • Could you also share a fragment of the code you've tried? – Cyberguille Jun 10 '20 at 16:08
  • Use cursor marks - they're stateless pointers into the result set, which avoids most of the issues with deep paging (especially across multiple nodes). You'll be able to fetch a subset of documents with each request and then make your updates (see the sketch after these comments). However, be careful that you're not changing the sort of the result set when making those updates (i.e. moving the updated documents to the end of the result set), since the cursor mark points "into" your sorted result set. Sorting by an updated timestamp desc should work fine. – MatsLindh Jun 10 '20 at 18:51
  • https://lucene.apache.org/solr/guide/8_5/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors – MatsLindh Jun 10 '20 at 18:52
  • If you know all the IDs in your Solr core, you could probably use the RealTime Get API (sketched below) https://lucene.apache.org/solr/guide/8_5/realtime-get.html – Hector Correa Jun 11 '20 at 19:51
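A minimal sketch of the cursor-mark approach with pysolr, assuming a local core at http://localhost:8983/solr/mycore and a placeholder transform() for whatever per-document change is needed:

```python
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/mycore", timeout=60)

cursor = "*"  # "*" is the initial cursor mark
while True:
    results = solr.search(
        "*:*",
        sort="id asc",    # cursorMark requires a sort ending on a unique field
        rows=500,         # page size; tune to your memory budget
        cursorMark=cursor,
    )
    batch = []
    for doc in results:
        doc.pop("_version_", None)   # drop the internal version field before re-adding
        doc["my_field"] = transform(doc["my_field"])  # placeholder update
        batch.append(doc)
    if batch:
        solr.add(batch, commit=False)  # defer the commit until the end
    if results.nextCursorMark == cursor:
        break  # the cursor did not advance, so the result set is exhausted
    cursor = results.nextCursorMark

solr.commit()
```

Sorting by id sidesteps the re-sorting caveat above, since an update doesn't change a document's id; and because the commit is deferred to the end, the cursor walks a stable snapshot of the index. One caveat: re-adding a fetched document replaces it wholesale, so any fields that aren't stored will be lost; pysolr's fieldUpdates argument to add() performs atomic updates instead, which avoids that.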

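And a rough sketch of the RealTime Get suggestion: pysolr has no dedicated helper for the /get handler (as far as I know), so this hits it with plain requests; the core URL and known_ids list are placeholders:

```python
import requests

SOLR_GET = "http://localhost:8983/solr/mycore/get"  # RealTime Get handler

def fetch_by_ids(ids):
    """Fetch the latest version of each document by id, even if uncommitted."""
    resp = requests.get(SOLR_GET, params={"ids": ",".join(ids)})
    resp.raise_for_status()
    return resp.json()["response"]["docs"]

# Walk a known id list in chunks instead of paging a query.
known_ids = [...]  # placeholder: however you enumerate your ids
for start in range(0, len(known_ids), 100):
    for doc in fetch_by_ids(known_ids[start:start + 100]):
        pass  # update and re-add each document as in the cursor-mark sketch
```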
0 Answers