I am trying to perform in-place updates on documents.

Solr Version - 5.5.2

Schema.xml -

<dynamicField name="store_*" type="int" indexed="false" stored="false" docValues="true"/>
<field name="_version_" type="long" indexed="false" stored="false" docValues="true" multiValued="false"/>

solrconfig.xml -

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
    <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
  </updateLog>
</updateHandler>

UpdateHandler being used - DirectUpdateHandler2

According to this article, the target field is non-indexed (indexed="false"), non-stored (stored="false"), single valued (multiValued="false") numeric docValues (docValues="true") field.

I only add the document with updateHandler.addDoc(addUpdateCommand); and do NOT commit afterwards with solrClient.commit();

The issue is that, without a commit, the change to the document is not visible.

If I enable autoSoftCommit and only add the document, the changes are reflected in the index, but the filterCache is cleared.

My aim is to achieve an in-place update without clearing the filterCache.

Can this be achieved?

  • The problem is that the filter cache may contain documents that no longer should be in the cached results after your commit. There's a possible hack by [using a segmented filter cache](http://blog-archive.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html) - but "visibility has a cost" is a good assumption. Use commitWithin to reduce the number of commits instead if possible. – MatsLindh Aug 07 '18 at 10:10

1 Answer

Short answer: no, you can't both index a document (a partial or in-place update is still an indexing) and have it searchable (or the changes visible) without clearing Solr's caches.

Long answer: You can index documents and keep the caches populated (openSearcher=false), but the newly indexed documents will not appear in search results until a hard or soft commit opens a new searcher. To understand why, it helps to know how Solr/Lucene works:

  1. A Lucene index is represented as a set of segments. Each segment is a self-contained index in its own right, made up of multiple files. Once written to disk, segments are mostly immutable.

  2. Each Solr core has a single instance of IndexSearcher to perform the queries. The IndexSearcher has a static view of all the segments that existed when it was created. This view doesn't change for the lifetime of the IndexSearcher and the caches belong to the IndexSearcher.

  3. Whenever you issue a commit, a new segment is created. A commit that opens a searcher (a soft commit, or a hard commit with openSearcher=true) also creates a new IndexSearcher to reflect the newly added (or updated) documents. While the new IndexSearcher is being initialised, the old one keeps processing requests. Once the new IndexSearcher is ready, the old one is unregistered (destroyed) and the new IndexSearcher starts serving query requests.

So the filterCache is cleared because the caches belong to the old IndexSearcher, and the new IndexSearcher starts with its own empty ones. You can use autowarming to pre-populate the new caches with values from the old caches (see autowarmCount in solrconfig.xml). Take care, because warming can impact performance: the new IndexSearcher re-runs a configurable number of the filter queries, using the keys (queries) from the old IndexSearcher's cache, and it is not ready to serve requests until the warming finishes.

See: https://wiki.apache.org/solr/SolrCaching#autowarmCount
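
As a rough sketch of the autowarming setup described above, the filterCache entry in the `<query>` section of solrconfig.xml could look like the following. The sizes and the autowarmCount value are illustrative assumptions, not recommendations; tune them against your own warming times.

```xml
<query>
  <!-- autowarmCount: number of entries from the old searcher's filterCache
       whose filter queries are re-executed against the new searcher.
       All numbers here are assumed example values. -->
  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="64"/>
</query>
```

A larger autowarmCount keeps more filters warm across commits but lengthens the time before the new searcher becomes available.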

PS: it's usually not advisable to issue a commit for each new document/update, for the reasons above. It's preferable to rely on automatic hard and soft commits (autoCommit and autoSoftCommit in solrconfig.xml).
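
A minimal sketch of such a configuration, assuming the updateHandler from the question: a hard commit that flushes to disk without opening a searcher, plus a less frequent soft commit that makes changes visible. The maxTime values (in milliseconds) are assumptions chosen for illustration only.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit: persists the index and truncates the transaction log,
       but keeps the current searcher (and its caches) in place. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: opens a new searcher so changes become visible.
       Each one discards the old caches, so a longer interval means
       fewer cache invalidations. -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With this setup the client only sends documents and never calls commit itself; visibility lags by at most the soft commit interval.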

eribeiro
  • Nitpick: "It's preferable to rely on hard and soft commits" - I think you mean commitWithin (or autoCommit with maxTime/maxDocs), since any commit, regardless of when issued, will be a hard or soft commit? – MatsLindh Aug 12 '18 at 17:46
  • Yup, I mean automatic hard and soft commits with maxDocs and/or maxTime. I'd set up a hard commit with openSearcher=false plus a soft commit in solrconfig.xml. IMO, even though commitWithin is a soft commit, it can lead to the same problems as an explicit hard commit (e.g. trashing the caches) if the interval is not large enough, right? An explicit commit ( /update?commit=true ) will trigger a hard commit, afaik. – eribeiro Aug 12 '18 at 20:42
  • The difference is the default setting - you can make `commitWithin` be a hard commit, and you can make `commit=true` be a soft commit if `softCommit=true` is added to the arguments. In this case they'll be the same, since the caches will be expired (as the new searcher is opened). – MatsLindh Aug 13 '18 at 07:54