
I have an index with 100K documents. A user may search the index but must only get back documents that she has access to. The list of documents the user is authorized for is provided by another system and is volatile, so I cannot store this information in the documents themselves.

A suggested solution is a filtered query whose filter lists the documents the user is authorized for. I know that filters are cacheable and all, but if the user has access to, e.g., 50K of the documents, it does not seem efficient to include 50K clauses in the filtered query to restrict the result every time.

So the questions are: Should the request size concern me? Is there a more appropriate way to achieve the task at hand?
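
To make the request-size question concrete, here is a minimal sketch that builds the kind of filtered query described above and measures its serialized size. It assumes the Elasticsearch 1.x `filtered` query with an `ids` filter; the field name `content`, the query text, and the short numeric IDs are hypothetical placeholders, and real IDs (for example UUIDs) would make the body larger.

```python
import json


def build_filtered_query(user_query, allowed_ids):
    """Build an ES 1.x-style filtered query restricted to allowed_ids."""
    return {
        "query": {
            "filtered": {
                # "content" is a hypothetical field name for the full-text part.
                "query": {"match": {"content": user_query}},
                # Only documents whose _id is listed here can match.
                "filter": {"ids": {"values": list(allowed_ids)}},
            }
        }
    }


def request_size_bytes(body):
    """Size of the JSON-serialized request body in bytes."""
    return len(json.dumps(body).encode("utf-8"))


# Hypothetical authorization list: 50K short string IDs.
allowed_ids = [str(i) for i in range(50_000)]
body = build_filtered_query("some text", allowed_ids)
size = request_size_bytes(body)
print(f"{len(allowed_ids)} ids -> {size / 1024:.0f} KiB")
```

With short IDs like these the body comes to a few hundred kilobytes, so the size scales roughly with the number of IDs times the average ID length.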

yannisf
  • 1
    I don't think there should be a problem with [the number of elements to filter after](http://stackoverflow.com/questions/26642369/max-limit-on-the-number-of-values-i-can-specify-in-the-ids-filter-or-generally-q). Regarding the request size, I know that ES has a limit on the request size (`http.max_content_length setting` 100mb maybe?) but I don't think it should be a problem. I'm assuming these 50k documents are represented by IDs or something small. – Andrei Stefan Jun 19 '15 at 09:01
  • What I usually recommend in this kind of situation is a test and a comparison: test with filters, test without filters. Run each test multiple times, because the first request populates the caches and is always the one that takes the longest to return. Test with 1000 filtered docs and increase this gradually to 10k, 25k, 50k and even 90k, and see how it behaves. My feeling is that there will be a performance penalty, but a small and manageable one. – Andrei Stefan Jun 19 '15 at 09:05
  • When increasing the number of filters, do keep an eye on the filter cache eviction numbers. I've noticed performance visibly degrade once filter cache evictions start happening. – bittusarkar Jun 19 '15 at 09:26
  • True that. Also, if you say that list of documents is volatile, depending on how often that changes you might consider not caching that filter at all. It might be more costly to cache (and evict) than not to cache at all. – Andrei Stefan Jun 19 '15 at 09:30
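
The test suggested in the comments could be sketched roughly as follows. This is only an illustration under stated assumptions: `search_fn` is a hypothetical callable that sends a request body to the cluster (with the official Python client it might wrap something like `es.search(index=..., body=body)`), and `fake_search` is a stand-in so the harness runs without a cluster. The first timing is reported separately because the first request is expected to be the slowest.

```python
import time


def time_search(search_fn, body, repeats=5):
    """Call search_fn(body) `repeats` times.

    Returns (first_call_seconds, best_of_remaining_seconds).
    """
    if repeats < 2:
        raise ValueError("repeats must be at least 2")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        search_fn(body)
        timings.append(time.perf_counter() - start)
    return timings[0], min(timings[1:])


def benchmark(search_fn, all_ids, sizes=(1_000, 10_000, 25_000, 50_000, 90_000)):
    """Time an ids-filtered query for a growing number of allowed IDs."""
    results = {}
    for n in sizes:
        body = {
            "query": {
                "filtered": {
                    "query": {"match_all": {}},
                    "filter": {"ids": {"values": all_ids[:n]}},
                }
            }
        }
        results[n] = time_search(search_fn, body)
    return results


def fake_search(body):
    """Stand-in for a real search call; replace with a call to your cluster."""
    return len(body["query"]["filtered"]["filter"]["ids"]["values"])


all_ids = [str(i) for i in range(100_000)]  # hypothetical document IDs
results = benchmark(fake_search, all_ids)
for n, (first, warm) in results.items():
    print(f"{n:>6} ids: first={first * 1000:.3f} ms, warm best={warm * 1000:.3f} ms")
```

Running the same loop against an unfiltered query gives the baseline for the comparison, and watching the filter cache eviction counters while the sizes grow covers the concern raised above.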

0 Answers