I am working with the Python runtime (if that matters for this). I have been struggling with number_found_accuracy. The API documentation is a bit lacking for this particular field. Its name seems to suggest one would set it to a number like 200, meaning "try to be accurate to within 200". However, going by others' accounts, and given that this value currently has a maximum (MAXIMUM_NUMBER_FOUND_ACCURACY in the API) of 10,000, it seems you are meant to set it to a number above what you expect to get, and the search mechanism does its best with accuracy.

My problem is that I am doing a search that I expect to return around 32,000 results. I set number_found_accuracy to the max, but the results have a number_found that varies and seems to be way off, for example 90,000. I am using a cursor, by the way. Is there a way to find out how many total documents exist in an index independently of number_found? More generally, how does one debug situations like this? And finally, is the number_found property of search results useful if there are over 10k or so results? If not, what are others using on GAE: integrating with and calling out to BigQuery?

Thanks for any insight.

tony m
Jay

1 Answer

On this page, they say it's the minimum accuracy. So giving a bigger number means that it's less accurate. Counting documents can take a long time, so if you have many documents it can be beneficial to set this number higher, which is why the maximum is so high. But if, for example, you want it to be accurate to within a margin of 100, you enter 100, and it may still report 3000 when there are 3099.

bigblind
  • This is not correct. See the number_found section at the bottom of this page: https://developers.google.com/appengine/docs/python/search/searchresultsclass. number_found is accurate for values <= accuracy, and an estimate otherwise. – Sebastian Kreft Jun 24 '13 at 20:25
  • The way to think about it might be this: if you say number_found_accuracy = 200, then the system will actually count results up to 200, and if there are more than that it uses some kind of guess. So if you have 200 results or fewer, you'll get an exact count; anything more than that is a guess. This means you would naturally want to set a high number, i.e. just set it to 10,000! But they caution that this can make queries expensive, which is what you'd expect if you were effectively saying "I only want the first 20 results, but also go count at least 10,000 results". – Emlyn O'Regan May 26 '16 at 04:03
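
To make the semantics described in the comments above concrete, here is a minimal sketch, not an official recipe. `number_found_is_exact` encodes the rule "exact up to the accuracy, an estimate beyond it". `count_by_paging` shows one possible way to get an exact total when more than 10,000 matches are expected: walk every page and count the items. The names `fetch_page` and `make_fake_backend` are hypothetical and exist only so the sketch runs without App Engine; the 10,000 limit is the value quoted in the question.

```python
# Value quoted in the question; assumed here rather than read from the SDK.
MAXIMUM_NUMBER_FOUND_ACCURACY = 10000


def number_found_is_exact(number_found, accuracy):
    """Rule from the comments: counts <= accuracy are exact, larger ones are estimates."""
    return number_found <= accuracy


def count_by_paging(fetch_page):
    """Count all matches by walking every page of results.

    `fetch_page(cursor)` is a hypothetical callable that returns
    (items, next_cursor), with next_cursor set to None on the last page.
    """
    total = 0
    cursor = None
    while True:
        items, cursor = fetch_page(cursor)
        total += len(items)
        if cursor is None:
            return total


def make_fake_backend(n_docs, page_size):
    """Stand-in for a real index so the sketch is runnable anywhere."""
    def fetch_page(cursor):
        start = cursor or 0
        end = min(start + page_size, n_docs)
        items = list(range(start, end))
        next_cursor = end if end < n_docs else None
        return items, next_cursor
    return fetch_page


if __name__ == "__main__":
    # A reported 90,000 with accuracy 10,000 can only be an estimate.
    print(number_found_is_exact(90000, MAXIMUM_NUMBER_FOUND_ACCURACY))
    # Walking all pages yields the true total for the fake index.
    print(count_by_paging(make_fake_backend(32000, 1000)))
```

On App Engine, `fetch_page` would presumably wrap `index.search(search.Query(query_string, options=search.QueryOptions(limit=..., cursor=..., ids_only=True)))`, start from `search.Cursor()`, and return `results.results` together with `results.cursor`. Paging through tens of thousands of documents costs one search call per page, so this is probably only worth doing occasionally or in a background task.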