
I have a function photos-with-keyword-starting that fetches the list of photos matching a given keyword from a MongoDB instance via monger, and another, photos-with-keywords-starting, that finds the photos matching all of several keywords using set/intersection:

(defn photos-with-keywords-starting [stems]
  (apply set/intersection
         (map set
              (map photos-with-keyword-starting stems))))

Previously I thought this worked fine, but since adding more records the intersection no longer behaves as expected -- it misses lots of records that have both keywords.
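As a sanity check, the intersection step behaves correctly on its own with stubbed data (the photo maps below are hypothetical stand-ins for real query results):

(require '[clojure.set :as set])

;; Hypothetical results for two keyword stems:
(def lisa-photos  [{:id 1} {:id 2} {:id 3}])
(def beach-photos [{:id 2} {:id 3} {:id 4}])

;; Same shape as photos-with-keywords-starting:
(apply set/intersection (map set [lisa-photos beach-photos]))
;; => #{{:id 2} {:id 3}}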

I notice that calls to the function photos-with-keyword-starting always return a maximum of 256 results:

=> (count (photos-with-keyword-starting "lisa"))
256

Here's the code of that function:

(defn photos-with-keyword-starting [stem]
  (with-db (q/find {:keywords {$regex (str "^" stem)}})
    (q/sort {:datetime 1})))

So, because queries made this way don't return all matching records when there are more than 256 of them, I don't get the right subsets when specifying more than one keyword.

How do I increase this limit?
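For reference, monger's query DSL has an explicit limit snippet. Here is a sketch of the query with the cap lifted -- not verified against the setup above; it assumes a monger 3.x-style with-collection that takes a db handle, and a collection named "photos":

(require '[monger.query :as q]
         '[monger.operators :refer [$regex]])

;; Sketch only: `db` and the collection name "photos" are assumptions.
(defn photos-with-keyword-starting [db stem]
  (q/with-collection db "photos"
    (q/find {:keywords {$regex (str "^" stem)}})
    (q/sort {:datetime 1})
    (q/limit 0))) ; in MongoDB, a limit of 0 means "no limit"

Alternatively, monger.collection/find-maps returns all matching documents as a seq of maps, with no query-DSL limit involved.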

Eric Clack
  • Unless it is essential for you to store the date as a Joda Date in the map, you can convert it with the `bean` function: `(update data :datetime bean)`. Otherwise you could use a set with custom equality: https://clojuredocs.org/clojure.core/sorted-set-by – leetwinski Apr 02 '18 at 12:56
  • Can you provide a complete example that reproduces this behavior? I can't come up with one but I suspect it could be related to deserializing the dates. – Taylor Wood Apr 02 '18 at 13:00
  • Updated with a working example, which makes me think something else is causing the error, not set intersection... – Eric Clack Apr 02 '18 at 15:51
  • It's a duplicate of this question: https://stackoverflow.com/questions/38648102/mongodb-query-has-implicit-limit256 – Eric Clack Apr 02 '18 at 20:05

1 Answer


You could simply convert the datetime in your photos-with-keyword-starting function to, for instance, a string, if you can live with that.

Alternatively you could remove logical duplicates from your output, for instance like this:

(->> -your-result-
     (group-by #(update % :datetime str)) ; key on the map with :datetime stringified
     (map (comp first val)))              ; keep one record per group
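For illustration, here is that dedup applied to hypothetical data in which two records are logical duplicates and one is distinct:

(def results
  [{:id 1 :datetime "2018-04-02T12:00:00"}  ; duplicate of the next record
   {:id 1 :datetime "2018-04-02T12:00:00"}
   {:id 2 :datetime "2018-04-02T13:00:00"}])

(->> results
     (group-by #(update % :datetime str))
     (map (comp first val)))
;; => one {:id 1 ...} record and one {:id 2 ...} record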
clojureman
  • Thanks for the answer. However it turns out that the DateTime variations were not the cause of the error, it was actually a monger default query limit. I've rewritten the question accordingly. – Eric Clack Apr 02 '18 at 21:55