I'm working on a Tinder-like app. To exclude profiles that the user has already swiped, I use a "must_not" query like this:

must_not : [{"terms": { "swipedusers": ["userid1", "userid1", "userid1"…]}}]

I wonder what the limits of this approach are. Is it scalable, and would it still work when the swipedusers array contains 2000 user ids? If there is a better, more scalable approach, I would be happy to know.

florian norbert bepunkt
  • Possible duplicate of [Max limit on the number of values I can specify in the ids filter or generally query clause?](http://stackoverflow.com/questions/26642369/max-limit-on-the-number-of-values-i-can-specify-in-the-ids-filter-or-generally-q) – ChintanShah25 Oct 29 '16 at 19:20
  • The question mentioned is about a hard limit enforced by Elasticsearch. My question is about scalability and good practice. – florian norbert bepunkt Oct 30 '16 at 09:22

1 Answer

There is a better approach! It is called "terms lookup", and it is something like the traditional join that you would do in a relational database.

I could try to explain it here, but all the information you need is well documented on the official Elasticsearch page:

https://www.elastic.co/guide/en/elasticsearch/reference/5.0/query-dsl-terms-query.html#query-dsl-terms-lookup

The final solution uses two indices: one for the registered users and another that tracks the swipes of each user. On every swipe, you update the document that holds the current user's swipes. That means appending elements to an array, which is a separate problem in Elasticsearch (and a big one if you are using AWS managed Elasticsearch) that can only be solved with scripting. More info at https://www.elastic.co/guide/en/elasticsearch/guide/current/partial-updates.html#_using_scripts_to_make_partial_updates
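To make the scripted append more concrete, here is a minimal sketch in Python that only builds the request body as a plain dict. The field name `swipedUserId` comes from the lookup query in this answer; the function name, the Painless script, the `inline` key (as used in the 5.x update API reference) and the endpoint in the comment are my assumptions, so check them against the version you run.

```python
import json


def build_append_swipe_update(swiped_user_id):
    """Build an update-API body that appends one id to the swipedUserId array.

    Hypothetical helper: the names mirror the lookup query in this answer.
    """
    return {
        "script": {
            # Assumption: "inline" is the script key in the 5.x reference;
            # later versions name this key "source".
            "inline": (
                "if (!ctx._source.swipedUserId.contains(params.id)) "
                "{ ctx._source.swipedUserId.add(params.id) }"
            ),
            "lang": "painless",
            "params": {"id": swiped_user_id},
        },
        # Used as the initial document when the user has no swipe record yet.
        "upsert": {"swipedUserId": [swiped_user_id]},
    }


if __name__ == "__main__":
    # Assumed endpoint: POST /swipes/users/current-user-id/_update
    print(json.dumps(build_append_swipe_update("userid2"), indent=4))
```

Passing the id through `params` rather than concatenating it into the script text keeps the script string identical for every swipe.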

For your case, the query will result in something like:

GET /possible_matches/_search
{
    "query" : {
        "bool" : {
            "must_not" : [
                {
                    "terms" : {
                        "user" : {
                            "index" : "swipes",
                            "type" : "users",
                            "id" : "current-user-id",
                            "path" : "swipedUserId"
                        }
                    }
                }
            ]
        }
    }
}
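As a rough illustration of why the lookup scales better on the request side, the sketch below builds both variants as Python dicts and compares their serialized size. The helper names and the default field and index names (`user`, `swipes`) are assumptions taken from the snippets in this answer, and the lookup is wrapped in `must_not` because the goal is to exclude already swiped profiles.

```python
import json


def inline_exclusion_query(swiped_ids, id_field="user"):
    """Approach from the question: ship every swiped id inside the request."""
    return {
        "query": {
            "bool": {"must_not": [{"terms": {id_field: list(swiped_ids)}}]}
        }
    }


def lookup_exclusion_query(current_user_id, id_field="user", swipes_index="swipes"):
    """Terms lookup: the id list is read from the user's swipe document."""
    lookup = {
        "index": swipes_index,
        "type": "users",
        "id": current_user_id,
        "path": "swipedUserId",
    }
    return {"query": {"bool": {"must_not": [{"terms": {id_field: lookup}}]}}}


if __name__ == "__main__":
    ids = ["userid%d" % i for i in range(2000)]
    inline_size = len(json.dumps(inline_exclusion_query(ids)))
    lookup_size = len(json.dumps(lookup_exclusion_query("current-user-id")))
    # The inline body grows with every swipe; the lookup body stays constant.
    print(inline_size, lookup_size)
```

This only compares what the client sends. The cluster still has to fetch the swipe document and match its ids, so it says nothing about server-side cost.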

Another thing to take into account is the replication configuration of the swipes index. Since each node will perform "joins" against that index, it is highly recommended to keep a full copy of it on every node. You can achieve this by creating the index with the "auto_expand_replicas" setting set to "0-all".

PUT /swipes
{
    "settings": {
        "auto_expand_replicas": "0-all"
    }
}
  • Wow, thanks a lot. This works nicely, although the approach introduces one problem… When user A queries for users (to swipe them), I want to score users that already liked user B higher, since the chance of a match is obviously higher. However, with terms lookup I need to specify a fixed id. Is there a way to make this id dynamic, so that it is always the id of the record currently being queried? For example, if the first hit is user C, the terms lookup would check user C's swipe record. – florian norbert bepunkt Oct 30 '16 at 09:15
  • Hi, thanks I found this useful too. Why do you say big problem using AWS when updating? – Tom Jul 21 '22 at 21:46