Elasticsearch scoring documents liked by similar users higher

Question

In Elasticsearch I have two indexes, places and users. This is the mapping for places:

mappings: {
  location: {
    type: "geo_point"
  }
}

And this is the mapping for users:

mappings: {
  likes: {
    type: "keyword"
  },
  seen: {
    type: "keyword"
  }
}

As you can see, a user can like and see different places. I want to query places that a user has not yet seen or liked, and rank first the places liked by users who like similar places to the querying user. This is the query I was able to come up with:

POST /places/_search
{
  "_source": [
    "id"
  ],
  "size": 1,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must_not": [
            {
              "terms": {
                "_id": {
                  "index": "users",
                  "id": "vu0E1rjJEqcgyfj29fwZ",
                  "path": "seen"
                }
              }
            },
            {
              "terms": {
                "_id": {
                  "index": "users",
                  "id": "vu0E1rjJEqcgyfj29fwZ",
                  "path": "likes"
                }
              }
            }
          ],
          "filter": {
            "geo_distance": {
              "distance": "200km",
              "location": {
                "lat": 52,
                "lon": 13
              }
            }
          }
        }
      },
      "random_score": {},
      "boost_mode": "replace"
    }
  }
}

However, at the moment this query just assigns a random score to all results. As I'm new to Elasticsearch, I'm struggling to come up with a scoring function that scores places liked by similar users higher, especially because the data about user likes is stored in a different index from the one I'm actually querying. What would be the best approach to this problem? Is something like this even possible with my current data model?

Ashraful Islam · Answer 1 · 2020-04-23T05:00:30.483

I think you have to perform two requests, as below:

1. Get the IDs of all places liked by similar users.
2. Use those place IDs to match places, excluding the ones the querying user has already liked or seen.

Step 1 query example:

GET users/_search
{
  "_source": [
    "likes"
  ],
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "likes": {
              "index": "users",
              "id": "vu0E1rjJEqcgyfj29fwZ",
              "path": "likes"
            }
          }
        }
      ],
      "must_not": [
        {
          "ids": {
            "values": [
              "vu0E1rjJEqcgyfj29fwZ"
            ]
          }
        }
      ]
    }
  }
}
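
The step 1 response contains one hit per similar user, each with a `likes` array in `_source`. The sketch below shows one way those hits could be reduced to a list of candidate place IDs. It assumes the standard `hits.hits[]._source` layout of a search response. The function name `collect_candidate_ids` and the sample response are hypothetical, and ordering candidates by how many similar users like them is an addition for illustration, not part of the answer above.

```python
from collections import Counter

def collect_candidate_ids(response, own_likes, own_seen):
    """Return place IDs liked by similar users, most frequently liked first."""
    excluded = set(own_likes) | set(own_seen)
    counts = Counter()
    for hit in response["hits"]["hits"]:
        likes = hit["_source"].get("likes", [])
        # A field holding a single value may come back as a bare string.
        if isinstance(likes, str):
            likes = [likes]
        # Count each place once per user and skip places the user already knows.
        counts.update(p for p in set(likes) if p not in excluded)
    return [place_id for place_id, _ in counts.most_common()]

# Hypothetical step 1 response, for illustration only.
sample_response = {
    "hits": {"hits": [
        {"_id": "u2", "_source": {"likes": ["p1", "p2", "p3"]}},
        {"_id": "u3", "_source": {"likes": ["p2", "p4"]}},
    ]}
}
candidates = collect_candidate_ids(sample_response, own_likes=["p1"], own_seen=["p4"])
print(candidates)
```

A search returns only 10 hits by default, so the step 1 request would need a larger `size` (or pagination) to cover more similar users.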

Step 2 query example:

GET places/_search
{
  "_source": [
    "id"
  ],
  "size": 1,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "ids": {
                "values": [] # Put the place IDs liked by similar users here
              }
            }
          ],
          "must_not": [
            {
              "terms": {
                "_id": {
                  "index": "users",
                  "id": "vu0E1rjJEqcgyfj29fwZ",
                  "path": "seen"
                }
              }
            },
            {
              "terms": {
                "_id": {
                  "index": "users",
                  "id": "vu0E1rjJEqcgyfj29fwZ",
                  "path": "likes"
                }
              }
            }
          ],
          "filter": {
            "geo_distance": {
              "distance": "200km",
              "location": {
                "lat": 52,
                "lon": 13
              }
            }
          }
        }
      },
      "random_score": {},
      "boost_mode": "sum"
    }
  }
}
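
As a rough sketch of how the step 2 request body could be assembled from the IDs gathered in step 1, the Python below builds it as a plain dict, so no client library is required. The function name `build_places_query` and its parameters are hypothetical. It sets `boost_mode` to `sum` on the assumption that the score contributed by the `should` clause should be kept and the random score should only shuffle places within each group.

```python
import json

def build_places_query(candidate_ids, user_id, lat, lon, distance="200km", size=1):
    """Build the step 2 request body as a plain dict."""
    def lookup(path):
        # Terms lookup: read the values of `path` from the user's document.
        return {"terms": {"_id": {"index": "users", "id": user_id, "path": path}}}

    return {
        "_source": ["id"],
        "size": size,
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        # Optional clause: it only affects the score of matching places.
                        "should": [{"ids": {"values": list(candidate_ids)}}],
                        "must_not": [lookup("seen"), lookup("likes")],
                        "filter": {
                            "geo_distance": {
                                "distance": distance,
                                "location": {"lat": lat, "lon": lon},
                            }
                        },
                    }
                },
                "random_score": {},
                # Assumption: "sum" adds the random score to the query score;
                # "replace" would discard the score of the should clause.
                "boost_mode": "sum",
            }
        },
    }

body = build_places_query(["p2", "p3"], "vu0E1rjJEqcgyfj29fwZ", lat=52, lon=13)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a `GET places/_search` request.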

score -1 · Answer 2 · answered Apr 21 '20 at 14:41

You could use a gauss decay function within your function_score query, for example:

GET /places/_search
{
  "size": 5,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must_not": [
            {
              "terms": {
                "_id": {
                  "index": "users",
                  "type": "_doc",
                  "id": "vu0E1rjJEqcgyfj29fwZ",
                  "path": "seen"
                }
              }
            },
            {
              "terms": {
                "_id": {
                  "index": "users",
                  "type": "_doc",
                  "id": "vu0E1rjJEqcgyfj29fwZ",
                  "path": "likes"
                }
              }
            }
          ]
        }
      },
      "functions": [
        {
          "gauss": {
            "location": {
              "origin": {
                "lat": 52,
                "lon": 13
              },
              "scale": "200km"
            }
          }
        }
      ],
      "boost_mode": "replace"
    }
  }
}

But I wonder what the current connection between the likes and places is in your data model.
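
To make the effect of the gauss function concrete, here is a small Python sketch of the Gaussian decay curve as Elasticsearch documents it, assuming the default `decay` of 0.5 and `offset` of 0. Distances are plain numbers in kilometres, and the function name `gauss_decay` is hypothetical. It only models distance-based scoring and says nothing about likes.

```python
import math

def gauss_decay(distance, scale, decay=0.5, offset=0.0):
    """Gaussian decay score for a document `distance` away from the origin."""
    # sigma^2 is chosen so that the score equals `decay` at distance offset + scale.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    # Documents within `offset` of the origin keep the full score of 1.0.
    effective = max(0.0, distance - offset)
    return math.exp(-(effective ** 2) / (2.0 * sigma_sq))

for km in (0, 100, 200, 400):
    print(km, round(gauss_decay(km, scale=200), 4))
```

With `scale` set to 200km, a place at the origin scores 1.0 and a place 200km away scores 0.5, so unlike the `geo_distance` filter in the question, distant places are ranked lower rather than excluded.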

I'm sorry but this answer misses the point of my question, as your answer is focused on distance-based scoring while I want to achieve scoring based on users interests/likes — user3517658, Apr 21 '20 at 15:47
