Elasticsearch sort based on the number of occurrences a string appears in an array

Question

I have an array field containing a list of strings, e.g. ["NY", "CA"].

At search time I have a filter which matches any of the strings in the array.

I would like to sort the results so that the documents with the most occurrences of the searched string ("NY") come first.

Results should include:

document 1: ["CA", "NY", "NY"]
document 2: ["NY", "FL"]
document 3: ["NY", "CA", "NY", "NY"]

Results should be ordered as follows:

Document 3, Document 1, Document 2
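To make the desired ordering concrete, here is a minimal client-side sketch in Python. It is not an Elasticsearch query; the `documents` dictionary and the `rank_by_occurrences` helper are hypothetical names used only to show the count-based ranking being asked for.

```python
# Hypothetical in-memory copies of the three example documents.
documents = {
    1: ["CA", "NY", "NY"],
    2: ["NY", "FL"],
    3: ["NY", "CA", "NY", "NY"],
}


def rank_by_occurrences(docs, term):
    """Return the IDs of documents containing `term`, most occurrences first."""
    # Count how many times the term appears in each document's array.
    counts = {doc_id: values.count(term) for doc_id, values in docs.items()}
    # Keep only documents that contain the term at least once (the "filter").
    matching = [doc_id for doc_id, n in counts.items() if n > 0]
    # Sort by occurrence count, highest first.
    return sorted(matching, key=lambda doc_id: counts[doc_id], reverse=True)


print(rank_by_occurrences(documents, "NY"))  # [3, 1, 2]
```

The question is whether Elasticsearch can produce this ordering itself at search time.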

Is this possible? If so, how?

I have this problem right now, and I think in practice, it will sort based on term frequencies IF other documents have "CA" but not NY. — Henley, Sep 14 '13 at 15:26

score 1 · Accepted Answer · answered Mar 12 '13 at 22:04

For those curious, I was not able to boost based on how many times the term occurs in the array. I did, however, accomplish what I needed with the following:

curl -X POST "http://localhost:9200/index/document/1" -d '{"id":1,"states_ties":["CA"],"state_abbreviation":"CA","worked_in_states":["CA"],"training_in_states":["CA"]}'
curl -X POST "http://localhost:9200/index/document/2" -d '{"id":2,"states_ties":["CA","NY"],"state_abbreviation":"FL","worked_in_states":["NY","CA"],"training_in_states":["NY","CA"]}'
curl -X POST "http://localhost:9200/index/document/3" -d '{"id":3,"states_ties":["CA","NY","FL"],"state_abbreviation":"NY","worked_in_states":["NY","CA"],"training_in_states":["NY","FL"]}'

curl -X GET 'http://localhost:9200/index/_search?per_page=10&pretty' -d '{
  "query": {
    "custom_filters_score": {
      "query": {
        "terms": {
          "states_ties": [
            "CA"
          ]
        }
      },
      "filters": [
        {
          "filter": {
            "term": {
              "state_abbreviation": "CA"
            }
          },
          "boost": 1.03
        },
        {
          "filter": {
            "terms": {
              "worked_in_states": [
                "CA"
              ]
            }
          },
          "boost": 1.02
        },
        {
          "filter": {
            "terms": {
              "training_in_states": [
                "CA"
              ]
            }
          },
          "boost": 1.01
        }
      ],
      "score_mode": "multiply"
    }
  },
  "sort": [
    {
      "_score": "desc"
    }
  ]
}'

Results (id: score):

1: 0.75584483
2: 0.73383
3: 0.7265643
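The ordering above can be reasoned about with a rough sketch of the "multiply" score mode: each filter that matches contributes its boost as a factor. The Python below only models that multiplication, assuming the base query score is the same for all three documents; it does not reproduce Elasticsearch's actual scoring, and `BOOSTS` and `combined_boost` are hypothetical names for illustration.

```python
import math

# Boosts copied from the three filters in the query above.
BOOSTS = [
    ("state_abbreviation", 1.03),
    ("worked_in_states", 1.02),
    ("training_in_states", 1.01),
]

# The boosted fields of the three indexed documents.
documents = {
    1: {"state_abbreviation": "CA", "worked_in_states": ["CA"],
        "training_in_states": ["CA"]},
    2: {"state_abbreviation": "FL", "worked_in_states": ["NY", "CA"],
        "training_in_states": ["NY", "CA"]},
    3: {"state_abbreviation": "NY", "worked_in_states": ["NY", "CA"],
        "training_in_states": ["NY", "FL"]},
}


def combined_boost(doc, term):
    """Multiply together the boosts of every filter whose field contains `term`."""
    factor = 1.0
    for field, boost in BOOSTS:
        value = doc[field]
        # Treat a single string the same way as a one-element array.
        values = value if isinstance(value, list) else [value]
        if term in values:
            factor *= boost
    return factor


for doc_id, doc in documents.items():
    print(doc_id, round(combined_boost(doc, "CA"), 6))
```

Under these assumptions, document 1 matches all three filters, document 2 matches two, and document 3 matches one, which agrees with the order of the reported scores. Note that this ranks by how many fields match, not by how many times the term occurs within a single array.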

score 0 · Answer 2 · answered Mar 11 '13 at 16:19

This would be accomplished by the standard Lucene scoring implementation. If you simply search for "NY" without specifying an order, the results are sorted by relevance, and a document with more occurrences of the term is assigned higher relevance, all else being equal.

femtoRgon


Not for a filter query, I have added supporting code to the question. – brupm Mar 11 '13 at 21:13

Ah, I see. I don't believe you can do that though. Filtering does what it says, it filters. Either a doc gets through the filter or it doesn't. It simply restricts the result set. I don't believe there is any concept allowing you to determine that doc1 passes a filter better than doc2. I would suggest that using a filter is the wrong way to approach your problem. – femtoRgon Mar 11 '13 at 21:34
https://gist.github.com/brupm/5138787 here's the supporting code. But I believe femtoRgon is correct. – brupm Mar 11 '13 at 23:09
Also, even when using query_string searches, the score only seems to calculate properly if I search across the entire doc: https://gist.github.com/brupm/5138842 – brupm Mar 11 '13 at 23:18
