
So let's say, for example, I have a 'books' index and each book has an author_id. Because there are only a few authors, author IDs repeat frequently across the books. Books in my index look something like this:

{
    "title": "Elasticsearch for dummies",
    "author_id": 1,
    "purchases": 10
},
{
    "title": "Great book",
    "author_id": 1,
    "purchases": 5
},
{
    "title": "Great book 2",
    "author_id": 1,
    "purchases": 8
},
{
    "title": "My cool book",
    "author_id": 2,
    "purchases": 14
},
{
    "title": "Interesting book title",
    "author_id": 2,
    "purchases": 20
},
{
    "title": "amazing book",
    "author_id": 2,
    "purchases": 16
},
{
    "title": "Silly Walks vol II",
    "author_id": 3,
    "purchases": 13
},
{
    "title": "Wild animals you can pet",
    "author_id": 3,
    "purchases": 5
},
{
    "title": "GoT Spoilers",
    "author_id": 3,
    "purchases": 4
}

Imagine there are thousands of books and only 50 authors. If I sort only by purchases, I get a results page that shows books from only one or two authors. What I need is to have as many authors as possible represented in the results. Is there some combination of function_score + script_score I can use to achieve this? I tried experimenting with Math.exp in a Painless script, but to no avail.

Dioralop

3 Answers


So I ended up using Field Collapsing, which basically allows you to make a regular query and 'collapse' the results on a particular field. Instead of getting every matching document one after the other, you get the top result for each distinct value of that field. You can then use inner_hits to get a list of n hits per distinct value, and from/size to paginate within each group.
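A sketch of such a collapse query against the 'books' index from the question (the inner_hits name and sizes here are illustrative; note that the collapse field must be a keyword or numeric field with doc_values enabled):

GET /books/_search
{
  "query": { "match_all": {} },
  "collapse": {
    "field": "author_id",
    "inner_hits": {
      "name": "books_per_author",
      "size": 3,
      "sort": [{ "purchases": "desc" }]
    }
  },
  "sort": [{ "purchases": "desc" }]
}

Each hit in the response is the top book for one distinct author_id, with up to 3 of that author's books listed under inner_hits.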

Dioralop

You can use the cardinality aggregation in order to fetch unique counts from your Elasticsearch data.

This link may help: https://www.elastic.co/guide/en/elasticsearch/guide/master/cardinality.html
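For reference, a minimal cardinality aggregation over the index above (the aggregation name "distinct_authors" is just illustrative) would look like:

GET /books/_search
{
  "size": 0,
  "aggs": {
    "distinct_authors": {
      "cardinality": { "field": "author_id" }
    }
  }
}

This returns an approximate count of distinct author_id values, not the documents themselves.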

  • But that's for use within an aggregation, right? I don't want to just grab unique counts; I want to increase the document score based on the author_id, ideally with some kind of exponential decay so only the first few unique authors have the score increase applied. – Dioralop Jun 18 '19 at 08:26

You can use a terms aggregation to "group" results by author_id, combined with a top_hits sub-aggregation to fetch only a few results for each author. Something like the query below gives a list of authors ordered by their best-selling book, where each author's bucket holds at most 3 of the books they wrote, ordered by purchase count.

"aggs": {
  "authors": {
    "terms": {
      "field": "author_id",
      "order": { "max_purchase": "desc" }
    },
    "aggs": {
      "books": {
        "top_hits": {
          "size": 3,
          "_source": { "includes": ["title", "purchases"] },
          "sort": [{ "purchases": { "order": "desc" } }]
        }
      },
      "max_purchase": { "max": { "field": "purchases" } }
    }
  }
}

Bernard