I need to calculate the weighted average value using the elastic search, I can't change the structure of the documents. If we assume that there are 2 indexed documents.
The first document
const doc1 = {
"id": "1",
"userId: "2",
"scores" : [
{
"name": "score1",
"value": 93.0
},
{
"name": "score2",
"value": 90.0
},
{
"name": "score3",
"value": 76.0
}
],
"metadata": {
"weight": 130
}
}
Second document
const doc2 = {
"id": "2",
"userId: "2",
"scores" : [
{
"name": "score1",
"value": 80.0
},
{
"name": "score2",
"value": 70.0
},
{
"name": "score3",
"value": 88.0
}
],
"metadata": {
"weight": 50
}
}
Calculations should be done by the following formula:
score1Avg = (doc1.scores['score1'].value * doc1.metadata.weight +
doc2.scores['score1'].value * doc2.metadata.weight)/(doc1.weight+doc2.weight)
score2Avg = (doc1.scores['score2'].value * doc1.metadata.weight +
doc2.scores['score2'].value * doc2.metadata.weight)/(doc1.weight+doc2.weight)
score3Avg = (doc1.scores['score3'].value * doc1.metadata.weight +
doc2.scores['score3'].value * doc2.metadata.weight)/(doc1.weight+doc2.weight)
I tried something with nested type for mapping scores
, but I can't access the parent document field metadata.weight
. How this should be approached, should I use nested type mapping or this can be done in some other way without that?
Edit: I ended up storing scores element as separated documents. Instead of doc1, now I have the following documents.
{
"id": "1",
"userId: "2",
"score": {
"name": "score1",
"value": 93.0
},
"metadata": {
"weight": 130
}
}
{
"id": "1",
"userId: "2",
"score": {
"name": "score2",
"value": 90.0
},
"metadata": {
"weight": 130
}
}
{
"id": "1",
"userId: "2",
"score": {
"name": "score3",
"value": 76.0
},
"metadata": {
"weight": 130
}
}
And the query is:
GET /scores/_search
{
"size": 0,
"aggs": {
"group_by_score_and_user": {
"composite": {
"sources": [
{
"scoreName": {
"terms": {
"field": "score.name.keyword"
}
}
},{
"userId": {
"terms": {
"field": "userId.keyword"
}
}
}
]
},
"aggs": {
"avg": {
"weighted_avg": {
"value":{ "field": "score.value" },
"weight":{ "field": "metadata.weight" }
}
}
}
}
}
}
Btw, the query with the script approach against 5k documents takes 120 ms on average compared to this which takes about 35-40 ms over 100k documents.