OK, continuing from the comments above as an answer, so that it is easier to read and has no character limit.
Comment
I don't think you can use a pipeline aggregation to achieve this.
It's not a lot to process on the client side, I guess: only 20 records (10 for authors and 10 for co-authors), and it would be a simple aggregation query.
Another option would be to just get the top 10 across both fields, which is also a simple aggregation query.
But if you really need the intersection of both top 10s on the ES side, go with a Scripted Metric Aggregation, where you can put your logic in the script.
First option is as simple as:
GET index_name/_search
{
  "size": 0,
  "aggs": {
    "firstname_dupes": {
      "terms": {
        "field": "authorFullName.keyword",
        "size": 10
      }
    },
    "lastname_dupes": {
      "terms": {
        "field": "coauthorFullName.keyword",
        "size": 10
      }
    }
  }
}
and then you compute the intersection of the two result sets on the client side.
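As a rough illustration of that client-side step, here is a minimal Python sketch. It assumes the search response has already been parsed into a dict and that the aggregation names match the query above; the helper name `intersect_top_terms` and the sample names and counts are made up for the example.

```python
def intersect_top_terms(response, first_agg="firstname_dupes", second_agg="lastname_dupes"):
    """Return (name, count_in_first, count_in_second) for names found in both aggregations."""
    aggs = response["aggregations"]
    first = {b["key"]: b["doc_count"] for b in aggs[first_agg]["buckets"]}
    second = {b["key"]: b["doc_count"] for b in aggs[second_agg]["buckets"]}
    # Iterating over `first` keeps the bucket order of the first aggregation.
    return [(name, first[name], second[name]) for name in first if name in second]


# Hand-written sample shaped like a terms aggregation response (not real data).
sample_response = {
    "aggregations": {
        "firstname_dupes": {"buckets": [
            {"key": "Alice Smith", "doc_count": 7},
            {"key": "Bob Jones", "doc_count": 5},
        ]},
        "lastname_dupes": {"buckets": [
            {"key": "Bob Jones", "doc_count": 4},
            {"key": "Carol White", "doc_count": 2},
        ]},
    }
}

common = intersect_top_terms(sample_response)
print(common)  # [('Bob Jones', 5, 4)]
```

Returning both counts keeps the door open for comparing counts instead of names later on.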
The second would look like:
GET index_name/_search
{
  "size": 0,
  "aggs": {
    "name_dupes": {
      "terms": {
        "script": {
          "source": "return [doc['authorFullName.keyword'].value,doc['coauthorFullName.keyword'].value]"
        },
        "size": 10
      }
    }
  }
}
But this is not really an intersection of the top 10 authors and the top 10 co-authors: it merges the values of both fields and then takes the top 10 of the combined counts.
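To make the difference concrete, here is a small Python sketch on made-up data (the names and the cut-off `N = 2` are purely illustrative). It compares the "top N of the combined counts" result with the "intersection of the two top N lists" result.

```python
from collections import Counter

# Hypothetical (author, coauthor) pairs, one per document.
docs = [("Ann", "Bob")] * 4 + [("Ann", "Dan"), ("Dan", "Bob"), ("Dan", "Eve"), ("Eve", "Dan")]
N = 2

author_counts = Counter(author for author, _ in docs)
coauthor_counts = Counter(coauthor for _, coauthor in docs)

# Second option: add up the counts from both fields, then take the top N.
combined_top = {name for name, _ in (author_counts + coauthor_counts).most_common(N)}

# What the question asks for: top N per field, then intersect.
top_authors = {name for name, _ in author_counts.most_common(N)}
top_coauthors = {name for name, _ in coauthor_counts.most_common(N)}
intersection = top_authors & top_coauthors

print(combined_top)   # the two names with the highest combined counts
print(intersection)   # names that are in both per-field top N lists
```

With this data the combined top 2 is Ann and Bob, while the only name in both per-field top 2 lists is Dan, so the two approaches can return entirely different names.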
The third option is to write a Scripted Metric Aggregation. I didn't have time to spend on the algorithmic side of things (it should be optimized), but it might look like the one below. Java skills will certainly help you. Also make sure you understand all the stages of Scripted Metric Aggregation execution and the performance issues you might run into when using it.
GET index_name/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "profit": {
      "scripted_metric": {
        "init_script": "state.fnames = [:]; state.lnames = [:];",
        "map_script": """
          // Count occurrences per name on this shard. size() guards documents
          // without a value; reading .value on an empty field throws.
          def key = doc['authorFullName.keyword'];
          def value = 0;
          if (key.size() != 0) {
            value = state.fnames[key.value];
            if (value == null) value = 0;
            state.fnames[key.value] = value + 1;
          }
          key = doc['coauthorFullName.keyword'];
          if (key.size() != 0) {
            value = state.lnames[key.value];
            if (value == null) value = 0;
            state.lnames[key.value] = value + 1;
          }
        """,
        "combine_script": "return state",
        "reduce_script": """
          // Merge the per-shard counts first, so that the ranking is global
          // rather than a union of each shard's local top 10.
          def fnames = [:];
          def lnames = [:];
          for (state in states) {
            for (e in state.fnames.entrySet()) {
              def c = fnames[e.getKey()];
              fnames[e.getKey()] = (c == null ? 0 : c) + e.getValue();
            }
            for (e in state.lnames.entrySet()) {
              def c = lnames[e.getKey()];
              lnames[e.getKey()] = (c == null ? 0 : c) + e.getValue();
            }
          }
          def f10 = fnames.entrySet().stream().sorted(Collections.reverseOrder(Map.Entry.comparingByValue())).limit(10).map(e->e.getKey()).collect(Collectors.toList());
          def l10 = lnames.entrySet().stream().sorted(Collections.reverseOrder(Map.Entry.comparingByValue())).limit(10).map(e->e.getKey()).collect(Collectors.toList());
          // Keep only the names present in both global top 10 lists.
          def intersection = [];
          for (name in f10) {
            if (l10.contains(name)) intersection.add(name);
          }
          return intersection;
        """
      }
    }
  }
}
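If the stages are unfamiliar, the following Python sketch models them outside Elasticsearch. It is only an analogy under simplifying assumptions: each "shard" is a plain list of dicts, the field names `author` and `coauthor` stand in for the keyword fields, and the function names are invented. The map and combine steps run once per shard, and the reduce step runs once over all shard states; this sketch merges the counts before ranking so that the top N is global.

```python
from collections import Counter


def map_shard(docs):
    """Model of init_script + map_script + combine_script for a single shard."""
    state = {"fnames": Counter(), "lnames": Counter()}  # init_script
    for doc in docs:                                      # map_script, once per document
        if doc.get("author"):
            state["fnames"][doc["author"]] += 1
        if doc.get("coauthor"):
            state["lnames"][doc["coauthor"]] += 1
    return state                                          # combine_script


def reduce_states(states, n=10):
    """Model of reduce_script: merge shard states, rank globally, intersect."""
    fnames, lnames = Counter(), Counter()
    for state in states:
        fnames.update(state["fnames"])  # Counter.update adds counts
        lnames.update(state["lnames"])
    top_f = {name for name, _ in fnames.most_common(n)}
    top_l = {name for name, _ in lnames.most_common(n)}
    return sorted(top_f & top_l)


# Two hypothetical shards with made-up documents.
shards = [
    [{"author": "Ann", "coauthor": "Bob"}, {"author": "Ann", "coauthor": "Dan"}],
    [{"author": "Dan", "coauthor": "Bob"}, {"author": "Dan"}, {"author": "Dan", "coauthor": "Bob"}],
]
result = reduce_states([map_shard(docs) for docs in shards], n=2)
print(result)  # ['Dan']
```

Note that every shard ships its full count map to the reduce step, which is where the memory and performance cost of this approach comes from on high-cardinality fields.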
Just a note: the queries here assume you have a keyword sub-field on those properties. If not, just adjust them to your case.
UPDATE
PS: I just noticed you mentioned that you need common counts, not common names. I'm not sure which is the case, but if so, instead of map(e->e.getKey()) use map(e->e.getValue().toString()). See the other answer on a similar problem.