Here's an example of a document in my ES index:
{
"concepts": [
{
"type": "location",
"entities": [
{ "text": "Raleigh" },
{ "text": "Damascus" },
{ "text": "Brussels" }
]
},
{
"type": "person",
"entities": [
{ "text": "Johnny Cash" },
{ "text": "Barack Obama" },
{ "text": "Vladimir Putin" },
{ "text": "John Hancock" }
]
},
{
"type": "organization",
"entities": [
{ "text": "WTO" },
{ "text": "IMF" },
{ "text": "United States of America" }
]
}
]
}
I'm trying to aggregate and count the frequency of each concept entity in my set of documents for a specific concept type. Let's say I'm only interested in aggregating concept entities of type "location". My aggregation buckets are then going to be "concepts.entities.text", but I only want to aggregate them if "concepts.type" is equal to "location". Here's my attempt:
{
"query": {
// Whatever query
},
"aggs": {
"location_concept_type": {
"filter": {
"term": { "concepts.type": "location" }
},
"aggs": {
"entities": {
"terms": { "field": "concepts.entities.text" }
}
}
}
}
}
The problem is that the filter works at the document level: it removes from the aggregation only the documents that have no concept entities of type "location". For documents that have concept entities of type "location" and of other types, it buckets all the concept entities, regardless of concept type.
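To make the behavior concrete, here is a small Python sketch (plain Python, not Elasticsearch code) that mimics what I believe is happening versus what I want. The two sample documents and the function names `doc_level_counts` and `entry_level_counts` are made up for illustration only.

```python
from collections import Counter

# Hypothetical sample documents, shaped like the example document above.
docs = [
    {"concepts": [
        {"type": "location", "entities": [{"text": "Raleigh"}]},
        {"type": "person", "entities": [{"text": "Johnny Cash"}]},
    ]},
    {"concepts": [
        {"type": "organization", "entities": [{"text": "WTO"}]},
    ]},
]


def doc_level_counts(docs, wanted_type):
    """Mimic my attempt: the filter keeps or drops whole documents,
    then every entity of every kept document is bucketed."""
    counts = Counter()
    for doc in docs:
        if any(c["type"] == wanted_type for c in doc["concepts"]):
            for concept in doc["concepts"]:
                counts.update(e["text"] for e in concept["entities"])
    return counts


def entry_level_counts(docs, wanted_type):
    """What I actually want: only entities whose own concept
    has the requested type are bucketed."""
    counts = Counter()
    for doc in docs:
        for concept in doc["concepts"]:
            if concept["type"] == wanted_type:
                counts.update(e["text"] for e in concept["entities"])
    return counts


observed = doc_level_counts(docs, "location")
wanted = entry_level_counts(docs, "location")
```

In this sketch, `observed` contains both "Raleigh" and "Johnny Cash" because the first document passes the filter as a whole, while `wanted` contains only "Raleigh". The second document is dropped in both cases.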
I have also tried restructuring my document in the following way:
{
"concepts": [
{
"type": "location",
"text": "Raleigh"
},
{
"type": "location",
"text": "Damascus"
},
{
"type": "location",
"text": "Brussels"
},
{
"type": "person",
"text": "Johnny Cash"
},
{
"type": "person",
"text": "Barack Obama"
},
{
"type": "person",
"text": "Vladimir Putin"
},
{
"type": "person",
"text": "John Hancock"
},
{
"type": "organization",
"text": "WTO"
},
{
"type": "organization",
"text": "IMF"
},
{
"type": "organization",
"text": "United States of America"
}
]
}
But that doesn't work either. Finally, I cannot use the concept type as the key (which I believe would solve my problem), because I also need to be able to aggregate across all concept types, and the set of concept types is potentially unbounded and changes over time.
Any idea of how to proceed? Thanks in advance for your help.