I wonder if someone could help.
I have an Elasticsearch index whose mapping looks broadly like this:
{
  "properties": {
    "content": {
      "type": "string"
    },
    "topics": {
      "properties": {
        "topic_type": {
          "type": "string"
        },
        "topic": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
So you end up with an entry in the index broadly along the lines of:
{
  "content": "some load of content",
  "timestamp": "some time stamp",
  "id": "some id",
  "topics": [
    {
      "topic": "safety",
      "topic_type": "Flight"
    },
    {
      "topic": "rockets",
      "topic_type": "Space"
    }
  ]
}
where each blob of content can have more than one topic associated with it.
What I'd like to do is aggregate, by day, a count of each distinct topic whose topic_type is "Space". For example:
April 1st:
- "rockets": 20
- "astronauts": 2
- "aliens": 5
April 2nd:
- "rockets": 10
- "astronauts": 12
- "aliens": 51
and so on.
What I've tried to do is something like:
curl -X POST 'http://localhost:9200/myindex/_search?search_type=count&pretty=true' -d '{
  "size": "100000",
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "myindex.topics.topic_type": "space"
          }
        }
      ]
    }
  },
  "aggs": {
    "articles_over_time": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "day"
      },
      "aggs": {
        "topics_over_time": {
          "terms": {
            "field": "topics.topic"
          }
        }
      }
    }
  }
}'
The problem is that although the query correctly selects only those articles that have a topic with a topic_type of "space", the "aggs" part then counts every "topics.topic" value on those articles, including topics whose topic_type is not "space".
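To show what I mean, here is a rough sketch in plain Python (not Elasticsearch) of how the query above appears to behave. The sample documents mirror the entries in this post, the function name count_topics_like_query is made up for illustration, and the per-day bucketing is left out to keep it short.

```python
from collections import Counter

# Hypothetical sample documents shaped like the index entries in this post.
docs = [
    {
        "topics": [
            {"topic": "safety", "topic_type": "Flight"},
            {"topic": "rockets", "topic_type": "Space"},
        ],
    },
    {
        "topics": [
            {"topic": "safety", "topic_type": "Flight"},
            {"topic": "rockets", "topic_type": "Space"},
            {"topic": "aliens", "topic_type": "Space"},
        ],
    },
]


def count_topics_like_query(documents, topic_type):
    """Filter whole documents by topic_type, then count ALL of their topics."""
    counts = Counter()
    for doc in documents:
        types = {t["topic_type"].lower() for t in doc["topics"]}
        if topic_type.lower() in types:
            # Each matching document contributes once per distinct topic,
            # regardless of which topic_type that topic belongs to.
            counts.update({t["topic"] for t in doc["topics"]})
    return dict(counts)


print(count_topics_like_query(docs, "space"))
```

Here "safety" ends up in the counts even though it is a "Flight" topic, which is the behaviour I am trying to avoid.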
What I want to be able to say is: "count and aggregate (essentially group by) only those topics whose topic type is 'space'".
So with just this in the index:
{
  "content": "some load of content",
  "timestamp": "some time stamp",
  "id": "some id",
  "topics": [
    {
      "topic": "safety",
      "topic_type": "Flight"
    },
    {
      "topic": "rockets",
      "topic_type": "Space"
    }
  ]
}
It would be: rockets: 1
With these two in the index:
{
  "content": "some load of content",
  "timestamp": "some time stamp",
  "id": "some id",
  "topics": [
    {
      "topic": "safety",
      "topic_type": "Flight"
    },
    {
      "topic": "rockets",
      "topic_type": "Space"
    }
  ]
}
{
  "content": "some load of content2",
  "timestamp": "some time stamp",
  "id": "some id",
  "topics": [
    {
      "topic": "safety",
      "topic_type": "Flight"
    },
    {
      "topic": "rockets",
      "topic_type": "Space"
    },
    {
      "topic": "aliens",
      "topic_type": "Space"
    }
  ]
}
It would be: rockets: 2, aliens: 1
(but all grouped by day).
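To pin down the expected result, here is a rough sketch in plain Python (not Elasticsearch) of the grouping I'm after. The timestamps and the function name topics_per_day are made up for illustration, and I'm assuming ISO 8601 timestamps; the key point is that the topic_type filter is applied to each entry in "topics" rather than to the article as a whole.

```python
from collections import Counter, defaultdict

# Hypothetical sample documents; the timestamps are invented for illustration.
docs = [
    {
        "timestamp": "2014-04-01T09:00:00",
        "topics": [
            {"topic": "safety", "topic_type": "Flight"},
            {"topic": "rockets", "topic_type": "Space"},
        ],
    },
    {
        "timestamp": "2014-04-01T15:30:00",
        "topics": [
            {"topic": "safety", "topic_type": "Flight"},
            {"topic": "rockets", "topic_type": "Space"},
            {"topic": "aliens", "topic_type": "Space"},
        ],
    },
]


def topics_per_day(documents, topic_type):
    """Count topics of a single topic_type, bucketed by calendar day."""
    per_day = defaultdict(Counter)
    for doc in documents:
        day = doc["timestamp"][:10]  # assumes ISO 8601, e.g. "2014-04-01T..."
        for entry in doc["topics"]:
            # Filter on each topic entry, not on the document as a whole.
            if entry["topic_type"].lower() == topic_type.lower():
                per_day[day][entry["topic"]] += 1
    return {day: dict(counts) for day, counts in per_day.items()}


print(topics_per_day(docs, "Space"))
```

With the two sample documents this gives rockets: 2 and aliens: 1 for the single day, and "safety" is not counted.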
I'm not sure how to do this with Elasticsearch.
If the index mapping is not fit for purpose here, please do let me know what would work better, in your opinion.