I need to count how many times word X occurs in all texts in index Y, which has a single field, "content". To be clear, I need the total number of occurrences of a specific word across all documents, not the number of documents that contain it. From what I've found, ES is not well optimized for this (since the field is of type text), but this is for university homework, so I have little choice.
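To make the distinction concrete, here is a small plain-Python sketch of the number I'm after versus the document count. It does not touch Elasticsearch; the sample texts and the `count_word` helper are made up for illustration, and it matches whole words case-insensitively rather than going through my analyzer.

```python
import re


def count_word(docs, word):
    """Return (total_occurrences, matching_documents) for `word` in `docs`."""
    # Whole-word, case-insensitive match; a real analyzer may tokenize differently.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    per_doc = [len(pattern.findall(text)) for text in docs]
    total = sum(per_doc)                          # what I want
    matching = sum(1 for n in per_doc if n > 0)   # what I keep getting
    return total, matching


# Hypothetical sample documents, not taken from my index.
docs = [
    "Ustawa o podatku. Ta ustawa wchodzi w życie.",
    "Inna ustawa.",
    "Brak.",
]

print(count_word(docs, "ustawa"))
```

The first value is the total across all documents; the second is the per-document count that my attempts so far return.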
So far I've tried (taken from here):
{
  "script_fields": {
    "phrase_Count": {
      "script": {
        "lang": "painless",
        "source": "int count = 0; if(doc['content.keyword'].size() > 0 && doc['content'].value.indexOf(params.phrase)!=-1) count++; return count;",
        "params": {
          "phrase": "ustawa"
        }
      }
    }
  }
}
The scripting approach returns:
{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:88)",
          "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:40)",
          "if(doc['content.keyword'].size() > 0 && doc['content'].value.indexOf(params.phrase)!=-1) ",
          " ^---- HERE"
        ],
        "script": "int count = 0; if(doc['content.keyword'].size() > 0 && doc['content'].value.indexOf(params.phrase)!=-1) count++; return count;",
        "lang": "painless",
        "position": {
          "offset": 22,
          "start": 15,
          "end": 104
        }
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "bills",
        "node": "MXtcD7-zT-mhDyxMeRTMLw",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "script_stack": [
            "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:88)",
            "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:40)",
            "if(doc['content.keyword'].size() > 0 && doc['content'].value.indexOf(params.phrase)!=-1) ",
            " ^---- HERE"
          ],
          "script": "int count = 0; if(doc['content.keyword'].size() > 0 && doc['content'].value.indexOf(params.phrase)!=-1) count++; return count;",
          "lang": "painless",
          "position": {
            "offset": 22,
            "start": 15,
            "end": 104
          },
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "No field found for [content.keyword] in mapping with types []"
          }
        }
      }
    ]
  },
  "status": 400
}
I used content.keyword above because with plain content ES complained that the text type is not optimized for such searches. Judging by the error, though, my index has no content.keyword subfield.
I also tried using text statistics (from here), but I couldn't get it to work: it only counted the documents containing the word, which is not what I'm looking for.
As a last approach, I tried a search with an aggregation (from here), but it also returned just the count of documents, not of words:
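For reference, this is a minimal sketch of what I understand that approach to be. It assumes "text statistics" means the term vectors API (`POST /bills/_termvectors`) with `term_statistics` enabled, where each term in the response carries `doc_freq` (number of documents) and `ttf` (total term frequency). The helper names and the sample response are made up, with invented numbers. As far as I know these statistics are computed per shard and ignore deletions, so they may only be approximate.

```python
def build_termvectors_body(word, field="content"):
    """Request body for POST /<index>/_termvectors using an artificial document."""
    return {
        "doc": {field: word},        # artificial document holding only the word
        "fields": [field],
        "term_statistics": True,     # ask for doc_freq and ttf per term
        "field_statistics": False,
        "positions": False,
        "offsets": False,
    }


def term_counts(response, field="content"):
    """Map each analyzed term in the response to (ttf, doc_freq)."""
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return {t: (s.get("ttf", 0), s.get("doc_freq", 0)) for t, s in terms.items()}


# Hand-written sample shaped like a term vectors response; the numbers are invented.
sample_response = {
    "term_vectors": {
        "content": {
            "terms": {"ustawa": {"doc_freq": 120, "ttf": 950, "term_freq": 1}}
        }
    }
}

print(term_counts(sample_response))
```

The keys in the result are the tokens produced by the field's analyzer, so with a stemming analyzer they may differ from the word that was sent.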
{
  "query": {
    "query_string": {
      "fields": ["content"],
      "query": "ustawa"
    }
  },
  "aggs": {
    "my-terms": {
      "terms": {
        "field": "content.keyword"
      }
    }
  }
}
How can I achieve this? I'm using Python, if it matters.
EDIT: Mapping for the index I'm using:
"mappings": {
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}