
I use a terms aggregation to find out how many times a word is repeated in Elasticsearch. This works properly for short string fields.

My simple terms aggregation:

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "analyze_wildcard": true,
            "query": "*"
          }
        }
      ],
      "must_not": []
    }
  },
  "size": 0,
  "_source": {
    "excludes": []
  },
  "aggs": {
    "2": {
      "terms": {
        "field": "msgtxt.keyword"
      }
    }
  }
}
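Even on short fields, it helps to know what the numbers mean. Each bucket of a terms aggregation carries a `doc_count`, which is the number of matching documents whose field contains that key, not the total number of times the key occurs. The sketch below reads the buckets from a response to the query above, assuming the aggregation keeps the Kibana-style name `"2"`. `bucket_counts` is a hypothetical helper, not part of any Elasticsearch client.

```python
from typing import Any


def bucket_counts(response: dict[str, Any], agg_name: str = "2") -> dict[str, int]:
    """Map each terms-aggregation bucket key to its doc_count.

    doc_count counts documents containing the key; a word repeated
    five times in one document still contributes only 1.
    """
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

By default a terms aggregation returns only the top 10 buckets; set `"size"` inside the `"terms"` object to get more.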

But on a long string field containing long text, such as articles, it returns whole long sentences as buckets instead of individual words.
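This happens because `msgtxt.keyword` is a keyword sub-field: the entire value is indexed as a single term, so every bucket is a whole string. An analyzed `text` field is instead split into individual words. The sketch below imitates both behaviours in plain Python. The regex split is only a rough stand-in for an Elasticsearch analyzer (an assumption, not the real tokenizer), and the helper names are hypothetical.

```python
import re
from collections import Counter


def keyword_terms(value: str) -> list[str]:
    # A keyword field indexes the entire value as one term, so a terms
    # aggregation on it buckets whole strings (sentences, articles).
    return [value]


def rough_text_terms(value: str) -> list[str]:
    # Rough stand-in for an analyzed text field: lowercase, then keep runs
    # of word characters. In Python 3, \w matches Unicode letters, so
    # Arabic/Persian words are kept. This is NOT Elasticsearch's analyzer.
    return re.findall(r"\w+", value.lower())


article = "Elasticsearch counts terms. Terms are counted per field."
print(Counter(keyword_terms(article)))               # one bucket: the whole text
print(Counter(rough_text_terms(article))["terms"])   # 2
```

So the aggregation has to run against word-level terms (an analyzed field), not against the `.keyword` sub-field, before it can say anything about individual words.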

Is it possible to find the number of repetitions using a terms aggregation or some other method? (The article text is in Arabic/Persian.)

MOB

1 Answer


I think you need term vectors rather than an aggregation for this case. Here is the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-multi-termvectors.html
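As a minimal sketch, assuming the analyzed text field is named `msgtxt` and the request is sent as `POST /<index>/_mtermvectors`, a request body that asks for several documents at once could be built like this. `mtermvectors_body` is a hypothetical helper. With `term_statistics` enabled, each term should also include `doc_freq` and `ttf` (total term frequency); the docs describe these as shard-level statistics, so treat them as approximate on a multi-shard index.

```python
import json
from typing import Any


def mtermvectors_body(doc_ids: list[str], field: str = "msgtxt") -> dict[str, Any]:
    """Build a body for POST /<index>/_mtermvectors (hypothetical helper).

    Only counts are needed, so positions and offsets are switched off.
    """
    return {
        "ids": doc_ids,
        "parameters": {
            "fields": [field],          # the analyzed text field, not .keyword
            "term_statistics": True,    # adds doc_freq and ttf per term
            "field_statistics": False,
            "positions": False,
            "offsets": False,
        },
    }


print(json.dumps(mtermvectors_body(["1", "2"]), indent=2))
```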

  • Term vectors return results for a single document. Can multi term vectors do this for about 1000 or more documents at once? – MOB Sep 29 '17 at 15:26
  • I haven't tested it with more than 100 documents, but I think you can ;) – Mathieu Giboulet Sep 29 '17 at 15:33
  • Multi term vectors return terms and frequencies for each document separately, not for all documents combined. – MOB Sep 30 '17 at 06:03
  • OK, my bad, wrong solution. I haven't tried it, but the significant terms aggregation might help: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html – Mathieu Giboulet Sep 30 '17 at 09:38
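Since multi term vectors report frequencies per document, one workaround (not proposed in the thread) is to add them up on the client. The sketch below assumes a response shaped like the documented `_mtermvectors` output (`docs[].term_vectors.<field>.terms.<term>.term_freq`) and a field named `msgtxt`; `total_term_freqs` is a hypothetical helper. For thousands of documents, the ids would be split into several requests and the resulting counters added together.

```python
from collections import Counter
from typing import Any


def total_term_freqs(mtv_response: dict[str, Any], field: str = "msgtxt") -> Counter:
    """Sum per-document term_freq values from an _mtermvectors response.

    The API reports frequencies per document; summing them gives the
    total number of occurrences of each term across the requested docs.
    """
    totals: Counter = Counter()
    for doc in mtv_response.get("docs", []):
        # Missing documents, or documents without the field, have no entry.
        terms = doc.get("term_vectors", {}).get(field, {}).get("terms", {})
        for term, stats in terms.items():
            totals[term] += stats["term_freq"]
    return totals
```

`totals.most_common(20)` then lists the most repeated words. By contrast, the significant terms aggregation ranks terms by how unusual they are in a subset compared with a background set, so it does not return raw repetition counts.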