
I am using the path hierarchy tokenizer for a field in Logstash/Elasticsearch. If the path field is /a/b/c, the tokenizer converts it to

    /a
    /a/b
    /a/b/c
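
To make that behaviour concrete, here is a rough Python sketch of the prefix generation. It is only an illustration: `path_hierarchy_tokens` is a hypothetical helper, not part of Elasticsearch, and it assumes a path with a leading `/` and ignores the tokenizer's other options.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Return every ancestor prefix of `path`, shortest first.

    Hypothetical helper that imitates the example above; it assumes the
    path starts with the delimiter and has no trailing delimiter.
    """
    tokens = []
    current = ""
    for segment in path.split(delimiter):
        if not segment:
            continue  # skip the empty piece produced by the leading delimiter
        current += delimiter + segment
        tokens.append(current)
    return tokens


print(path_hierarchy_tokens("/a/b/c"))  # ['/a', '/a/b', '/a/b/c']
```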

I want to generate stats like

    a - 3 hits
    b - 2 hits
    c - 1 hit

What is the best way to do that? Also, is there a way to store the folder depth in a separate field?

Ravi Sidhu

1 Answer


For your purpose, I think you can specify a custom pattern analyzer on the field and run a terms aggregation on that field. An example follows.

Define the custom analyzer:

PUT /test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "nonword": {
          "type": "pattern",
          "pattern": "/"
        }
      }
    }
  }
}

Create the mapping:

POST /test_index/_mapping/test_1
{
  "properties": {
    "dir": {
      "type": "string",
      "index": "analyzed",
      "analyzer": "nonword",
      "fields": {
        "un_touched": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

Note: the 'un_touched' sub-field is kept to hold the original, unanalyzed version of the data.

Populate the index with data and run the aggregation:

GET /test_index/test_1/_search
{
  "aggs": {
    "my_agg": {
      "terms": {
        "field": "dir",
        "size": 0
      }
    }
  }
}
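
To see why this should give the counts asked for in the question, the split-and-count logic can be imitated locally. The Python sketch below is only an approximation under stated assumptions: `split_terms` and `term_doc_counts` are hypothetical helpers, the three sample documents are the paths from the question, and the lowercasing mirrors the pattern analyzer's default behaviour.

```python
from collections import Counter


def split_terms(path, delimiter="/"):
    """Imitate a pattern analyzer that splits on `delimiter`:
    tokens are lowercased and empty tokens are dropped."""
    return [part.lower() for part in path.split(delimiter) if part]


def term_doc_counts(paths):
    """Count, for each term, how many documents contain it
    (comparable to doc_count in a terms aggregation)."""
    counts = Counter()
    for path in paths:
        counts.update(set(split_terms(path)))  # a term counts once per document
    return counts


# Assumed sample data: one document per path from the question.
docs = ["/a", "/a/b", "/a/b/c"]
counts = term_doc_counts(docs)
for term, hits in sorted(counts.items()):
    print(term, hits)
```

With these three documents the sketch counts 3 for a, 2 for b and 1 for c. The same split also gives the folder depth mentioned in the question: `len(split_terms("/a/b/c"))` is 3, a value that could be computed before indexing and stored in its own field.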

Note: this is only a minimal example; take care to choose a pattern that fits your data.

Shalin LK
  • Hi Shalin, Thank you so much! I have been trying but I could not get it working so far. I'll keep you posted. Sorry, – Ravi Sidhu Feb 05 '16 at 05:42
  • Further googling suggests that Multi_Field had been removed from ElasticSearch sometime in 2014. – Ravi Sidhu Feb 05 '16 at 07:36
  • Oh, sorry. Here is the fix for multi-fields: https://www.elastic.co/guide/en/elasticsearch/reference/2.x/_multi_fields.html I will update my answer. Thanks. – Shalin LK Feb 05 '16 at 08:14
  • @Ravi Sidhu The answer has been edited to use the format supported by ES 2.x. – Shalin LK Feb 05 '16 at 08:26
  • Thank you! I did make the change, but I am not sure how it differs from the path hierarchy tokenizer I used initially. In the example, if I have something like /a, /a/b, /a/b/c, /d, /d/e, /d/e/f, how do I group these by level? Level 1 first, and then drill down to level 2 and so on? – Ravi Sidhu Feb 05 '16 at 10:07