I can't find any documentation on the subject, which makes it hard to work out how to implement a custom token filter plugin from scratch in Java.

For example, I'd like a token filter that keeps only the tokens that are numbers.

Any idea?

solbader

1 Answer

There are existing filters that do this. For instance the keep_types token filter can do exactly that.

If you keep only the <NUM> type, the filter lets numeric tokens through and drops all others.

GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ]
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}

Result:

[1, 2]

You can achieve a similar result with the pattern_capture token filter as well.

But if you really want to go the Java way, then your best bet is to clone an existing analysis plugin and roll your own.
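As a rough starting point for the Java route, here is a minimal sketch of just the filtering decision, written in plain Java so it runs without Elasticsearch or Lucene on the classpath. `NumericTokens`, `isNumeric` and `keepNumeric` are hypothetical names made up for illustration, and "numeric" is assumed here to mean ASCII digits only, which is narrower than what the standard tokenizer tags as <NUM>. In an actual plugin, a check like this would typically sit in the `accept()` method of a class extending Lucene's `FilteringTokenFilter`, reading the term from `CharTermAttribute`; the factory and plugin registration are not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Elasticsearch or Lucene.
final class NumericTokens {
    private NumericTokens() {}

    // True when the term is non-empty and made up of ASCII digits only (an assumption).
    static boolean isNumeric(CharSequence term) {
        if (term == null || term.length() == 0) {
            return false;
        }
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    // Stand-in for a token stream: keeps the numeric tokens, in order, and drops the rest.
    static List<String> keepNumeric(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (isNumeric(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Same input as the _analyze request above, already split into tokens.
        List<String> tokens = List.of("1", "quick", "fox", "2", "lazy", "dogs");
        System.out.println(keepNumeric(tokens)); // prints [1, 2]
    }
}
```

Keeping the predicate separate from the Lucene plumbing makes it easy to unit test before wiring it into a plugin.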

Val
  • Hi Val, thanks a lot for your answer, but I'm not really looking for a solution using existing filters. I need to implement a filter plugin from scratch using a Java class. – solbader Oct 23 '20 at 11:39
  • Then, your best bet is to clone an [existing analysis plugin](https://github.com/elastic/elasticsearch/tree/master/plugins) and roll your own. – Val Oct 23 '20 at 11:41