
I am using the standard tokenizer in my Elasticsearch plugin. I need to iterate over each token produced by the standard tokenizer and replace it with encrypted text before it is written to the Lucene index. Is there any way to update the tokens of the standard tokenizer? Can anyone help?

Brisi
  • Can you provide a concrete example of what you're trying to achieve? – Val Aug 05 '20 at 14:36
  • @Val Updated question. Please take a look at it! – Brisi Aug 06 '20 at 06:11
  • That doesn't look concrete enough to me :-) Show a real text input and what you'd like to index. Also curious why it needs to be encrypted... how do you expect to search over encrypted data? – Val Aug 06 '20 at 06:12
  • You can have an ingest pipeline that does the encryption. But the question is why you would need to decrypt every time you read, as @Val asked. – Gibbs Aug 06 '20 at 06:24
  • The question is why you need to store/index those PII tokens at all, since you won't be able to search on them anyway... it's a waste of space, so what's the goal? All you need to do is scramble the PII bits in your source document; you don't need to index those at all, in my opinion. – Val Aug 06 '20 at 07:32

1 Answer


It's an interesting use case, but IMHO the tokenizer is not the correct place to do this. Basically, the Elasticsearch analysis process consists of the three phases below (sketched in code after the list):

  1. char filter
  2. tokenizer
  3. token filter
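
To make the three phases concrete, here is a minimal sketch of how they plug into a Lucene `Analyzer` (the class an Elasticsearch analyzer ultimately wraps). The class name `PhasesAnalyzer` and the particular filter choices are just illustrative assumptions; the APIs shown are Lucene 8.x:

```java
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Illustrative analyzer showing where each analysis phase plugs in.
public class PhasesAnalyzer extends Analyzer {

    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        // Phase 1: char filter -- rewrites raw characters before tokenization.
        return new HTMLStripCharFilter(reader);
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Phase 2: tokenizer -- splits the filtered character stream into tokens.
        Tokenizer tokenizer = new StandardTokenizer();
        // Phase 3: token filter(s) -- transform each token the tokenizer emits.
        TokenStream tokens = new LowerCaseFilter(tokenizer);
        return new TokenStreamComponents(tokenizer, tokens);
    }
}
```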

If you want to change some characters before they reach the tokenizer, do it in a char filter; if you want to change the tokens themselves, do it in a token filter. As you can see, these two phases let you do far more transformation than the tokenizer phase itself.
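
If you do decide to transform tokens at index time, a custom token filter is the natural hook. Below is a minimal sketch of a Lucene `TokenFilter` that rewrites each token in place; `EncryptTokenFilter` and the `encrypt()` helper are hypothetical names, and the string reversal is only a stand-in for whatever real (deterministic) encryption you would use:

```java
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical token filter that replaces each token with a transformed form.
public final class EncryptTokenFilter extends TokenFilter {

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    public EncryptTokenFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false; // no more tokens from the upstream tokenizer
        }
        // Read the current token, transform it, and write it back in place.
        String transformed = encrypt(termAtt.toString());
        termAtt.setEmpty().append(transformed);
        return true;
    }

    // Placeholder for real encryption: it must be deterministic, or the
    // query-time tokens will never match the tokens stored in the index.
    private String encrypt(String token) {
        return new StringBuilder(token).reverse().toString();
    }
}
```

Note that the transformation has to be applied at both index time and search time (i.e. wired into both analyzer chains), otherwise queries will never match the stored tokens. In an Elasticsearch plugin you would then expose this filter through the plugin's analysis extension point so it can be referenced by name in your index settings.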

Amit