I have a use case where I want to tokenise an email ID into two categories of tokens:

Category 1: words separated by punctuation, plus their prefix tokens

Category 2: words separated by punctuation, plus their prefix tokens, with the punctuation included

For example, email: ona.ki@gl.co

I want to know whether it is possible to produce the following tokens for the above email ID:

on, ona, ona., ona.k, ona.ki, ona.ki@, ona.ki@g, ona.ki@gl, ona.ki@gl., ona.ki@gl.c, ona.ki@gl.co

.k, .ki, .ki@, .ki@g, .ki@gl, .ki@gl., .ki@gl.c, .ki@gl.co

ki, ki@, ki@g, ki@gl, ki@gl., ki@gl.c, ki@gl.co

@g, @gl, @gl., @gl.c, @gl.co

gl, gl., gl.c, gl.co

.c, .co

co
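
To make the rule behind the token list above concrete, here is a rough Python sketch. It is not tied to any particular search engine's analyzer; the function name `email_prefix_tokens`, the `min_len` parameter, and the use of `string.punctuation` as the set of separators are assumptions made for illustration. It treats the start of the string, every punctuation character, and every character right after punctuation as a possible token start, and emits every prefix of at least 2 characters from there to the end of the email.

```python
import string

# Assumption: any ASCII punctuation character counts as a separator.
PUNCTUATION = set(string.punctuation)


def email_prefix_tokens(email, min_len=2):
    """Return the prefix tokens described above (hypothetical helper)."""
    # A token may start at index 0, at a punctuation character,
    # or at the first character after a punctuation character.
    starts = [
        i
        for i, ch in enumerate(email)
        if i == 0 or ch in PUNCTUATION or email[i - 1] in PUNCTUATION
    ]

    tokens = []
    for start in starts:
        suffix = email[start:]
        # Emit every prefix of the remaining text, shortest first.
        for end in range(min_len, len(suffix) + 1):
            tokens.append(suffix[:end])
    return tokens


tokens = email_prefix_tokens("ona.ki@gl.co")
for query in ["ona.k", "na.k", ".ki@", "ki@", "i@"]:
    print(query, "matches" if query in tokens else "does not match")
```

With this rule a query matches only when it equals one of the generated tokens, so a query that begins in the middle of a word (such as na.k) has no token to match.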

Use case example for ona.ki@gl.co

ona.k - should match

na.k - should not match

.ki@ - should match

ki@ - should match

i@ - should not match

The reason I want to tokenise this way: suppose there are 2 documents with these text values:

  1. ona.ki@gl.com
  2. mona.gh@gl.com

When the user types on, ona, ... I want to fetch and show only ona.ki@gl.com, not the other one.
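
The desired behaviour for these two documents can also be written as a direct check, without generating tokens first. This is only a sketch of the matching rule under stated assumptions: the helper name `matches`, the minimum query length of 2, case-sensitive comparison, and `string.punctuation` as the separator set are all illustrative choices, not part of any existing analyzer.

```python
import string

# Assumption: any ASCII punctuation character counts as a separator.
PUNCTUATION = set(string.punctuation)


def matches(email, query, min_len=2):
    """True if query is a prefix of the email starting at a word or punctuation boundary."""
    if len(query) < min_len:
        return False
    for i, ch in enumerate(email):
        # Boundary: start of string, a punctuation character,
        # or the first character after punctuation.
        at_boundary = i == 0 or ch in PUNCTUATION or email[i - 1] in PUNCTUATION
        if at_boundary and email.startswith(query, i):
            return True
    return False


docs = ["ona.ki@gl.com", "mona.gh@gl.com"]
for typed in ["on", "ona"]:
    hits = [doc for doc in docs if matches(doc, typed)]
    print(typed, hits)
```

Here the typed text on or ona is found in mona.gh@gl.com only in the middle of the word mona, which is not a boundary, so that document is skipped.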

Thanks in advance.