I am trying to find log entries matching an email address in the message field in the Kibana Discover section of my ELK stack. I am getting results using:

@message:"abc@email.com"

However, the results contain other messages where the email should not match, and I am unable to build a query that excludes them.

The results are (data has been sanitized for security reasons):

@message:[INF] [2020-07-07 12:54:51.105] [PID-1] : [abcdefg] [JID-5c] [data] LIST_LOOKUP: abc@email.com | User List from Profiles | name | user_name @id:355502086986714

@message:[INF] [2020-07-07 12:38:36.755] [PID-2] : [abcdefg] [JID-ed2] [data] LIST_LOOKUP: abc@email.com | User List from Profiles | name | user_name @id:355501869671304

@message:[INF] [2020-07-07 12:19:48.141] [PID-3] [abc@email.com] : [c5] [data] Completed 200 OK in 11ms @id:355501617979964834

@message:[INF] [2020-07-07 11:19:48.930] [PID-5] [abc@email.com] : [542] [data] Completed 200 OK in 9ms @id:35550081535

while I want it to be:

@message:[INF] [2020-07-07 12:19:48.141] [PID-3] [abc@email.com] : [c5] [data] Completed 200 OK in 11ms @id:355501617979964834

@message:[INF] [2020-07-07 11:19:48.930] [PID-5] [abc@email.com] : [542] [data] Completed 200 OK in 9ms @id:35550081535

I've tried @message: "[PID-*] [abc@email.com]", @message: "\[PID-*\] \[abc@email.com\] \:", @message: "[abc@email.com]", @message: *abc@email.com*, and several similar searches, with no success.

Please let me know what I am missing here, and how to do efficient substring searches in Kibana Discover using KQL/Lucene.

Here is the mapping for my index (the data comes from CloudWatch Logs):

{
   "cwl-*":{
      "mappings":{
         "properties":{
            "@id":{
               "type":"string"
            },
            "@log_stream":{
               "type":"string"
            },
            "@log_group":{
               "type":"string"
            },
            "@message":{
               "type":"string"
            },
            "@owner":{
               "type":"string"
            },
            "@timestamp":{
               "type":"date"
            }
         }
      }
   }
}
Shubham Namdeo

2 Answers


All of your results contain abc@email.com, so this behaviour is expected.

[abc@email.com] is tokenised by the standard analyzer as:

{
    "tokens": [
        {
            "token": "abc",
            "start_offset": 1,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "gmail.com",
            "start_offset": 5,
            "end_offset": 14,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}
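The token output above can be reproduced with the _analyze API. A minimal request, assuming the default standard analyzer (host and port are placeholders):

```
POST http://host:port/_analyze

{
    "analyzer": "standard",
    "text": "[abc@email.com]"
}
```

The brackets and the @ are treated as separators, which is why the address is split into the two tokens abc and email.com.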

If you have a dedicated email field, you can make use of it. Otherwise, you need to alter the mapping for that field.
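For illustration, if the address were stored in its own field with a keyword sub-field (the field name email.keyword here is hypothetical), an exact term query would match the full address without any tokenisation:

```
{
    "query": {
        "term": {
            "email.keyword": "abc@email.com"
        }
    }
}
```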

If this doesn't answer your question, please add the mapping for that field, which you can fetch from http://host:port/indexName/_mapping

Gibbs
  • Is it possible to create any substring query to get the required result? – Shubham Namdeo Jul 07 '20 at 14:33
  • I need to look at your mapping to check that. That's why I asked you to add the mapping also. Problem is that your data may not be stored as single string in the index. – Gibbs Jul 07 '20 at 14:40
  • I've added the mapping, just fyi I am getting all of the data from cloudwatch logs via aws provided lambda function. – Shubham Namdeo Jul 07 '20 at 15:10

As @Gibbs already mentioned, the cause is that all your results contain the string abc@email.com. Your mapping confirms it: you are using a string field without an explicit analyzer, so it uses the default standard analyzer, which splits the email address into separate tokens.

Instead, you should map the field that holds the email address with a custom analyzer that uses the UAX URL email tokenizer, which keeps email addresses as single tokens.

Here is an example of how to create this analyzer.

Mapping with custom email analyzer

{
    "settings": {
        "analysis": {
            "analyzer": {
                "email_analyzer": {
                    "tokenizer": "my_tokenizer"
                }
            },
            "tokenizer": {
                "my_tokenizer": {
                    "type": "uax_url_email"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "email": {
                "type": "text",
                "analyzer": "email_analyzer"
            }
        }
    }
}

Analyze API request and response

POST http://{{hostname}}:{{port}}/{{index-name}}/_analyze

{
    "analyzer": "email_analyzer",
    "text": "abc@email.com"
}


{
    "tokens": [
        {
            "token": "abc@email.com",
            "start_offset": 0,
            "end_offset": 13,
            "type": "<EMAIL>",
            "position": 0
        }
    ]
}
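With this mapping in place, a plain match query on the email field will only match whole addresses, because both the indexed text and the query string are tokenised by the same uax_url_email tokenizer. A sketch against the example index above:

```
POST http://{{hostname}}:{{port}}/{{index-name}}/_search

{
    "query": {
        "match": {
            "email": "abc@email.com"
        }
    }
}
```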
Amit
  • Thank you for the detailed response. I am getting data via CloudWatch Logs and may not be able to change mappings, because I am getting multiple log groups' data into ELK. Will it be possible to build the query without changing my mappings? – Shubham Namdeo Jul 08 '20 at 06:02
  • @ShubhamNamdeo No, without changing the way you store these mail addresses it won't be possible to filter them, as Elasticsearch works by matching indexed tokens against tokens generated from the search query – Amit Jul 08 '20 at 06:19
  • Yes, that is my upvote only to both of the answers. – Shubham Namdeo Jul 12 '20 at 14:40