
I am trying to clean log files to categorize them in Splunk, so my question is:

(?i)^(?:[^ ]* ){8}(?P<FIELDNAME>.+)((?=\d{8}_\d{8}_\d{10}\.)|(?=\d{8}\.?))

The two lookaheads at the end (shown in bold in the original post) need to be combined like an if/else. The part before them already matches correctly.

I want it to STOP just before either exactly 8 digits and a dot (dddddddd.) OR 8digits_8digits_10digits and a dot (8xd_8xd_10xd.)

My task is to get rid of all the unique numbers in the log file so that I can categorize the entries better.

Please help.

akemko
  • Could you provide an example of desired input and output? – CAustin Mar 25 '14 at 19:03
  • Input: `Timestamp: 2/26/2014 4:00:42 PM SN #7 Message: ServerXYZ: AppXYZ failed to grab activity code for response 12345678.` and `Timestamp: 2/26/2014 3:37:42 PM SN #31 Message: Error copying folders, the following exception was thrown IOException: The process cannot access the file 12345678_12345678_1234567890'`. Output should be `AppXYZ failed to grab activity code for response` and `Error copying folders, the following exception was thrown IOException: The process cannot access the file`. I want no unique number fields, so that I can run stats on what types of errors I have on my site. – akemko Mar 25 '14 at 19:15
  • Where exactly are you trying to capture **from**? From the regex, it looks like `Message: --- (stop at those digits)`. This `(?: [^ ]*[ ]){8}` is not recommended. Also, what is the `^` caret meant to represent: beginning of line or beginning of string? –  Mar 25 '14 at 19:46

1 Answer

You could just make the previous .+ lazy:

(?i)^(?:[^ ]* ){8}(?P<FIELDNAME>.+?)((?=\d{8}_\d{8}_\d{10}\.)|(?=\d{8}\.))
                                  ^

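Being greedy, .+ consumes as much as it can, so the match stops as far as possible from its start, at the last position where one of the lookaheads succeeds. The lazy .+? stops at the first such position instead. I also removed the ? at the end, since an optional dot would make it stop as soon as there are any 8 digits ahead. Also, you can actually combine these lookaheads: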

(?i)^(?:[^ ]* ){8}(?P<FIELDNAME>.+?)(?=\d{8}(?:_\d{8}_\d{10})?\.)
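To see the difference between the greedy, lazy and combined versions, here is a small sketch in Python, whose `re` module accepts the same `(?P<name>...)` group syntax. This is only an illustration, not Splunk itself. The two sample lines are adapted from the comments on the question: the second one assumes a `ServerXYZ:` token after `Message:` and a trailing dot after the ID, neither of which is in the line as posted, and the `extract` helper is a hypothetical name.

```python
import re

# Shared prefix: skip the first eight space-separated tokens of the line.
PREFIX = r"(?i)^(?:[^ ]* ){8}"

# Greedy .+ with the two separate lookaheads (the shape of the question's regex).
GREEDY = re.compile(PREFIX + r"(?P<FIELDNAME>.+)((?=\d{8}_\d{8}_\d{10}\.)|(?=\d{8}\.))")
# Same lookaheads, but with a lazy .+?
LAZY = re.compile(PREFIX + r"(?P<FIELDNAME>.+?)((?=\d{8}_\d{8}_\d{10}\.)|(?=\d{8}\.))")
# Lazy .+? with the two lookaheads merged into one.
COMBINED = re.compile(PREFIX + r"(?P<FIELDNAME>.+?)(?=\d{8}(?:_\d{8}_\d{10})?\.)")

# Sample line with a plain 8-digit ID followed by a dot.
SHORT_ID = ("Timestamp: 2/26/2014 4:00:42 PM SN #7 Message: ServerXYZ: "
            "AppXYZ failed to grab activity code for response 12345678.")
# Sample line with an 8_8_10 ID. ASSUMPTION: "ServerXYZ:" and the final dot
# were added for this sketch; the line in the thread has neither.
LONG_ID = ("Timestamp: 2/26/2014 3:37:42 PM SN #31 Message: ServerXYZ: "
           "Error copying folders, the following exception was thrown IOException: "
           "The process cannot access the file 12345678_12345678_1234567890.")


def extract(pattern, line):
    """Return the FIELDNAME capture, or None when the pattern does not match."""
    match = pattern.search(line)
    return match.group("FIELDNAME") if match else None


if __name__ == "__main__":
    for name, pattern in (("greedy", GREEDY), ("lazy", LAZY), ("combined", COMBINED)):
        for line in (SHORT_ID, LONG_ID):
            print(f"{name:9}| {extract(pattern, line)!r}")
```

On the 8_8_10 line the greedy version backtracks only as far as the last place where 8 digits and a dot are ahead, so the capture still ends in `12345678_12345678_12`. The lazy and combined versions both stop before the first digit of the ID. Note that the capture keeps the trailing space before the number.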
Jerry
  • Hey Jerry, thanks for the 100% accurate answer. It was my question that wasn't 100% accurate, so I'm sorry for taking more of your time. How can I make sure it also stops whenever it sees an 8-digit number? In the case above, `AppXYZ failed to grab activity code for response 12345678.` succeeded in getting the message, while `Timestamp: 2/26/2014 3:31:37 PM ....Message: ServerXYZ2: SendMail:Error sending mail for Response ID:12345678` didn't work. I didn't include this case at first, but stopping before any 8-digit number would cover this log as well. I'm not lazy, I just don't know how to combine those lookaheads. Thanks in advance. – akemko Mar 25 '14 at 20:51
  • I found it. Adding `?` after the last `\.` makes the dot optional, so it also stops before 8-digit numbers that have no trailing dot. God, that cleaned 95% of it; 5% is left. Thanks again, Jerry. – akemko Mar 25 '14 at 21:03
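The fix from the last comment can be sketched the same way in Python. Assumptions here: the `SN #12` part of the sample line is made up, because the comment elides that part of the log with `....`, and the names `STRICT`, `RELAXED` and `extract` are only for illustration.

```python
import re

# Answer's combined regex: the ID must be followed by a literal dot.
STRICT = re.compile(r"(?i)^(?:[^ ]* ){8}(?P<FIELDNAME>.+?)(?=\d{8}(?:_\d{8}_\d{10})?\.)")
# Variant from the last comment: "?" after the final "\." makes the dot optional.
RELAXED = re.compile(r"(?i)^(?:[^ ]* ){8}(?P<FIELDNAME>.+?)(?=\d{8}(?:_\d{8}_\d{10})?\.?)")

# ASSUMPTION: "SN #12" is a placeholder for the part elided in the comment.
NO_DOT = ("Timestamp: 2/26/2014 3:31:37 PM SN #12 Message: ServerXYZ2: "
          "SendMail:Error sending mail for Response ID:12345678")
WITH_DOT = ("Timestamp: 2/26/2014 4:00:42 PM SN #7 Message: ServerXYZ: "
            "AppXYZ failed to grab activity code for response 12345678.")


def extract(pattern, line):
    """Return the FIELDNAME capture, or None when the pattern does not match."""
    match = pattern.search(line)
    return match.group("FIELDNAME") if match else None


if __name__ == "__main__":
    print(extract(STRICT, NO_DOT))     # None: no dot after the 8 digits
    print(extract(RELAXED, NO_DOT))    # SendMail:Error sending mail for Response ID:
    print(extract(RELAXED, WITH_DOT))  # same capture as the strict version
```

With both the `_8_10` group and the dot optional, the lookahead succeeds wherever 8 digits follow, so it behaves like a plain `(?=\d{8})`. That also means it would stop in front of any run of 8 or more digits, which may or may not be what you want for other log lines.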