
I am working on a regex that parses log events into named groups so I can write them into a DataFrame (df). Some log events span multiple lines. I am doing this in a Jupyter notebook.

I am trying to find a regex with a lookahead that stops each match at the start of the next line beginning with a timestamp, so every match ends where the next event starts.

I am not sure whether a regex can match across multiple lines at all.
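A regex can span lines, but only with the right flags. The minimal sketch below uses a made-up two-event string (not the real log) and assumes every event starts with an HH:MM:SS.mmm timestamp followed by a space, as in the sample further down. It shows that '.' stops at a newline by default, and that with re.DOTALL a lazy .*? plus a lookahead for the next timestamp ends the match just before the following event.

```python
import re

text = "15:36:32.448 first line\n  continuation\n15:36:34.222 next event"

# By default '.' does not match '\n', so '.*' stops at the end of the first line.
single_line = re.match(r"\S+ (.*)", text).group(1)
print(repr(single_line))  # 'first line'

# With re.DOTALL, '.' also matches '\n', so the match can cover several lines.
# The lazy '.*?' grows only until the lookahead sees "\n<timestamp> " (or the end).
multi_line = re.match(
    r"\S+ (.*?)(?=\n\d{2}:\d{2}:\d{2}\.\d{3} |\Z)", text, re.DOTALL
).group(1)
print(repr(multi_line))  # 'first line\n  continuation'
```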

This version, which anchors the lookahead to the start of a new line with ^, fails:

import re

regex = re.compile(
    r'(?P<time>(\d{2}\:\d{2}\:\d{2}.\d{3}))'
    r'(?P<a>\s(\[T\:))?'
    r'(?P<token>\d{15})?'
    r'(?P<b>(\]\s{))?'
    r'(?P<event_type>.*\:\d)?'
    r'(?P<c>\}\s)'
    r'(?P<message>.*)'
    r'(?=^\d{2}\d{2})'  # lookahead meant to find the next line's timestamp
)
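One likely reason the ^ version never fires: without re.MULTILINE, ^ matches only at the very start of the string, so a lookahead such as (?=^...) cannot succeed partway through the text. A small sketch on a hypothetical two-line string:

```python
import re

text = "15:36:32.448 first\n15:36:34.222 second"

# Without re.MULTILINE, '^' matches only at index 0 of the whole string.
without_flag = [m.start() for m in re.finditer(r"^\d{2}:", text)]
# With re.MULTILINE, '^' also matches right after every '\n'.
with_flag = [m.start() for m in re.finditer(r"^\d{2}:", text, re.MULTILINE)]
print(without_flag, with_flag)  # [0] [0, 19]
```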

This version runs, but the lookahead only finds four digits (\d{2}\d{2}) on the same line as the match, so the message is cut off there instead of at the timestamp on the next line.

regex = re.compile(
    r'(?P<time>(\d{2}\:\d{2}\:\d{2}.\d{3}))'
    r'(?P<a>\s(\[T\:))?'
    r'(?P<token>\d{15})?'
    r'(?P<b>(\]\s{))?'
    r'(?P<event_type>.*\:\d)?'
    r'(?P<c>\}\s)'
    r'(?P<message>.*)'
    r'(?=\d{2}\d{2})'  # same lookahead without the ^ anchor
)
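Why the message breaks mid-line: \d{2}\d{2} is simply four digits in a row (the same as \d{4}). A timestamp such as "15:36" has a colon after two digits, so it never matches. Because '.' cannot cross a newline without re.DOTALL, the greedy (?P<message>.*) backtracks to the last four-digit run on the same line. A sketch on one line from the sample log:

```python
import re

line = "15:36:32.448 ==>Connector::ConnHandler  Port=0 Proto=0 CallBack=<9740>"

# Four adjacent digits never occur at the start of "15:36:34.222".
ts_match = re.match(r"\d{2}\d{2}", "15:36:34.222")
print(ts_match)  # None

# On this line the only four-digit run is inside "<9740>".
hit = re.search(r"\d{2}\d{2}", line).group()
print(hit)  # 9740

# A greedy '.*' followed by this lookahead stops just before that run.
message = re.search(r"^\S+ (?P<message>.*)(?=\d{2}\d{2})", line).group("message")
print(message)  # ==>Connector::ConnHandler  Port=0 Proto=0 CallBack=<
```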

I am testing the regex with this:

with open(ors_log) as f:
    # finditer() only yields successful matches, so no "if m" check is needed
    for m in regex.finditer(f.read()):
        print(m.group('time', 'a', 'token', 'b', 'event_type', 'c', 'message'))

I have tried these Stack Overflow suggestions:

with open(ors_log) as f:
    for m in regex.finditer(f.read(), re.MULTILINE):

and this lookahead in place of the last line of the pattern:

r'(?=\n\d{2}\d{2})'
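Both attempts have a catch that is easy to miss. A compiled pattern's finditer() is Pattern.finditer(string[, pos[, endpos]]), so its second positional argument is a start offset, not a flag. re.MULTILINE is the integer 8, so passing it there silently starts the search at index 8. Flags belong in re.compile() instead. The \n version of the lookahead still requires four adjacent digits, which a "15:36" timestamp never has. A sketch on a hypothetical two-line string:

```python
import re

text = "15:36:32.448 first\n15:36:34.222 second"

# Flags go to re.compile() (or to the module-level re.finditer()).
pattern = re.compile(r"^\d{2}:\d{2}", re.MULTILINE)

# Here re.MULTILINE (== 8) is taken as pos, so scanning starts at index 8
# and the timestamp at index 0 is skipped.
shifted = [m.start() for m in pattern.finditer(text, re.MULTILINE)]
correct = [m.start() for m in pattern.finditer(text)]
print(shifted, correct)  # [19] [0, 19]

# "\n15:36" has a colon after two digits, so this lookahead body never matches.
no_colon = re.search(r"\n\d{2}\d{2}", text)
print(no_colon)  # None
```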

Any thoughts or ideas are much appreciated. Thank you.

Sample log:

15:36:32.448 [T:140113529200000] {ScxmlMetric:3} METRIC <log sid='~456~1TF8DKFD49SRE6Q9PE0C2LAES00000J' expr='~456~01TF8DKFD49SRE6Q9PE0C2LAES00000J: Inside Screen Block: Priorities' label='' level='2' />
15:36:32.448 [T:14011340184339] {ScxmlMetric:1} METRIC <extension sid='~456~01TF8DKFD49we4rf5g6h7C2LAES00000J' name='screen' namespace='https://lab.io/modules' />
15:36:32.448 ==>Connector::ConnHandler  Port=0 Proto=0 CallBack=<9740>
===>event:   event_id=3, id=0 handle=66, datasize=24 
15:36:32.448 {Thread:3} HandleThreadData: << 24 bytes <<
15:36:32.448 {Link:3} Message 'request' sent to 'IP'
    attr_ref_id [int] = 999530875
    attr_envelope [list, size (unpacked)=369] = 
       'Version' [str] = "1.0"
       'AppType' [int] = 90
       'Service' [str] = "Screen"
15:36:34.222 {ThreadSync:3} HandleThreadData: << 24 bytes <<

Expected message group for each event:

message 1

METRIC <log sid='~456~1TF8DKFD49SRE6Q9PE0C2LAES00000J' expr='~456~01TF8DKFD49SRE6Q9PE0C2LAES00000J: Inside Screen Block: Priorities' label='' level='2' />

message 2

METRIC <extension sid='~456~01TF8DKFD49we4rf5g6h7C2LAES00000J' name='screen' namespace='https://lab.io/modules' />

message 3

==>Connector::ConnHandler   Port=0 Proto=0 CallBack=<9740>
===>event:   event_id=3, id=0 handle=66, datasize=24 

message 4

HandleThreadData: << 24 bytes <<

message 5

Message 'request' sent to 'IP'
    attr_ref_id [int] = 999530875
    attr_envelope [list, size (unpacked)=369] = 
       'Version' [str] = "1.0"
       'AppType' [int] = 90
       'Service' [str] = "Screen"

message 6

HandleThreadData: << 24 bytes <<
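Putting the pieces together, here is one possible sketch, not a definitive parser. It assumes every event starts at the beginning of a line with an HH:MM:SS.mmm timestamp followed by a space, and that continuation lines never start that way. Other choices worth noting: the token uses \d+ rather than \d{15}, because the second sample line has a 14-digit token; the a/b/c delimiter groups are dropped; and the names SAMPLE_LOG, EVENT_RE and parse_events are hypothetical. The pattern is compiled with re.MULTILINE (for ^), re.DOTALL (so the lazy message can span lines) and re.VERBOSE (for the comments).

```python
import re

# Sample from the question (trailing spaces trimmed).
SAMPLE_LOG = """\
15:36:32.448 [T:140113529200000] {ScxmlMetric:3} METRIC <log sid='~456~1TF8DKFD49SRE6Q9PE0C2LAES00000J' expr='~456~01TF8DKFD49SRE6Q9PE0C2LAES00000J: Inside Screen Block: Priorities' label='' level='2' />
15:36:32.448 [T:14011340184339] {ScxmlMetric:1} METRIC <extension sid='~456~01TF8DKFD49we4rf5g6h7C2LAES00000J' name='screen' namespace='https://lab.io/modules' />
15:36:32.448 ==>Connector::ConnHandler  Port=0 Proto=0 CallBack=<9740>
===>event:   event_id=3, id=0 handle=66, datasize=24
15:36:32.448 {Thread:3} HandleThreadData: << 24 bytes <<
15:36:32.448 {Link:3} Message 'request' sent to 'IP'
    attr_ref_id [int] = 999530875
    attr_envelope [list, size (unpacked)=369] =
       'Version' [str] = "1.0"
       'AppType' [int] = 90
       'Service' [str] = "Screen"
15:36:34.222 {ThreadSync:3} HandleThreadData: << 24 bytes <<
"""

EVENT_RE = re.compile(
    r"""
    ^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})[ ]    # event starts with a timestamp at line start
    (?:\[T:(?P<token>\d+)\][ ])?             # optional [T:<digits>] token
    (?:\{(?P<event_type>[^}]*)\}[ ])?         # optional {Name:N} event type
    (?P<message>.*?)                          # lazy, may span lines (DOTALL)
    (?=\n\d{2}:\d{2}:\d{2}\.\d{3}[ ]|\Z)      # stop before the next event or at the end
    """,
    re.MULTILINE | re.DOTALL | re.VERBOSE,
)


def parse_events(text):
    """Return one dict per event; unmatched optional groups are None."""
    rows = []
    for m in EVENT_RE.finditer(text):
        row = m.groupdict()
        # The last event picks up the final newline; strip trailing whitespace.
        row["message"] = row["message"].rstrip()
        rows.append(row)
    return rows


rows = parse_events(SAMPLE_LOG)
for row in rows:
    print(row["time"], row["token"], row["event_type"], repr(row["message"][:40]))
```

With pandas installed, pd.DataFrame(rows) turns the list of dicts into a DataFrame with time, token, event_type and message columns. For the real file, parse_events(f.read()) inside the existing with open(ors_log) as f: block should work the same way, as long as the assumptions above hold for the whole log.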