I am working on a regex that parses log events into named groups so I can write them into a DataFrame (df). Some log events span multiple lines. I am doing this in a Jupyter notebook.
I am trying to write a regex with a lookahead that ends each match at the next line that starts with a timestamp.
I am not sure whether a regex can match across multiple lines at all.
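For reference, here is a small sketch of the default behaviour that makes multi-line matching tricky. It uses a made-up two-event string rather than the real log. Without flags, . does not match a newline, so .* stops at the end of the line. re.DOTALL lets . match newlines too.

```python
import re

# Made-up input: one event with a continuation line, then a second event.
text = "15:36:32.448 first line\ncontinuation line\n15:36:34.222 next event"
header = r"\d{2}:\d{2}:\d{2}\.\d{3} (.*)"

# Default: "." does not match "\n", so ".*" stops at the end of the first line.
single_line = re.match(header, text).group(1)

# re.DOTALL: "." also matches "\n", so the greedy ".*" runs to the end of the string.
whole_rest = re.match(header, text, re.DOTALL).group(1)

print(repr(single_line))  # 'first line'
print(repr(whole_rest))   # 'first line\ncontinuation line\n15:36:34.222 next event'
```

So re.DOTALL on its own overshoots into the next event. It still needs a lookahead that stops at the next timestamp line.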
This version fails because of the start-of-line ("new line") anchor ^ in the lookahead near the end:
import re

regex = re.compile(
    r'(?P<time>(\d{2}\:\d{2}\:\d{2}.\d{3}))'
    r'(?P<a>\s(\[T\:))?'
    r'(?P<token>\d{15})?'
    r'(?P<b>(\]\s{))?'
    r'(?P<event_type>.*\:\d)?'
    r'(?P<c>\}\s)'
    r'(?P<message>.*)'
    r'(?=^\d{2}\d{2})'
)
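A minimal sketch of why a lookahead of this shape never succeeds, using made-up input. .* stops just before the \n, and at that position ^ cannot match because it is not the start of a line, with or without re.MULTILINE. Consuming the \n before the lookahead lines ^ up with the next timestamp.

```python
import re

text = "15:36:32.448 first event\n15:36:34.222 second event\n"

# ".*" stops just before "\n"; that position is not the start of a line,
# so "^" in the lookahead fails even with re.MULTILINE.
caret = re.findall(r"\d{2}:\d{2}:\d{2}\.\d{3} (.*)(?=^\d{2})", text, re.MULTILINE)

# Consuming the newline first lets "^" sit at the start of the next line.
newline = re.findall(r"\d{2}:\d{2}:\d{2}\.\d{3} (.*)\n(?=^\d{2})", text, re.MULTILINE)

print(caret)    # []
print(newline)  # ['first event']
```

The last event is not returned in the second case because nothing follows it. A lookahead that also accepts the end of the string handles that.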
This version runs, but it cuts the message off wherever \d{2}\d{2} matches, and that match has to be on the same line rather than on the next one:
regex = re.compile(
    r'(?P<time>(\d{2}\:\d{2}\:\d{2}.\d{3}))'
    r'(?P<a>\s(\[T\:))?'
    r'(?P<token>\d{15})?'
    r'(?P<b>(\]\s{))?'
    r'(?P<event_type>.*\:\d)?'
    r'(?P<c>\}\s)'
    r'(?P<message>.*)'
    r'(?=\d{2}\d{2})'
)
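A quick check, using the start of the first log line below, of what \d{2}\d{2} actually looks for: four consecutive digits. The timestamp 15:36 has a colon in the middle, so this lookahead never matches a timestamp. Instead it matches digit runs inside the line, such as the [T:...] token.

```python
import re

line = "15:36:32.448 [T:140113529200000] {ScxmlMetric:3} METRIC"

# "\d{2}\d{2}" is just four consecutive digits; "15:36" has a colon between
# the pairs, so the timestamp itself never matches.
in_timestamp = re.search(r"\d{2}\d{2}", "15:36:32.448")

# Inside the line it matches the start of the [T:...] token instead.
in_line = re.search(r"\d{2}\d{2}", line).group()

# Matching the timestamp needs the colons (and an escaped dot).
timestamp = re.compile(r"\d{2}:\d{2}:\d{2}\.\d{3}")

print(in_timestamp)                 # None
print(in_line)                      # 1401
print(bool(timestamp.match(line)))  # True
```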
I am testing the regex with this:
with open(ors_log) as f:
    for m in regex.finditer(f.read()):
        if m:
            print(m.group('time', 'a', 'token', 'b', 'event_type', 'c', 'message'))
I have tried these suggestions from Stack Overflow:
with open(ors_log) as f:
    for m in regex.finditer(f.read(), re.MULTILINE):
        ...  # same loop body as above
and changing the lookahead at the end of the pattern to:
    r'(?=\n\d{2}\d{2})')
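One likely reason the re.MULTILINE attempt had no effect, sketched with made-up input: for a compiled pattern, finditer(string, pos, endpos) takes a start position as its second argument, not flags. re.MULTILINE has the integer value 8, so that call just starts searching at index 8.

```python
import re

pattern = re.compile(r"\d{2}:\d{2}")
text = "15:36 first\n15:37 second"

# Pattern.finditer(string, pos, endpos): the second positional argument is a
# start index. re.MULTILINE == 8, so the first 8 characters are skipped.
skipped = [m.group() for m in pattern.finditer(text, re.MULTILINE)]

# Flags belong in re.compile() (or in module-level calls such as re.finditer).
pattern_m = re.compile(r"^\d{2}:\d{2}", re.MULTILINE)
found = [m.group() for m in pattern_m.finditer(text)]

print(skipped)  # ['15:37']
print(found)    # ['15:36', '15:37']
```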
Any thoughts or ideas are much appreciated. Thank you.

Sample log:
15:36:32.448 [T:140113529200000] {ScxmlMetric:3} METRIC <log sid='~456~1TF8DKFD49SRE6Q9PE0C2LAES00000J' expr='~456~01TF8DKFD49SRE6Q9PE0C2LAES00000J: Inside Screen Block: Priorities' label='' level='2' />
15:36:32.448 [T:14011340184339] {ScxmlMetric:1} METRIC <extension sid='~456~01TF8DKFD49we4rf5g6h7C2LAES00000J' name='screen' namespace='https://lab.io/modules' />
15:36:32.448 ==>Connector::ConnHandler Port=0 Proto=0 CallBack=<9740>
===>event: event_id=3, id=0 handle=66, datasize=24
15:36:32.448 {Thread:3} HandleThreadData: << 24 bytes <<
15:36:32.448 {Link:3} Message 'request' sent to 'IP'
attr_ref_id [int] = 999530875
attr_envelope [list, size (unpacked)=369] =
'Version' [str] = "1.0"
'AppType' [int] = 90
'Service' [str] = "Screen"
15:36:34.222 {ThreadSync:3} HandleThreadData: << 24 bytes <<
Desired message group for each event:
message 1
METRIC <log sid='~456~1TF8DKFD49SRE6Q9PE0C2LAES00000J' expr='~456~01TF8DKFD49SRE6Q9PE0C2LAES00000J: Inside Screen Block: Priorities' label='' level='2' />
message 2
METRIC <extension sid='~456~01TF8DKFD49we4rf5g6h7C2LAES00000J' name='screen' namespace='https://lab.io/modules' />
message 3
==>Connector::ConnHandler Port=0 Proto=0 CallBack=<9740>
===>event: event_id=3, id=0 handle=66, datasize=24
message 4
HandleThreadData: << 24 bytes <<
message 5
Message 'request' sent to 'IP'
attr_ref_id [int] = 999530875
attr_envelope [list, size (unpacked)=369] =
'Version' [str] = "1.0"
'AppType' [int] = 90
'Service' [str] = "Screen"
message 6
HandleThreadData: << 24 bytes <<
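One possible approach, sketched under a few assumptions: each event starts at a line beginning with HH:MM:SS.mmm, the [T:...] token and the {Name:N} block are optional, and the message is everything after them up to the next timestamp line. It combines re.MULTILINE (so ^ anchors at every line start), re.DOTALL (so the message can span lines), and a lazy .*? followed by a lookahead for \n plus the full timestamp, or the end of the input. The token uses \d+ instead of \d{15} because one sample token has 14 digits, and the a, b and c delimiter groups from the original pattern are folded into non-capturing groups. This has only been traced against the abbreviated sample in the code, not the full log.

```python
import re

# Abbreviated copy of the sample log above (the long METRIC line is shortened).
log = (
    "15:36:32.448 [T:140113529200000] {ScxmlMetric:3} METRIC <log sid='x' />\n"
    "15:36:32.448 ==>Connector::ConnHandler Port=0 Proto=0 CallBack=<9740>\n"
    "===>event: event_id=3, id=0 handle=66, datasize=24\n"
    "15:36:32.448 {Link:3} Message 'request' sent to 'IP'\n"
    "attr_ref_id [int] = 999530875\n"
    "15:36:34.222 {ThreadSync:3} HandleThreadData: << 24 bytes <<\n"
)

TIMESTAMP = r"\d{2}:\d{2}:\d{2}\.\d{3}"

# Assumption: one event = a line that starts with a timestamp, plus every
# following line up to (not including) the next line that starts with one.
event_re = re.compile(
    rf"^(?P<time>{TIMESTAMP})"              # timestamp at the start of a line
    r"(?:\s\[T:(?P<token>\d+)\])?"          # optional [T:...] token
    r"(?:\s\{(?P<event_type>[^}]*)\})?"     # optional {Name:N} block
    r"\s(?P<message>.*?)"                   # lazy message; DOTALL lets it span lines
    rf"(?=\n{TIMESTAMP}|\n?\Z)",            # stop before the next timestamp line or the end
    re.MULTILINE | re.DOTALL,
)

# One dict per event, keyed by group name; missing optional groups are None.
rows = [m.groupdict() for m in event_re.finditer(log)]
for row in rows:
    print(row)
```

In the notebook the same pattern could be run over open(ors_log).read(). If pandas is installed, pd.DataFrame(rows) should give one column per named group, with None in token or event_type for lines that have no [T:...] or {...} block.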