I need to fix some multi-line log entries. I'm currently using Perl, but I need to move the functionality to Python.
Example multi-line entry:
2015-12-02T17:56:13.783276Z our-elb-prod 52.20.50.51:60944 10.30.0.32:80 0.000024 0.063357 0.000066 200 200 0 12164 "GET http://www.example.com:80/episodes/2014/10/ HTTP/1.0" "IgnitionOneBot/Nutch-1.9 (
This is the IgnitionOne Company Bot for Web Crawling.
IgnitionOne Company Site: http://www.example.com/
;
rong2 dot huang at ignitionone dot com
)" - -
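The current Perl script to fix these is:
    while (my $row = <$fh>) {
        chomp $row;
        # A line that starts with a timestamp begins a new entry,
        # so end the previous entry first (except on the very first line).
        if ( $row =~ /^(\d{4})-(\d\d)-(\d\d)T(\d)/ ) {
            print "\n" if $. != 1;
        }
        print $row;
    }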
which outputs the corrected single-line entry:
2015-12-02T17:56:13.783276Z our-elb-prod 52.20.50.51:60944 10.30.0.32:80 0.000024 0.063357 0.000066 200 200 0 12164 "GET http://www.example.com:80/episodes/2014/10/ HTTP/1.0" "IgnitionOneBot/Nutch-1.9 ( This is the IgnitionOne Company Bot for Web Crawling. IgnitionOne Company Site: http://www.example.com/ ; rong2 dot huang at ignitionone dot com )" - -
So in a nutshell: a line that begins with the date pattern starts a new entry, and any line that doesn't is a continuation that gets appended to the entry before it, without a \n.
I've seen other ways to accomplish this with awk etc., but I need this to be pure Python. I've looked at "Python. Join specific lines on 1 line", and it looks like itertools might be the preferred way to go about this?
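For reference, here is a rough sketch of how the Perl loop above might translate to Python using only the standard library. The names `ENTRY_START` and `join_entries` and the `sep` parameter are made up for illustration. Joining with a single space is an assumption chosen to match the sample output; pass `sep=""` to concatenate the lines directly. A plain generator seems to be enough here, so itertools is not strictly needed.

```python
import re

# Same test as the Perl script: a new entry starts with a timestamp
# such as 2015-12-02T17:56:13...
ENTRY_START = re.compile(r"^\d{4}-\d\d-\d\dT\d")


def join_entries(lines, sep=" "):
    """Yield one string per log entry, folding continuation lines into it.

    `lines` is any iterable of strings (for example an open file).
    `sep` is placed between the pieces of a multi-line entry (assumed to be
    a single space here).
    """
    parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # equivalent of Perl's chomp
        # A timestamped line closes the entry collected so far.
        if ENTRY_START.match(line) and parts:
            yield sep.join(parts)
            parts = []
        parts.append(line)
    # Flush the final entry, which has no following timestamp to close it.
    if parts:
        yield sep.join(parts)


# Possible usage ("elb.log" is a hypothetical file name):
#
#     with open("elb.log") as fh:
#         for entry in join_entries(fh):
#             print(entry)
```

Because the function is a generator, it holds only one entry in memory at a time, which should matter for large log files.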