I am trying to write Python code to extract certain fields from ELB logs, but I can't find a proper regex that covers all of the ELB log fields, such as "user_agent", "request", etc.
For example, how can I print this pattern
"POST https://example.com:443/api/pages/uuids/8ad6e82e-f86b-11ea-a68d-cbc99f85d247/updateUserHeartbeat HTTP/2.0" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36"
from the log line below using a generic regex?
The various ELB log fields are documented here: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html
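If the goal is a generic parser rather than one very long regex, one option (a sketch, not something the AWS page prescribes) is to let Python's `csv` module do the splitting: ALB entries are space-separated, and the fields that can contain spaces, such as `request` and `user_agent`, are wrapped in double quotes. The sketch assumes the Application Load Balancer format from the linked page and that quoted values never contain an embedded double quote. `ALB_FIELDS` and `parse_alb_line` are hypothetical names, and the field list is the one that lines up with the 29 values in the sample log line from this question.

```python
import csv

# Field order for Application Load Balancer (ALB) access-log entries, following
# the AWS page linked above. Newer log versions may append more fields at the end.
ALB_FIELDS = [
    "type", "time", "elb", "client:port", "target:port",
    "request_processing_time", "target_processing_time", "response_processing_time",
    "elb_status_code", "target_status_code", "received_bytes", "sent_bytes",
    "request", "user_agent", "ssl_cipher", "ssl_protocol", "target_group_arn",
    "trace_id", "domain_name", "chosen_cert_arn", "matched_rule_priority",
    "request_creation_time", "actions_executed", "redirect_url", "error_reason",
    "target:port_list", "target_status_code_list", "classification",
    "classification_reason",
]


def parse_alb_line(line):
    """Split one ALB log line into a dict keyed by field name (hypothetical helper).

    csv.reader uses the space as the delimiter and keeps each double-quoted
    field (request, user_agent, trace_id, ...) together as a single value,
    with the surrounding quotes removed.
    """
    values = next(csv.reader([line.rstrip("\n")], delimiter=" ", quotechar='"'))
    record = dict(zip(ALB_FIELDS, values))
    # Keep any trailing fields that ALB_FIELDS does not name (assumed newer additions).
    for i, extra in enumerate(values[len(ALB_FIELDS):]):
        record[f"extra_{i}"] = extra
    return record


# The sample line from the question, split into pieces for readability.
SAMPLE = (
    'h2 2021-06-07T23:57:13.300250Z app/megapool-retool-app/dbb257b8adaa87cf '
    '93.107.2.244:59799 - -1 -1 -1 302 - 3087 561 '
    '"POST https://example.com:443/api/pages/uuids/8ad6e82e-f86b-11ea-a68d-cbc99f85d247/updateUserHeartbeat HTTP/2.0" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36" '
    'ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 '
    'arn:aws:elasticloadbalancing:us-west-2:752180062774:targetgroup/megapool-retool-app/1665e090211d92fc '
    '"Root=1-6089b259-1c8c6bca3b1d7a895a21a694" "xyz.com" '
    '"arn:aws:acm:us-west-2:75218123456562774:certificate/b7a45f0c-3009-42c2-97b9-ab81a61d1b25" '
    '0 2021-06-07T23:57:13.299000Z "authenticate" "-" "-" "-" "-" "-" "-"'
)

record = parse_alb_line(SAMPLE)
# Re-add the quotes to reproduce the "request" "user_agent" pattern.
print(f'"{record["request"]}" "{record["user_agent"]}"')
```

Every other field is then available by name, e.g. `record["elb_status_code"]` or `record["trace_id"]`, without counting capture groups.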
Here is a sample regex I found:
import re
# Matches one ALB access-log entry; the pattern has 33 capture groups
regex = r'([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"'
line_split = re.split(regex, line)  # line: one entry read from the log file
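A sketch of the same regex applied with `re.match` instead of `re.split`. Because the pattern matches the whole entry, `re.split` returns the text before the match (an empty string here), then all 33 captured groups, then whatever follows the match (for example a trailing newline), so every index is shifted by one. `match.groups()` returns only the captured values, in log-field order: with this regex, `request` is captured as groups 15 to 17 (method, URL, protocol) and `user_agent` as group 18. The regex is the one above, unchanged; `SAMPLE` is just the sample line from the question.

```python
import re

# The regex from the question, unchanged (33 capture groups).
regex = r'([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"'

# The sample line from the question, split into pieces for readability.
SAMPLE = (
    'h2 2021-06-07T23:57:13.300250Z app/megapool-retool-app/dbb257b8adaa87cf '
    '93.107.2.244:59799 - -1 -1 -1 302 - 3087 561 '
    '"POST https://example.com:443/api/pages/uuids/8ad6e82e-f86b-11ea-a68d-cbc99f85d247/updateUserHeartbeat HTTP/2.0" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36" '
    'ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 '
    'arn:aws:elasticloadbalancing:us-west-2:752180062774:targetgroup/megapool-retool-app/1665e090211d92fc '
    '"Root=1-6089b259-1c8c6bca3b1d7a895a21a694" "xyz.com" '
    '"arn:aws:acm:us-west-2:75218123456562774:certificate/b7a45f0c-3009-42c2-97b9-ab81a61d1b25" '
    '0 2021-06-07T23:57:13.299000Z "authenticate" "-" "-" "-" "-" "-" "-"'
)

match = re.match(regex, SAMPLE)
if match is None:
    raise ValueError("line does not look like an ALB access-log entry")

groups = match.groups()                  # 33 strings, in log-field order
method, url, protocol = groups[14:17]    # groups 15-17: the parts of "request"
user_agent = groups[17]                  # group 18: "user_agent"
print(f'"{method} {url} {protocol}" "{user_agent}"')
```

With the original `re.split` call the same values are at `line_split[15:18]` and `line_split[18]`, because of the leading empty string.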
A sample line from the log file:
h2 2021-06-07T23:57:13.300250Z app/megapool-retool-app/dbb257b8adaa87cf 93.107.2.244:59799 - -1 -1 -1 302 - 3087 561 "POST https://example.com:443/api/pages/uuids/8ad6e82e-f86b-11ea-a68d-cbc99f85d247/updateUserHeartbeat HTTP/2.0" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:us-west-2:752180062774:targetgroup/megapool-retool-app/1665e090211d92fc "Root=1-6089b259-1c8c6bca3b1d7a895a21a694" "xyz.com" "arn:aws:acm:us-west-2:75218123456562774:certificate/b7a45f0c-3009-42c2-97b9-ab81a61d1b25" 0 2021-06-07T23:57:13.299000Z "authenticate" "-" "-" "-" "-" "-" "-"
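For just the pattern asked about, a much smaller regex may be enough, assuming the ALB layout shown in the sample line: every field before `request` is unquoted, so `request` and `user_agent` are the first two double-quoted values on the line. This sketch also assumes neither value contains a literal double quote; `REQUEST_AND_AGENT` and `request_and_user_agent` are hypothetical names.

```python
import re

# "request" and "user_agent" are assumed to be the first two double-quoted
# fields on the line, separated by a single space.
REQUEST_AND_AGENT = re.compile(r'"(?P<request>[^"]*)" "(?P<user_agent>[^"]*)"')


def request_and_user_agent(line):
    """Return (request, user_agent), or None if no quoted pair is found."""
    m = REQUEST_AND_AGENT.search(line)  # search: the pair is mid-line
    if m is None:
        return None
    return m.group("request"), m.group("user_agent")


# The sample line from the question, split into pieces for readability.
SAMPLE = (
    'h2 2021-06-07T23:57:13.300250Z app/megapool-retool-app/dbb257b8adaa87cf '
    '93.107.2.244:59799 - -1 -1 -1 302 - 3087 561 '
    '"POST https://example.com:443/api/pages/uuids/8ad6e82e-f86b-11ea-a68d-cbc99f85d247/updateUserHeartbeat HTTP/2.0" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36" '
    'ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 '
    'arn:aws:elasticloadbalancing:us-west-2:752180062774:targetgroup/megapool-retool-app/1665e090211d92fc '
    '"Root=1-6089b259-1c8c6bca3b1d7a895a21a694" "xyz.com" '
    '"arn:aws:acm:us-west-2:75218123456562774:certificate/b7a45f0c-3009-42c2-97b9-ab81a61d1b25" '
    '0 2021-06-07T23:57:13.299000Z "authenticate" "-" "-" "-" "-" "-" "-"'
)

request, user_agent = request_and_user_agent(SAMPLE)
print(f'"{request}" "{user_agent}"')
```

The trade-off is that this only finds these two fields; for anything else, a full-line parser is still needed.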