
I'm using GoAccess to parse AWS Load Balancer log files stored in S3, and I'd like to calculate metrics by the site's host name (www.example.com) instead of by the virtual host (app/something.com, which is the load balancer field). Is this possible?

https 2019-11-24T23:55:01.603141Z app/something.com 34.222.222.22:47121 190.61.18.156:80 0.008 0.252 0.000 200 200 191 725 "GET https://www.example.com:443/something.php HTTP/1.1" "Wget/1.18 (linux-gnu)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-west-1:6474865788:targetgroup/mytargetgroup/be12345678 "Root=1-5ddb4567-149b7e874546754ed496" "www.example.com" "arn:aws:acm:eu-west-1:6474865788:certificate/pwdsw3455-4028-5cb7-854c-gdtr555" 0 2019-11-24T23:55:01.342000Z "waf,forward" "-" "-" "190.61.18.156:80" "200"

Confounder

1 Answer


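This works for the line you posted, though you may want a different delimiter if any of your fields can contain additional spaces. With -F'[ ]' every space starts a new field, so $22 lands on the quoted host ("www.example.com") only because this particular user agent contains exactly one space.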

awk -F'[ ]' '$3=$22$3' access.log | goaccess - -a
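The one-liner above prepends $22, quotes included, to the existing $3. If you want field 3 to hold only the bare host, and you want that to survive user agents with any number of spaces, one option is to split the line on double quotes instead of counting spaces. This is a sketch: the function name host_as_vhost is made up for illustration, and it assumes fields 1-12 never contain spaces and no quoted field contains an embedded double quote.

```shell
# Hypothetical helper: replace field 3 (the load balancer name) with the
# bare value of the quoted domain_name field.
# Assumptions: fields 1-12 never contain spaces, and no quoted field
# contains an embedded double quote. Splitting on '"' then puts the Nth
# quoted field at index 2N: q[2] request, q[4] user agent, q[6] trace ID,
# q[8] domain name, however many spaces the user agent contains.
host_as_vhost() {
    awk '{
        n = split($0, q, "\"")
        # only rewrite when at least four quoted fields are present
        if (n >= 9) $3 = q[8]   # assigning $3 rebuilds the line with single spaces
        print
    }'
}
```

Usage, with the same GoAccess invocation as the answer:

host_as_vhost < access.log | goaccess - -a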
MIke
  • Thanks for sharing. It gets me really close to what I need. How would I strip out :443/something.php from https://www.example.com:443/something.php? It's column 14 and not 22 that I need. – Confounder Dec 02 '19 at 16:13
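For the follow-up in the comment, one way to get www.example.com out of field 14 is to split the URL on "/" and strip the port. A sketch, with the function name url_host_as_vhost made up for illustration; it assumes every request URL has the scheme://host[:port]/path shape of the sample line. Because the URL comes before the user agent, its field number does not depend on how many spaces the user agent contains.

```shell
# Hypothetical helper: replace field 3 with the host part of the request URL.
# Default awk splitting numbers fields the same way as -F'[ ]' on lines that
# use single spaces, so the URL is field 14 here.
url_host_as_vhost() {
    awk '{
        split($14, u, "/")          # "https:", "", "www.example.com:443", "something.php"
        host = u[3]
        sub(/:[0-9]+$/, "", host)   # strip a trailing port such as ":443"
        if (host != "") $3 = host   # leave the line unchanged if no host was found
        print
    }'
}
```

Usage:

url_host_as_vhost < access.log | goaccess - -a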