I am using AWK to read through a custom log file I have. The format is something like this:
[12:08:00 +0000] 192.168.2.3 98374 "CONNECT 192.168.2.4:8091 HTTP/1.0" 200
Right now, I invoke AWK from bash to read the whole log and grab each line that contains "CONNECT". That works, but it doesn't help me discover unique clients.
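Roughly what I have so far (access.log stands in for my actual log file):

```shell
# Print every log line whose request contains CONNECT
awk '/CONNECT/' access.log
```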
I think the way to do this is to look only at the quoted part of each line, e.g. "CONNECT 192.168.2.4:8091 HTTP/1.0", grab all of those, compare them, and count duplicates only once. For example, given these lines:
[12:08:00 +0000] 192.168.2.3 98374 "CONNECT 192.168.2.6:8091 HTTP/2.0" 200
[12:08:00 +0000] 192.168.2.3 98374 "CONNECT 192.168.2.9:8091 HTTP/2.0" 200
[12:08:00 +0000] 192.168.2.3 98374 "CONNECT 192.168.2.2:8091 HTTP/2.0" 200
[12:08:00 +0000] 192.168.2.3 98374 "CONNECT 192.168.2.9:8091 HTTP/2.0" 200
In this case, the answer I need is 3, not 4: two of the lines are identical, so there are only 3 unique lines. What I need is an automated way to accomplish this with AWK.
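Here is a sketch of the kind of one-liner I'm hoping exists. It splits each line on the double quotes with -F'"' so that $2 is the request string, and counts each distinct request only once (I haven't verified this against my full log format):

```shell
# Split fields on the double-quote character, so $2 is the quoted
# request string. !seen[$2]++ is true only the first time a given
# request string appears; count those, then print the total.
awk -F'"' '/CONNECT/ && !seen[$2]++ { n++ } END { print n }' access.log
# prints 3 for the four sample lines above
```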
If anybody can lend a hand that would be great.