I have about 30,000 Apache access logs, some of which list multiple client IP addresses. This is a result of Apache logging the X-Forwarded-For header instead of the IP address of the client, which we set up because we recently added haproxy in front of the web servers.
Going forward, we will be using rpaf for Apache to log only 1 IP address, i.e. that of the incoming connection to haproxy, so this will not be an ongoing problem.
Which brings me to the actual question:
How can I process the existing logs with multiple IP addresses to extract only the one that I want? I am assuming I'd need sed or something similar, but I'm more of a Windows guy, so I'm not 100% sure.
The rules are:
- If there's only 1 IP, the line is not modified.
- If there are 2 or more IPs, I only want to keep the second-to-last IP. They are comma-separated.
Example 1, 1 IP
Input: 10.1.1.1 - - [29/Jan/2010:11:00:00] .... (rest of log line)
Output: 10.1.1.1 - - [29/Jan/2010:11:00:00] .... (rest of log line)
Example 2, 2 IPs
Input: 10.1.1.1, 10.2.2.2 - - [29/Jan/2010:11:00:00] .... (rest of log line)
Output: 10.1.1.1 - - [29/Jan/2010:11:00:00] .... (rest of log line)
Example 3, 3 IPs
Input: 10.1.1.1, 10.2.2.2, 10.3.3.3 - - [29/Jan/2010:11:00:00] .... (rest of log line)
Output: 10.2.2.2 - - [29/Jan/2010:11:00:00] .... (rest of log line)
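To make the rules concrete, here is a rough sketch of them in Python. It is not a tested solution. It assumes the addresses are IPv4, sit at the very start of each line, and are separated by a comma plus optional spaces; the name keep_client_ip is made up for illustration.

```python
import re

# Matches a comma-separated list of IPv4 addresses at the start of a line.
# Assumption: IPv4 only, separated by a comma and optional whitespace.
IP = r'\d{1,3}(?:\.\d{1,3}){3}'
IP_LIST = re.compile(r'^(' + IP + r'(?:,\s*' + IP + r')*)')

def keep_client_ip(line):
    """Return the line with its leading IP list reduced per the rules above."""
    match = IP_LIST.match(line)
    if not match:
        # Line does not start with an IP address: leave it untouched.
        return line
    ips = [ip.strip() for ip in match.group(1).split(',')]
    if len(ips) < 2:
        # Only 1 IP: the line is not modified.
        return line
    # 2 or more IPs: keep only the second-to-last one.
    return ips[-2] + line[match.end():]
```

Something like this could then be applied to every line of each log file, writing the result to a new file so the originals stay intact.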