
I am trying to prepend usernames to the lines containing their corresponding IP addresses in a log file that is being written continuously. But new lines are getting appended to the previous ones, rendering the log file unanalyzable.

Note: It's a web server log file that is continuously being written. My code checks the IP captured in the logs, finds the corresponding username, and inserts the username at the beginning of that specific line in the log, in a loop. On the first run there's no error, but from the second run on the lines get messed up as shown below.

for i in $ips
do
...
..
cp $server_log $log_file
sed -i "/^$i/ s/./$user &/" $log_file
cp $log_file $server_log
...
...
done

input file

10.xx.xx.xxx -[12/Feb/2023 02:46:23] "GET /folder/ HTTP/1.1" 200 -
10.xx.xx.xxx -[12/Feb/2023 02:46:44] "GET /folder/ HTTP/1.1" 200 -
10.xx.xx.56 -[12/Feb/2023 02:47:20] "GET /folder2/HTTP/1.1" 200 -

output

user1 10.xx.xx.xxx -[12/Feb/2023 02:46:23] "GET /folder/ HTTP/1.1" 200 -
user1 10.xx.xx.xxx -[12/Feb/2023 02:46:44] "GET /folder/ HT10.xx.xx.56 -[12/Feb/2023 02:47:20] "GET /folder2/HTTP/1.1" 200 -

expected output

user1 10.xx.xx.34 -[12/Feb/2023 02:46:23] "GET /folder/ HTTP/1.1" 200 -
user1 10.xx.xx.34 -[12/Feb/2023 02:46:44] "GET /folder/ HTTP/1.1" 200 -
user2 10.xx.xx.56 -[12/Feb/2023 02:47:20] "GET /folder2/HTTP/1.1" 200 -
  • Welcome to SO, thanks for sharing your efforts. Please also post a sample of the input file in your question to make it clearer, cheers. – RavinderSingh13 Feb 21 '23 at 08:21
  • `nginx` has the predefined variable `$remote_user` accessible in [logging](https://docs.nginx.com/nginx/admin-guide/monitoring/logging/). You should try to modify the web server's log format to better suit your needs. – Gilles Quénot Feb 21 '23 at 08:22
  • Your `sed` string is a bit complicated. `sed -i "s/^$i /$user $i /" $log_file` should provide the correct results. I put a ` ` behind the IP address, to prevent 10.x.x.123 matching 10.x.x.12. – Ljm Dullaart Feb 21 '23 at 09:13
  • Please [edit] your question and add more details about your use case and what you want to achieve. Are you trying to modify a logfile that is being written to by another process? If yes, then this might lead to different kinds of unexpected results. In-place editing (like `sed -i ...`) writes the output to a temporary file and replaces the input file afterwards. Depending on how both processes access the logfile, you can get mixed data from both processes or lose data. – Bodo Feb 21 '23 at 18:29
  • Is this question related to your previous question https://stackoverflow.com/q/75470568/10622916? – Bodo Feb 22 '23 at 19:21

2 Answers


GNU Awk

$ cat ips_file
10.xx.xx.101 user1
10.xx.xx.102 user2
10.xx.xx.103 user3

$ cat logfile 
10.xx.xx.101 -[12/Feb/2023 02:46:23] "GET /folder1/ HTTP/1.1" 200 -
10.xx.xx.101 -[12/Feb/2023 02:46:44] "GET /folder1/ HTTP/1.1" 200 -
10.xx.xx.102 -[12/Feb/2023 02:47:20] "GET /folder2/ HTTP/1.1" 200 -
10.xx.xx.101 -[12/Feb/2023 02:46:44] "GET /folder1/ HTTP/1.1" 200 -
10.xx.xx.103 -[12/Feb/2023 02:46:44] "GET /folder3/ HTTP/1.1" 200 -

script:

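awk -i inplace '
    # First file (ips_file): build the IP to username map, print nothing.
    NR==FNR{
        userip[$1]=$2
        next
    }
    # Log lines: prepend the username when the IP in field 1 is known.
    ($1 in userip){ $0 = userip[$1] " " $0 }
    1   # print every log line, modified or not
' inplace::enable=0 ips_file  inplace::enable=1 logfile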

output:

$ cat logfile 
user1 10.xx.xx.101 -[12/Feb/2023 02:46:23] "GET /folder1/ HTTP/1.1" 200 -
user1 10.xx.xx.101 -[12/Feb/2023 02:46:44] "GET /folder1/ HTTP/1.1" 200 -
user2 10.xx.xx.102 -[12/Feb/2023 02:47:20] "GET /folder2/ HTTP/1.1" 200 -
user1 10.xx.xx.101 -[12/Feb/2023 02:46:44] "GET /folder1/ HTTP/1.1" 200 -
user3 10.xx.xx.103 -[12/Feb/2023 02:46:44] "GET /folder3/ HTTP/1.1" 200 -
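Since a comment asks how this works, here is a minimal sketch of the same two-file idiom without `-i inplace`, printing to stdout so that it should run under any POSIX awk. The temporary directory, the `annotate` function name, and the sample data are made up for illustration and are not part of the original answer.

```shell
#!/bin/sh
# Sketch only: file names, the annotate helper and the sample data are hypothetical.
dir=$(mktemp -d)
printf '%s\n' '10.0.0.101 user1' '10.0.0.102 user2' > "$dir/ips_file"
printf '%s\n' \
  '10.0.0.101 -[12/Feb/2023 02:46:23] "GET /folder1/ HTTP/1.1" 200 -' \
  '10.0.0.102 -[12/Feb/2023 02:47:20] "GET /folder2/ HTTP/1.1" 200 -' \
  '10.0.0.199 -[12/Feb/2023 02:48:01] "GET /folder3/ HTTP/1.1" 200 -' > "$dir/logfile"

annotate() {
    # $1 = IP-to-user map, $2 = log file; the result goes to stdout.
    awk '
        # While reading the first file NR equals FNR: store the map and skip printing.
        NR == FNR { userip[$1] = $2; next }
        # Second file: prepend the user name when field 1 is a known IP.
        ($1 in userip) { $0 = userip[$1] " " $0 }
        # Print every log line, changed or not.
        { print }
    ' "$1" "$2"
}

annotated=$(annotate "$dir/ips_file" "$dir/logfile")
printf '%s\n' "$annotated"
```

Lines whose IP is not in the map (here `10.0.0.199`) pass through unchanged, which matches the effect of the trailing `1` pattern in the script above.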
ufopilot
  • Can you explain this solution please? – shanaya sharma Feb 21 '23 at 10:50
  • This solution requires GNU Awk. I doubt that it will work when another process is appending text to the logfile at the same time. – Bodo Feb 21 '23 at 18:22
  • @Bodo the logfile is a copy of the serverlog as he wrote in his script above – ufopilot Feb 21 '23 at 19:35
  • @ufopilot The question states *"a log file which is being written continuously"* and *"Its a web server log file which is continuously being written"*. Maybe it is a misunderstanding what my term *logfile* refers to. The original script creates a copy, modifies the copy, copies it back and repeats this in a loop. Your script can replace the repeated `sed` commands with a single `awk`, but it does not solve the problem that may result from the two `cp` if `$server_log` (= my term *logfile*) is being written at the same time. Unfortunately the OP did not clarify this until now. – Bodo Feb 22 '23 at 14:22

ufopilot's answer shows how you can replace repeated sed commands with a single awk command.

I think the problem is related to the two cp commands. Doing this in a loop increases the probability of errors.

This answer tries to explain the behavior you observed. For a suggestion on how to fix the problem I would need details about your use case. (See my comment on your question.)


Assuming that $server_log is written by your server at the same time, you might get various problems.
If this assumption is false, clarify this in the question.

Your code:

cp $server_log $log_file
sed -i "/^$i/ s/./$user &/" $log_file
cp $log_file $server_log

Depending on the buffering used by the server, the first copy operation cp $server_log $log_file might end with an incomplete last line in $log_file, e.g.

10.xx.xx.xxx -[12/Feb/2023 02:46:44] "GET /folder/ HT
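One way to check whether a snapshot ended in the middle of a line is to look at its last byte. This is only a sketch: the helper name `has_complete_last_line` and the sample files are hypothetical, and it assumes the server terminates every log entry with a newline.

```shell
#!/bin/sh
# Sketch only: has_complete_last_line is a hypothetical helper name.
has_complete_last_line() {
    # Command substitution strips a trailing newline, so the result is empty
    # exactly when the last byte is a newline (or the file is empty).
    [ -z "$(tail -c 1 "$1")" ]
}

dir=$(mktemp -d)
# One file ends with a newline, the other is cut off mid-request.
printf '%s\n' '10.0.0.34 -[12/Feb/2023 02:46:23] "GET /folder/ HTTP/1.1" 200 -' > "$dir/complete.log"
printf '%s'   '10.0.0.34 -[12/Feb/2023 02:46:44] "GET /folder/ HT' > "$dir/partial.log"

for f in "$dir/complete.log" "$dir/partial.log"; do
    if has_complete_last_line "$f"; then
        echo "$(basename "$f"): last line is complete"
    else
        echo "$(basename "$f"): last line is incomplete"
    fi
done
```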

The processing of the copy is safe.

The second copy operation cp $log_file $server_log will probably truncate $server_log to size 0 and copy data in blocks. (Details depend on cp's implementation.) This can also temporarily result in incomplete lines.

If the server is appending data at the same time it might get mixed with data written by cp, probably after a block as mentioned above.
If the block ended with an incomplete line, e.g. ...GET /folder/ HT, this would also result in lines like

user1 10.xx.xx.xxx -[12/Feb/2023 02:46:44] "GET /folder/ HT10.xx.xx.56 -[12/Feb/2023 02:47:20] "GET /folder2/HTTP/1.1" 200 -
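The sequence described above can be replayed deterministically. This is a sketch under assumptions, not a reproduction of your server: the "server" is simulated with `printf`, it is assumed to write in append mode, all paths and IPs are hypothetical, and GNU sed is assumed (it does not add a newline to an unterminated last line).

```shell
#!/bin/sh
# Deterministic replay of the suspected race. All paths and sample IPs are
# hypothetical; the "server" is simulated with printf in append mode.
dir=$(mktemp -d)
server_log="$dir/server.log"
log_file="$dir/copy.log"

# 1. The server has flushed one full line and part of a second one.
printf '%s\n' '10.0.0.34 -[12/Feb/2023 02:46:23] "GET /folder/ HTTP/1.1" 200 -' > "$server_log"
printf '%s' '10.0.0.34 -[12/Feb/2023 02:46:44] "GET /folder/ HT' >> "$server_log"

# 2. First copy: the snapshot ends in the middle of a line.
cp "$server_log" "$log_file"

# 3. Meanwhile the server finishes that line in the original file.
printf '%s\n' 'TP/1.1" 200 -' >> "$server_log"

# 4. Prefix the user name in the snapshot (assumes GNU sed, which keeps the
#    missing final newline as it is).
sed 's/^10\.0\.0\.34 /user1 &/' "$log_file" > "$log_file.new"

# 5. Second copy: overwrites the server log and discards the data from step 3.
cp "$log_file.new" "$server_log"

# 6. The server appends its next entry directly after the cut-off line.
printf '%s\n' '10.0.0.56 -[12/Feb/2023 02:47:20] "GET /folder2/ HTTP/1.1" 200 -' >> "$server_log"

cat "$server_log"
```

Under these assumptions the second line of the final file should be the cut-off `... HT` entry fused with the `10.0.0.56` entry, and the rest of the request written in step 3 is lost.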
Bodo