
I am able to break the output of a big log file (filename.log) into individual one-minute log files (filename.log.140108) using a while loop, but due to a capacity issue on the VM I want these files to be saved compressed (e.g. as .gz files). Can anyone please help?

#!/bin/bash
log_file=/home/tmp/filename.log
tmp_log_file=/home/tmp/filename.log.$$
while true; do
    sleep 60
    cp "$log_file" "$tmp_log_file"
    > "$log_file"
    mv "$tmp_log_file" "$log_file.$(date +%H%M%S)"  # HHMMSS suffix, as in the output below
done

----------- current output ---------------

-rw-r--r-- 1 root     root    16789643 Nov  6 14:05 filename.log         # master log file
-rw-r--r-- 1 root     root     2277376 Nov  6 14:01 filename.log.140108  # 1-min log made from master log
-rw-r--r-- 1 root     root     3862528 Nov  6 14:02 filename.log.140208
-rw-r--r-- 1 root     root     5558272 Nov  6 14:03 filename.log.140308
-rw-r--r-- 1 root     root     7147520 Nov  6 14:04 filename.log.140408

------------ expected output -----------------

-rw-r--r-- 1 root     root     2277376 Nov  6 14:01 filename.log.140108.gz
-rw-r--r-- 1 root     root     3862528 Nov  6 14:02 filename.log.140208.gz
-rw-r--r-- 1 root     root     5558272 Nov  6 14:03 filename.log.140308.gz
-rw-r--r-- 1 root     root     7147520 Nov  6 14:04 filename.log.140408.gz  

1 Answer

For compression, you can use any compression tool you like, such as pigz (multithreaded gzip), pbzip2 (multithreaded bzip2), or xz -T0. The old and well-known .gz format is produced by gzip or pigz.

A few more tools and algorithms (some multithreaded, some single-threaded): zstd -T0, lrzip, lzop, lz4, lzip
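For instance, all of these read the log on stdin and write the compressed stream to stdout (the output names are only examples):

    gzip     < filename.log > filename.log.gz   # single-threaded .gz
    pigz     < filename.log > filename.log.gz   # multithreaded .gz
    pbzip2   < filename.log > filename.log.bz2  # multithreaded .bz2
    xz -T0   < filename.log > filename.log.xz   # multithreaded .xz
    zstd -T0 < filename.log > filename.log.zst  # multithreaded .zst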

However, your code snippet has an obvious data loss issue:

    ...
    cp $log_file $tmp_log_file
    # Here. This is a race window.
    # Whatever gets logged here will be lost!
    >$log_file
    ...
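To see the race in action, here is a hypothetical demonstration (test.log is a made-up name; exact timings vary, but the surviving line count typically falls short of 100000):

    ( for i in $(seq 100000); do echo "$i"; done >> test.log ) &  # a busy "logger"
    sleep 1
    cp test.log test.log.old  # snapshot taken here
    > test.log                # anything logged since the cp is destroyed here
    wait
    echo "survived: $(( $(wc -l < test.log.old) + $(wc -l < test.log) )) of 100000"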

Below is an outline of how one could address the atomicity (data loss) problem while also adding standard .gz compression.

If the logger process is long-running and doesn't reopen the log file (on occasions other than a HUP signal), then you won't need the hard-link trick (ln -f ...); a mv followed by a kill -HUP will suffice, as sketched below. Note that this approach includes a time window in which "$log_file" does not exist.
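A rough sketch of that simpler variant (assuming your logger reopens its log on SIGHUP, the way syslog daemons typically do; logger_PID is a placeholder):

    mv "$log_file" "$old_log_file"   # the logger keeps writing to the renamed file
    kill -HUP "$logger_PID"          # the logger reopens $log_file (briefly nonexistent)
    pigz < "$old_log_file" > "$log_file.$(date +%H%M%S).gz"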

If the logger process can be restarted independently of the script's operation and/or if the log(s) can originate from multiple short-lived processes, then the atomic log file swap procedure shown below will be necessary to prevent data loss. This approach guarantees that (1) "$log_file" always exists and (2) all logs will be stored in exactly one of the per-minute log files.

#!/bin/bash

log_file='/home/tmp/filename.log'
old_log_file="/home/tmp/filename.log.$$.old"
new_log_file="/home/tmp/filename.log.$$.new"

for ((;;)); do
    sleep 60
    logger_PID="$(... find the PID of the logging process ...)"
    # Alternative to the following 3 lines: renameat2(2) with RENAME_EXCHANGE (man renameat2)
    ln -f "$log_file" "$old_log_file"  # keep writing the log
    touch "$new_log_file"              # prepare a new, empty log
    mv "$new_log_file" "$log_file"     # break the link atomically
    kill -HUP "$logger_PID"            # switch to the new log
    # At this point:
    # * $log_file is written.
    # * $old_log_file is stable.
    pigz < "$old_log_file" > "$log_file.$(date +%H%M%S).gz"
    # This^^^ can be pbzip2 or any other compression tool.
    rm -f "$old_log_file"              # free the uncompressed copy right away
done
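The resulting .gz chunks can be read later with the standard gzip tooling, e.g. (ERROR is just an example pattern):

    zcat filename.log.140108.gz | head   # peek at one chunk
    zgrep ERROR filename.log.*.gz        # search across all chunks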
  • As a side note, if you want _real_ per-minute logs (as opposed to "per-minute plus whatever the compression and the rest of the cycle takes"), you could use [an approach similar to this one](https://stackoverflow.com/a/19067658/8584929). All it takes is basically a simple modification to the `sleep` command (see the sketch after these comments). – Andrej Podzimek Nov 11 '20 at 05:38
  • Very nice and thorough answer! – ppuschmann Nov 11 '20 at 18:56
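A minimal sketch of the per-minute alignment mentioned in the first comment (assumes GNU date, where %s prints the current Unix timestamp):

    # Sleep until the start of the next minute instead of a fixed 60 seconds:
    sleep "$((60 - $(date +%s) % 60))"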