After Nginx has been running for a while, the files in /var/log/nginx are:

Dec 17 access.log.1
Dec 16 access.log.2.gz
Dec 15 access.log.3.gz
Dec    ..
Dec  5 access.log.13.gz
Dec  4 access.log.14.gz

The files cycle. Each day access.log.14.gz is purged, the files are rotated, and a fresh access.log.1 is created.

Suppose I want to move, each day, the file access.log.2.gz from the server's disk to an otherwise-idle machine's disk. Is it sufficient to simply run a daily cron job to perform the move? Is there a scenario in which a file, or part of a file, would be corrupted or lost by running such a cron job?

Update: I am now aware that, as with many things in cloud-based development, I can solve this current and ongoing problem by simply pushing a button. But I'd like to understand a little better what is happening, even if in the end I do push the button.

So let me repeat the question. The fact that my server happens to be an AWS instance is an orthogonal issue, so for the purpose of this question suppose that my server hardware is owned and managed by myself. Now suppose I have a cron job running once a day on the server. The job moves access.log.2.gz to alternative storage away from the server, both to ensure that I do not lose that file when Nginx wraps around and to avoid flooding whatever disk the server has with logs.

Is there a scenario when a file or part of a file would be corrupted by running a cron job?
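
For concreteness, what I have in mind is roughly the following cron entry (the destination host and paths are placeholders, and % has to be escaped inside a crontab):

# /etc/cron.d/ship-nginx-log: copy the newest compressed archive off the box,
# then delete the local copy only if the transfer succeeded
15 3 * * * root rsync -a /var/log/nginx/access.log.2.gz backup-host:/srv/nginx-logs/access.log.$(date +\%F).gz && rm -f /var/log/nginx/access.log.2.gz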

Calaf

4 Answers

You may want to explore the AWS CloudWatch agent to store your logs in CloudWatch.

CloudWatch includes a new unified agent that can collect both logs and metrics from EC2 instances and on-premises servers. If you are not already using the older CloudWatch Logs agent, we recommend that you use the newer unified CloudWatch agent.

If you wish to store a copy on S3, you can easily configure the AWS CloudWatch log group to copy the logs into S3.

Source: Quick Start: Install and Configure the CloudWatch Logs Agent on a Running EC2 Linux Instance

You can export log data from your log groups to an Amazon S3 bucket and use this data in custom processing and analysis, or to load onto other systems.

Exporting Log Data to Amazon S3
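
For reference, a minimal sketch of the unified agent's log-collection configuration (the log group and stream names are placeholders; the full schema is in the CloudWatch agent documentation):

{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/nginx/access.log",
            "log_group_name": "nginx-access",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}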

Juned Ahsan

I do not think you need to introduce the CloudWatch agent -> CloudWatch -> Lambda -> S3 chain. You can also expect a sizeable cost if there is enough data; in my experience, CloudWatch can end up costing more than the EC2 server itself when there is too much log data.

They both have similar storage costs, but CloudWatch Logs has an additional ingest charge.

Therefore, it would be lower cost to send straight to Amazon S3.

Cloudwatch log store costing vs S3 costing

So configure a cron job that uses the AWS CLI to push the logs to S3. In the script, if the upload fails, you can send a notification to Slack or keep the file for a later retry. As for the data-corruption concern, you would have to handle that in a Lambda anyway, and you can handle it in Bash just as well. A sketch (the bucket name and Slack webhook are placeholders):

#!/bin/bash
# Sketch: upload the rotated log to S3; notify and keep a backup copy if the upload fails.
FILE=/var/log/nginx/access.log.2.gz

if ! aws s3 cp "$FILE" "s3://my-log-bucket/nginx/access.log.$(date +%F).gz"; then
  # send a notification to a Slack incoming webhook (placeholder URL in $SLACK_WEBHOOK_URL)
  curl -s -X POST -H 'Content-type: application/json' \
    --data '{"text":"nginx log upload to S3 failed"}' "$SLACK_WEBHOOK_URL"
  mv "$FILE" "$FILE.backup"
fi
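
Run it once a day from cron, for example (the script path is a placeholder):

30 3 * * * /usr/local/bin/push-nginx-log-to-s3.sh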

Adiii

From my experience on some projects, the CloudWatch agent (CWA) is not a good solution for this case.

Fluentd is a much better fit than the CWA and CloudWatch Logs.

If you push logs to CloudWatch Logs, you have to pay the ingestion cost, the storage cost, and the Lambda execution cost when you want to move the logs from CloudWatch Logs to S3.

Fluentd is open source. You can process the logs and push the log files to S3 directly from the EC2 instance.
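
As a rough sketch (the bucket, region, and paths are placeholders, and the S3 output needs the fluent-plugin-s3 gem), a Fluentd configuration for this could look like:

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx-access.pos
  tag nginx.access
  <parse>
    @type nginx
  </parse>
</source>

<match nginx.access>
  @type s3
  s3_bucket my-log-bucket
  s3_region us-east-1
  path nginx/access/
  <buffer time>
    timekey 1d
    timekey_wait 10m
  </buffer>
</match>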

Tuan Vo

This highly depends on how you do the move, and how the original script does the rotation as well, including the possibility of the race conditions between the two.

Per http://nginx.org/docs/control.html#logs, nginx should be sent a USR1 signal to re-open the log files, which makes them available for post-processing "almost immediately", according to the documentation. Leaving one extra rotation cycle before you do that might still be the safest approach (notice how you're currently already doing that, as access.log.1 in your output isn't archived yet, unlike access.log.2.gz and older).

You could also have nginx gzip the log automatically as it goes; that way you'll never have to compress the files yourself from within cron or worry about the extra data loss, saving a step; see the gzip parameter on http://nginx.org/r/access_log.
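
For example, along the lines of the documentation's own sample (the path is a placeholder):

access_log /var/log/nginx/access.log.gz combined gzip flush=5m;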

Finally, if you intend on keeping your log files long-term, it might make more sense to implement rotation in your own shell scripts, instead of using any sort of logrotate wrapper or newsyslog. For example, see What's the easiest way to rotate nginx log files monthly?; it's really a very simple process, and if you write the whole thing yourself in clear sequential and synchronous steps, there's less possibility of data loss as well. It'll also let you avoid things like https://serverfault.com/questions/480551/logrotate-not-rotating-file-after-file-size-exceeds-the-limit/480556#480556.
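
A minimal sketch of such a script, written as plain sequential steps (the paths are placeholders), could be:

#!/bin/sh
# Sketch: rotate the nginx access log in explicit, sequential steps.
cd /var/log/nginx || exit 1
mv access.log access.log.$(date +%F)        # 1. rename the live log
kill -USR1 "$(cat /var/run/nginx.pid)"      # 2. ask nginx to re-open its log files
sleep 1                                     # 3. let the workers finish pending writes
gzip access.log.$(date +%F)                 # 4. compress the now-quiescent file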

I would still recommend keeping a one-cycle delay between rotation and moving the file off, but I would write my own script without the overhead and limitations of logrotate or newsyslog.

cnst