Problem
I am simply trying to install the CloudWatch agent on Amazon Linux 2 instances at startup, using EC2 user data. For some reason, after cloud-init has finished running, all services get restarted and the configuration file I put in the CloudWatch agent folder is no longer there.
I am using a custom AMI pre-built with Packer. My configuration file is rendered from an Ansible template into /opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json. This is the configuration file I want to use; it holds all the metrics and logs I want to send. At startup, after installing the agent, I copy it to /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json.
Here is my user data script:
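#!/bin/bash
# Install the agent package from the Amazon Linux 2 repositories
yum install amazon-cloudwatch-agent -y
# Copy the config baked into the AMI to the path passed to fetch-config below
cp /opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
# Load that config for an EC2 instance (-m ec2) and (re)start the agent (-s)
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json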
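To check whether the agent package or its configuration directory actually changes between the end of this script and a later restart, a small read-only helper can record both. This is a hypothetical sketch: the function name cw_snapshot and the CW_ROOT override are my own assumptions; it only runs rpm -q and lists amazon-cloudwatch-agent.d, and changes nothing.

```shell
#!/bin/bash
# Hypothetical diagnostic helper: prints the installed agent package and the
# files the agent can pick up from amazon-cloudwatch-agent.d, tagged with a
# label, so snapshots taken at different times can be compared.
# It only reads state; it does not modify the agent or its config.

# Install root of the agent; overridable (assumption, e.g. for testing).
CW_ROOT="${CW_ROOT:-/opt/aws/amazon-cloudwatch-agent}"

cw_snapshot() {
  local label="$1"
  local version="unknown"
  # rpm exists on Amazon Linux 2; elsewhere the version stays "unknown".
  if command -v rpm >/dev/null 2>&1 &&
     rpm -q amazon-cloudwatch-agent >/dev/null 2>&1; then
    version="$(rpm -q amazon-cloudwatch-agent)"
  fi
  echo "[$label] $(date -u +%Y-%m-%dT%H:%M:%SZ) package: $version"

  local dir="$CW_ROOT/etc/amazon-cloudwatch-agent.d"
  if [ ! -d "$dir" ]; then
    echo "[$label] config: $dir missing"
    return 0
  fi
  local f
  for f in "$dir"/*; do
    [ -e "$f" ] || continue   # unmatched glob: the directory is empty
    echo "[$label] config: $(basename "$f")"
  done
}
```

For example, defining the function at the top of the user data and appending cw_snapshot userdata >> /var/log/cw-snapshot.log (a hypothetical log path) at the end, then running it again by hand with another label once the agent has restarted, would show whether the package version or the files in amazon-cloudwatch-agent.d differ.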
What is happening
After startup has finished, I can see that the script ran correctly. Running cat /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log shows the following:
2021/07/16 13:33:46 I! I! Detected the instance is EC2
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
Valid Json input schema.
I! Detecting run_as_user...
No csm configuration found.
Configuration validation first phase succeeded
2021/07/16 13:33:46 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
2021/07/16 13:33:46 I! Detected runAsUser: root
2021/07/16 13:33:46 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to root:root
2021-07-16T13:33:46Z I! Starting AmazonCloudWatchAgent 1.247347.4
2021-07-16T13:33:46Z I! Loaded inputs: netstat diskio logfile mem net processes swap cpu disk
2021-07-16T13:33:46Z I! Loaded aggregators:
2021-07-16T13:33:46Z I! Loaded processors: delta ec2tagger
2021-07-16T13:33:46Z I! Loaded outputs: cloudwatch cloudwatchlogs
2021-07-16T13:33:46Z I! Tags enabled: host=ip-XX-XX-X-XXX.eu-west-1.compute.internal
2021-07-16T13:33:46Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-XX-XX-X-XXX.eu-west-1.compute.internal", Flush Interval:1s
2021-07-16T13:33:46Z I! [logagent] starting
2021-07-16T13:33:46Z I! [logagent] found plugin cloudwatchlogs is a log backend
2021-07-16T13:33:46Z I! [logagent] found plugin logfile is a log collection
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
=======> 2021-07-16T13:33:46Z I! cloudwatch: get unique roll up list [[AutoScalingGroupName] [InstanceId InstanceType] []]
2021-07-16T13:33:46Z I! cloudwatch: publish with ForceFlushInterval: 30s, Publish Jitter: 11s
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
=======> 2021-07-16T13:33:47Z I! [logagent] piping log from APP-DEV-php-errors-logs/XX.XX.X.XXX(/var/log/php-fpm/error.log) to cloudwatchlogs
2021-07-16T13:33:54Z I! Profiler is stopped during shutdown
2021-07-16T13:33:54Z I! [agent] Hang on, flushing any cached metrics before shutdown
2021/07/16 13:33:55 I! I! Detected the instance is EC2
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default ...
Valid Json input schema.
I! Detecting run_as_user...
No csm configuration found.
No log configuration found.
Configuration validation first phase succeeded
2021/07/16 13:33:55 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default ...
2021/07/16 13:33:55 I! Detected runAsUser: cwagent
2021/07/16 13:33:55 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 994:992
2021/07/16 13:33:55 I! Set HOME: /home/cwagent
2021-07-16T13:33:55Z I! Starting AmazonCloudWatchAgent 1.247348.0
2021-07-16T13:33:55Z I! Loaded inputs: disk mem
2021-07-16T13:33:55Z I! Loaded aggregators:
2021-07-16T13:33:55Z I! Loaded processors: ec2tagger
2021-07-16T13:33:55Z I! Loaded outputs: cloudwatch
2021-07-16T13:33:55Z I! Tags enabled: host=ip-XX-XX-X-XXX.eu-west-1.compute.internal
2021-07-16T13:33:55Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-XX-XX-X-XXX.eu-west-1.compute.internal", Flush Interval:1s
2021-07-16T13:33:55Z I! [logagent] starting
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
=======> 2021-07-16T13:33:55Z I! cloudwatch: get unique roll up list []
2021-07-16T13:33:55Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 26s
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
2021-07-16T13:39:07Z I! [processors.ec2tagger] ec2tagger: Refresh is no longer needed, stop refreshTicker.
As you can see, the initial command from the user data runs fine and my custom metrics and logs are collected (see the =======> marks before the relevant lines).
However, a few seconds later, after cloud-init is over, the CloudWatch agent is somehow restarted by systemd, and the file amazon-cloudwatch-agent.json is again absent from the filesystem, so the agent falls back to its default configuration (amazon-cloudwatch-agent.d/default in the log).
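The two agent starts in the log above can be compared side by side with a small awk filter. This is a sketch that assumes the log line formats shown in the excerpt ("Reading json config file path: ...", "Starting AmazonCloudWatchAgent <version>", "Loaded inputs: ..."); summarize_cw_log is a hypothetical helper name, and other agent versions may log differently.

```shell
#!/bin/bash
# Hypothetical helper: condense the CloudWatch agent log into one line per
# agent start (timestamp, version, config file read from
# amazon-cloudwatch-agent.d) plus the inputs that start loaded.
# Assumes the log line formats shown in the excerpt above.

summarize_cw_log() {
  local log="${1:-/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log}"
  awk '
    # Remember the last file read from amazon-cloudwatch-agent.d; the agent
    # logs the same path twice per start, so keeping the latest is enough.
    /Reading json config file path: .*amazon-cloudwatch-agent\.d\// {
      cfg = $0
      sub(/.*amazon-cloudwatch-agent\.d\//, "", cfg)
      sub(/ *\.\.\. *$/, "", cfg)
    }
    # e.g. "2021-07-16T13:33:46Z I! Starting AmazonCloudWatchAgent 1.247347.4"
    /Starting AmazonCloudWatchAgent/ {
      printf "start %s version=%s config=%s\n", $1, $NF, cfg
    }
    /Loaded inputs:/ {
      line = $0
      sub(/.*Loaded inputs: */, "", line)
      printf "  inputs: %s\n", line
    }
  ' "$log"
}
```

Run against the excerpt above, it prints one line per start: version 1.247347.4 with config file_amazon-cloudwatch-agent.json and the full input list, then version 1.247348.0 with config default and only disk and mem. The change of agent version between the two starts may hint that the package was reinstalled or upgraded after the user data ran, but the log alone does not confirm that.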
If I rerun the command manually after startup, everything works fine, but I need this to be automated for instances launched by Auto Scaling.
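One possible way to automate the manual rerun is a oneshot systemd unit baked into the AMI and ordered after cloud-final.service (cloud-init's last stage), which re-runs the same fetch-config command against the custom config. This is an untested sketch built on the assumption that whatever restarts the agent runs inside cloud-init; if the restart happens after cloud-final.service has finished, this ordering will not help. The unit name cw-agent-reapply-config.service and the helper write_cw_reapply_unit are hypothetical.

```shell
#!/bin/bash
# Untested sketch: write a oneshot systemd unit that re-runs fetch-config
# against the custom config once cloud-init's final stage has finished.
# Unit and function names are hypothetical; whether "after cloud-final" is
# late enough to win against the unexpected restart is an assumption.

write_cw_reapply_unit() {
  local dest_dir="${1:-/etc/systemd/system}"
  local unit="$dest_dir/cw-agent-reapply-config.service"
  # WantedBy=cloud-final.service (not multi-user.target) pulls the unit in
  # with cloud-final on every boot without creating an ordering cycle on
  # systems where cloud-final is itself ordered after multi-user.target.
  cat > "$unit" <<'EOF'
[Unit]
Description=Re-apply custom CloudWatch agent config after cloud-init
After=cloud-final.service
ConditionPathExists=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json

[Install]
WantedBy=cloud-final.service
EOF
  echo "$unit"
}
```

In the Packer/Ansible build, the helper would be run once, followed by systemctl daemon-reload and systemctl enable cw-agent-reapply-config.service. Because fetch-config is pointed directly at the file under etc/custom, the unit does not depend on the cp step.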
What I have tried
I have tried launching the CloudWatch agent directly with systemd, using chown to make the config file read-only, and only fetching the config while letting the system start the agent itself, but the problem persists.
Thank you for your help