7

Problem

I am simply trying to install Cloudwatch Agent on Amazon Linux 2 instances at startup, using AWS userdata. For some reason, after Cloud Init has finished running, all services get restarted and the configuration file I put in the cloudwatch folder is not there anymore.

I am using a custom AMI which is pre-built with Packer, my configuration file being put in /opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json from an Ansible template. This is the configuration file I want to use, holding all metrics and logs I want to send. I am then copying it to /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json at startup after the agent installation.

Here is my userdata script:

#!/bin/bash
yum install amazon-cloudwatch-agent -y
cp /opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json

What is happening

After startup has finished, I can see the script ran correctly. If I run cat /opt/aws/amazon-cloudwatch-agent/log/amazon-cloudwatch-agent.log I can see that the following:

2021/07/16 13:33:46 I! I! Detected the instance is EC2
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
Valid Json input schema.
I! Detecting run_as_user...
No csm configuration found.
Configuration validation first phase succeeded

2021/07/16 13:33:46 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
2021/07/16 13:33:46 I! Detected runAsUser: root
2021/07/16 13:33:46 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to root:root
2021-07-16T13:33:46Z I! Starting AmazonCloudWatchAgent 1.247347.4
2021-07-16T13:33:46Z I! Loaded inputs: netstat diskio logfile mem net processes swap cpu disk
2021-07-16T13:33:46Z I! Loaded aggregators:
2021-07-16T13:33:46Z I! Loaded processors: delta ec2tagger
2021-07-16T13:33:46Z I! Loaded outputs: cloudwatch cloudwatchlogs
2021-07-16T13:33:46Z I! Tags enabled: host=ip-XX-XX-X-XXX.eu-west-1.compute.internal
2021-07-16T13:33:46Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-XX-XX-X-XXX.eu-west-1.compute.internal", Flush Interval:1s
2021-07-16T13:33:46Z I! [logagent] starting
2021-07-16T13:33:46Z I! [logagent] found plugin cloudwatchlogs is a log backend
2021-07-16T13:33:46Z I! [logagent] found plugin logfile is a log collection
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
=======> 2021-07-16T13:33:46Z I! cloudwatch: get unique roll up list [[AutoScalingGroupName] [InstanceId InstanceType] []]
2021-07-16T13:33:46Z I! cloudwatch: publish with ForceFlushInterval: 30s, Publish Jitter: 11s
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
=======> 2021-07-16T13:33:47Z I! [logagent] piping log from APP-DEV-php-errors-logs/XX.XX.X.XXX(/var/log/php-fpm/error.log) to cloudwatchlogs
2021-07-16T13:33:54Z I! Profiler is stopped during shutdown
2021-07-16T13:33:54Z I! [agent] Hang on, flushing any cached metrics before shutdown
2021/07/16 13:33:55 I! I! Detected the instance is EC2
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default ...
Valid Json input schema.
I! Detecting run_as_user...
No csm configuration found.
No log configuration found.
Configuration validation first phase succeeded

2021/07/16 13:33:55 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default ...
2021/07/16 13:33:55 I! Detected runAsUser: cwagent
2021/07/16 13:33:55 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 994:992
2021/07/16 13:33:55 I! Set HOME: /home/cwagent
2021-07-16T13:33:55Z I! Starting AmazonCloudWatchAgent 1.247348.0
2021-07-16T13:33:55Z I! Loaded inputs: disk mem
2021-07-16T13:33:55Z I! Loaded aggregators:
2021-07-16T13:33:55Z I! Loaded processors: ec2tagger
2021-07-16T13:33:55Z I! Loaded outputs: cloudwatch
2021-07-16T13:33:55Z I! Tags enabled: host=ip-XX-XX-X-XXX.eu-west-1.compute.internal
2021-07-16T13:33:55Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-XX-XX-X-XXX.eu-west-1.compute.internal", Flush Interval:1s
2021-07-16T13:33:55Z I! [logagent] starting
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
=======> 2021-07-16T13:33:55Z I! cloudwatch: get unique roll up list []
2021-07-16T13:33:55Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 26s
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
2021-07-16T13:39:07Z I! [processors.ec2tagger] ec2tagger: Refresh is no longer needed, stop refreshTicker.

So as you can see, the initial command from userdata runs fine and custom metrics and logs are collected (see the ====> mark before the relevant lines). However a few seconds later, after Cloud Init is over, the cloudwatch agent is restarted by systemd somehow and again, somehow, the file amazon-cloudwatch-agent.json is absent from the filesystem, so the agent runs with default parameters.

However if I rerun the command manually after startup everything works fine but of course I need it automated for when autoscaling fires up.

What I have tried

Launching amazon cloudwatch agent directly with systemd, trying to chown the config file to read-only, fetching config only and let the system start the agent itself, but the problem still persists.

Thank you for your help

Daniel
  • 807
  • 8
  • 24

2 Answers2

6

Workaround

The preinstalled ssm-agent conflicts with the Cloudwtach Agent. Uninstall ssm-agent during Packer build:

sudo yum erase amazon-ssm-agent --assumeyes

Explanation

I finally found out that the newly install cloudwatch agent conflicts with the SSM agent installed by default in the Amazon Linux 2 image. Indeed, I first tried an ugly workaround which would be to replace the StartExec line of the amazon-cloudwatch-agent service using sed in the user data :

sed -i '/ExecStart/c\ExecStart=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json' /etc/systemd/system/amazon-cloudwatch-agent.service

That way when the service gets restarted after instance startup it would use my custom configuration. However I then found out that the service file got also replaced after Cloud Init ended.

Reviewing the system messages I noticed that ssm-agent was performing some configuration reloading after Cloud Init ended, and thus I assumed that it could possibly be the culprit. I ended up uninstalling it in the packer build which is building my AMI so it would not be present at instance startup, and finally my configuration did not get overwritten anymore.

Note that I do not have a deep understanding of how ssm-agent works, and there is probably a proper way to instantiate Cloudwatch Agent using some SSM configuration. Since we do not currently use SSM and I do not have enough time to study this option, I choosed this compromise.

If someone can come up with a cleaner solution, using ssm-agent through an automated method, this would be greatly appreciated.

Daniel
  • 807
  • 8
  • 24
  • Thanks for outlining this. Such a pain, especially since the docs at https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html don't make any mention of it! I'm using SSM for Session Manager, but based on https://www.reddit.com/r/aws/comments/guduj5/amazon_cloudwatch_agent_config_deleted_on_boot it sounds like the solution is to run 'RunCommand AWS-ConfigureCloudWatch with your JSON CW config blob as a parameter'. – Tim Malone Apr 07 '22 at 07:27
  • Of course, ^ that is not really feasible if you don’t want to have to run RunCommand on new instances all the time… – Tim Malone Apr 07 '22 at 08:10
  • See this bug: https://github.com/aws/amazon-cloudwatch-agent/issues/594 – Vijay Nandwana Oct 10 '22 at 07:46
0

You can try using AWS Systems Manager Parameter Store - the SSM agent shouldn't/can't remove the config there.

  1. Ensure the server has AWS-managed policy CloudWatchAgentServerPolicy attached, this allows any Parameter Store parameter named AmazonCloudWatch-* to be read
  2. Ensure the contents of what would have been in amazon-cloudwatch-agent.json is stored in Parameter Store as valid json, with an appropriate name from (1)
  3. As found in cat /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl, run amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:AmazonCloudWatch-Config.json -s (replacing AmazonCloudWatch-Config.json with the name of the parameter from Parameter Store)

See aws docs.

Update 2022-10-02 - multiple configs can be added by using append-config, see aws docs, or by placing config files into /etc/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.d/ and restarting the agent sudo service amazon-cloudwatch-agent restart. This helps when attempting to deploy on Amazon Linux 2, which ships its application logs using a cloudwatch log config from this folder, as well as a custom config to ship other logs from the server. Using fetch-config would otherwise overwrite the log config for the AL2 application.

Steve
  • 11
  • 2