
Surely someone has already solved this, so I thought I'd ask here.

I am running Prometheus and Grafana inside an AWS Fargate container. To persist the collected data across upgrades, I mount an EFS file system (it's just an NFS mount point) at /var/lib/prometheus.
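The EFS wiring is roughly what you'd expect from a Fargate task definition with an EFS volume; here's a minimal boto3 sketch for context (the file system ID, names, and image are placeholders, not my actual configuration):

```python
import boto3

# Rough sketch of the EFS wiring only; IDs, names, and image are placeholders.
ecs = boto3.client("ecs")
ecs.register_task_definition(
    family="prometheus",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="512",
    memory="1024",
    volumes=[{
        "name": "prometheus-data",
        "efsVolumeConfiguration": {"fileSystemId": "fs-12345678"},
    }],
    containerDefinitions=[{
        "name": "prometheus",
        "image": "prom/prometheus:latest",
        "mountPoints": [{
            "sourceVolume": "prometheus-data",
            # The TSDB data dir lives on EFS so it survives task replacement.
            "containerPath": "/var/lib/prometheus",
        }],
    }],
)
```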

When it's time to upgrade my container, AWS brings up the second container, waits for health checks to pass, then drops the first one.

The problem is that both instances are pointing to the same data directory (NFS), and there's a lock file there. As designed.

Do people have solutions?

I've thought about two choices:

  1. Upgrade differently. Stop the old one before starting the new one.
  2. Change the start script in the container to wait for the old lock file to disappear before starting Prometheus (a sketch of this follows below). I'd also point my health check at something other than Prometheus itself.
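For option 2, here's a minimal sketch of what the entrypoint wrapper could look like, assuming Prometheus writes its lock file at `<data dir>/lock`; the timeout, the decision to remove a stale lock, and the exact Prometheus flags are all things I'd still have to validate:

```python
#!/usr/bin/env python3
"""Entrypoint sketch: wait for the old task's TSDB lock to go away, then exec Prometheus."""
import os
import sys
import time

LOCK_FILE = "/var/lib/prometheus/lock"   # lock file in the shared EFS data dir (assumed path)
TIMEOUT_S = 300                          # give up waiting after 5 minutes rather than hang forever
POLL_S = 5

deadline = time.monotonic() + TIMEOUT_S
while os.path.exists(LOCK_FILE):
    if time.monotonic() > deadline:
        # The old task may have died without cleaning up; removing a stale lock is a judgment call.
        print(f"lock {LOCK_FILE} still present after {TIMEOUT_S}s, removing it", file=sys.stderr)
        os.remove(LOCK_FILE)
        break
    print(f"waiting for {LOCK_FILE} to be released by the old task...", file=sys.stderr)
    time.sleep(POLL_S)

# Replace this process with Prometheus so SIGTERM from ECS reaches it directly.
os.execvp("prometheus", [
    "prometheus",
    "--storage.tsdb.path=/var/lib/prometheus",
    "--config.file=/etc/prometheus/prometheus.yml",
])
```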

I don't know whether the lock file is guaranteed to disappear, and I suspect Prometheus gets upset if it starts while the lock is still there, even if the other container is gone.

People have already solved this, if not for Prometheus, then for other servers that store data in a volume. Suggestions?

Joseph Larson
  • Change `maximumPercent` in the deployment configuration for the service to 100%, and the existing task should be stopped before it starts the new one. – jordanm Mar 12 '21 at 16:46
  • FYI I assumed that you were talking about ECS, but "AWS Fargate" is now ambiguous because it could refer to either EKS or ECS. – jordanm Mar 12 '21 at 16:47
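Following up on jordanm's suggestion: with `maximumPercent` at 100% (and `minimumHealthyPercent` below 100%, so ECS is allowed to stop the old task first), the service has to stop the running task before it can launch the replacement, so only one task ever holds the EFS data directory. A rough sketch via boto3, with the cluster and service names as placeholders:

```python
import boto3

# Sketch only: cluster and service names are placeholders.
ecs = boto3.client("ecs")
ecs.update_service(
    cluster="monitoring-cluster",
    service="prometheus-grafana",
    deploymentConfiguration={
        # 100% max / 0% min forces ECS to stop the running task before
        # starting its replacement, so the EFS lock is released first.
        "maximumPercent": 100,
        "minimumHealthyPercent": 0,
    },
)
```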
