
Surely someone has already solved this, so I thought I'd ask here.

I am running Prometheus and Grafana inside an AWS Fargate container. To persist the collected data across upgrades, I mount an EFS file system (it's just an NFS mount point) at /var/lib/prometheus.
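The EFS wiring is roughly what you'd expect from a Fargate task definition with an EFS volume; here's a minimal boto3 sketch for context (the file system ID, names, and image are placeholders, not my actual configuration):

```python
import boto3

# Rough sketch of the EFS wiring only; IDs, names, and image are placeholders.
ecs = boto3.client("ecs")
ecs.register_task_definition(
    family="prometheus",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="512",
    memory="1024",
    volumes=[{
        "name": "prometheus-data",
        "efsVolumeConfiguration": {"fileSystemId": "fs-12345678"},
    }],
    containerDefinitions=[{
        "name": "prometheus",
        "image": "prom/prometheus:latest",
        "mountPoints": [{
            "sourceVolume": "prometheus-data",
            # The TSDB data dir lives on EFS so it survives task replacement.
            "containerPath": "/var/lib/prometheus",
        }],
    }],
)
```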

When it's time to upgrade my container, AWS brings up the second container, waits for health checks to pass, then drops the first one.

The problem is that both instances are pointing to the same data directory (NFS), and there's a lock file there. As designed.

Do people have solutions?

I've thought about two choices:

  1. Upgrade differently. Stop the old one before starting the new one.
  2. Change the start script in the container to wait for the old lock file to disappear before starting Prometheus (a sketch of this follows below). I'd also point my health check at something other than Prometheus itself.
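For option 2, here's a minimal sketch of what the entrypoint wrapper could look like, assuming Prometheus writes its lock file at `<data dir>/lock`; the timeout, the decision to remove a stale lock, and the exact Prometheus flags are all things I'd still have to validate:

```python
#!/usr/bin/env python3
"""Entrypoint sketch: wait for the old task's TSDB lock to go away, then exec Prometheus."""
import os
import sys
import time

LOCK_FILE = "/var/lib/prometheus/lock"   # lock file in the shared EFS data dir (assumed path)
TIMEOUT_S = 300                          # give up waiting after 5 minutes rather than hang forever
POLL_S = 5

deadline = time.monotonic() + TIMEOUT_S
while os.path.exists(LOCK_FILE):
    if time.monotonic() > deadline:
        # The old task may have died without cleaning up; removing a stale lock is a judgment call.
        print(f"lock {LOCK_FILE} still present after {TIMEOUT_S}s, removing it", file=sys.stderr)
        os.remove(LOCK_FILE)
        break
    print(f"waiting for {LOCK_FILE} to be released by the old task...", file=sys.stderr)
    time.sleep(POLL_S)

# Replace this process with Prometheus so SIGTERM from ECS reaches it directly.
os.execvp("prometheus", [
    "prometheus",
    "--storage.tsdb.path=/var/lib/prometheus",
    "--config.file=/etc/prometheus/prometheus.yml",
])
```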

I don't know whether the lock file is guaranteed to disappear, and I suspect Prometheus gets upset if it starts while the lock is still there, even if the other container is gone.

People have already solved this, if not for Prometheus, then for other servers that store data in a volume. Suggestions?

Joseph Larson
  • Change `maximumPercent` in the deployment configuration for the service to 100%, and the existing task should be stopped before it starts the new one. – jordanm Mar 12 '21 at 16:46
  • FYI I assumed that you were talking about ECS, but "AWS Fargate" is now ambiguous because it could refer to either EKS or ECS. – jordanm Mar 12 '21 at 16:47
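Following up on jordanm's suggestion: with `maximumPercent` at 100% (and `minimumHealthyPercent` below 100%, so ECS is allowed to stop the old task first), the service has to stop the running task before it can launch the replacement, so only one task ever holds the EFS data directory. A rough sketch via boto3, with the cluster and service names as placeholders:

```python
import boto3

# Sketch only: cluster and service names are placeholders.
ecs = boto3.client("ecs")
ecs.update_service(
    cluster="monitoring-cluster",
    service="prometheus-grafana",
    deploymentConfiguration={
        # 100% max / 0% min forces ECS to stop the running task before
        # starting its replacement, so the EFS lock is released first.
        "maximumPercent": 100,
        "minimumHealthyPercent": 0,
    },
)
```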
