I am trying to set up a Prometheus Podman container using Ansible.
Question
I'm trying to set up a Prometheus container using Podman and Ansible on an Ubuntu machine. The container runs fine manually, but when I attempt to automate it using Ansible, the systemd service doesn't seem to auto-start as expected. I'm looking for help to debug this issue.
I'm working on an Ubuntu machine.
me@ubuntu-usb:~/undergit/td2023-monitoring$ uname -a
Linux ubuntu-usb 6.2.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jul 13 16:27:29 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
me@ubuntu-usb:~/undergit/td2023-monitoring$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       8.1Gi       1.8Gi       1.6Gi       5.5Gi       5.2Gi
Swap:          2.0Gi          0B       2.0Gi
me@ubuntu-usb:~/undergit/td2023-monitoring$
Reproduce error
I'm using Terraform to launch my playbook, since my tasks and handlers are in a role. Here is the command I'm using:
me@ubuntu-usb:~/undergit/td2023-monitoring$ terraform -chdir=tf apply -target=module.lxd_machine_cernovada
Since I'm using Podman, I want my tasks to run rootless, which is why the play sets `become: false`.
- name: Installation/configuration of monitoring server
  hosts: domain_monitor
  become: false
  roles:
    - da.prometheus
And this is the playbook/role I created:
me@ubuntu-usb:~/undergit/td2023-monitoring$ cat roles/da.prometheus/tasks/main.yml
---
- name: Ensure data directory exists
  file:
    path: "/home/admin/data"
    state: directory
    mode: '0755' # rwx for the owner, rx for the group, rx for others
- name: Copy prometheus.yml to the data directory
  copy:
    src: prometheus.yml
    dest: /home/admin/data/prometheus.yml
    mode: '0644' # rw for the owner, r for the group, r for others
    owner: admin
    group: admin
- name: Run Prometheus container
  containers.podman.podman_container:
    name: prometheus
    image: docker.io/prom/prometheus:v2.46.0
    ports:
      - "9090:9090"
    volumes:
      - /home/admin/data:/etc/prometheus
    state: started
    detach: yes
    recreate: no
- name: Ensure systemd user directory exists
  file:
    path: "/home/admin/.config/systemd/user"
    state: directory
    mode: '0755'
- name: Copy prometheus.service into the user systemd directory
  copy:
    src: "prometheus.service"
    dest: "/home/admin/.config/systemd/user/prometheus.service"
    mode: '0644'
  notify:
    - reload systemd
- name: Enable Prometheus user service
  systemd:
    name: prometheus.service
    enabled: yes
    state: started
    scope: user
  notify:
    - restart prometheus
me@ubuntu-usb:~/undergit/td2023-monitoring$
With these handlers:
me@ubuntu-usb:~/undergit/td2023-monitoring$ cat roles/da.prometheus/handlers/main.yml
---
- name: restart prometheus
  systemd:
    name: prometheus
    state: restarted
    scope: user
- name: reload systemd
  systemd:
    daemon_reload: yes
me@ubuntu-usb:~/undergit/td2023-monitoring$
Since I'm using rootless Podman, I want my service to run with `scope: user`, as described in the Ansible systemd module documentation: https://docs.ansible.com/ansible/latest/collections/ansible/builtin/systemd_service_module.html#parameter-scope
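For comparison, the linked documentation gives `system` as the default value of `scope`, so a handler that omits it talks to the system manager rather than the user one. A minimal sketch of a user-scoped reload handler follows. It is a hypothetical variant, not what the role currently contains, and it assumes the `admin` user has a running user systemd instance with an accessible D-Bus session on the target:

```yaml
# Hypothetical variant of the "reload systemd" handler (assumption, not from the role).
- name: reload systemd
  ansible.builtin.systemd:
    daemon_reload: true
    scope: user  # when omitted, the module defaults to scope "system"
```

Whether this alone removes the timeout shown further down is not confirmed here; it only makes the reload target the same manager that owns `prometheus.service`.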
Here is the very basic service I created for this task.
me@ubuntu-usb:~/undergit/td2023-monitoring$ cat roles/da.prometheus/files/prometheus.service
[Unit]
Description=Prometheus Service
[Service]
ExecStart=/usr/bin/podman start -a prometheus
ExecStop=/usr/bin/podman stop -t 2 prometheus
Restart=always
[Install]
WantedBy=multi-user.target
me@ubuntu-usb:~/undergit/td2023-monitoring$
Error I'm getting
│ RUNNING HANDLER [da.prometheus : reload systemd] *******************************
│ lundi 28 août 2023 11:30:13 +0200 (0:00:01.996) 0:01:17.395 ************
│ fatal: [cernovada.antiterre.lan]: FAILED! => {
│ "changed": false
│ }
│
│ MSG:
│
│ failure 1 during daemon-reload: Failed to reload daemon: Method call timed out
│
│
│ NO MORE HOSTS LEFT *************************************************************
│
│ PLAY RECAP *********************************************************************
│ cernovada.antiterre.lan : ok=11 changed=9 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
│
│ Playbook run took 0 days, 0 hours, 1 minutes, 44 seconds
│ lundi 28 août 2023 11:30:40 +0200 (0:00:26.924) 0:01:44.319 ************
│ ===============================================================================
│ da.prometheus : Run Prometheus container ------------------------------- 28.92s
│ Installation des paquets ----------------------------------------------- 28.42s
│ da.prometheus : reload systemd ----------------------------------------- 26.92s
│ da.prometheus : Copy prometheus.yml to the data directory --------------- 3.05s
│ da.prometheus : Copy prometheus.service into the user systemd directory --- 2.83s
│ Gathering Facts --------------------------------------------------------- 2.78s
│ Stop service cron on debian, if running --------------------------------- 2.36s
│ Gathering Facts --------------------------------------------------------- 2.01s
│ da.prometheus : restart prometheus -------------------------------------- 2.00s
│ da.prometheus : Enable Prometheus user service -------------------------- 1.87s
│ da.prometheus : Ensure data directory exists ---------------------------- 1.58s
│ da.prometheus : Ensure systemd user directory exists -------------------- 1.56s
│
I tried to SSH into the container to check the service status, and it says ``:
If I SSH into the container and start the service manually, it works fine:
admin@cernovada:~$ podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
admin@cernovada:~$ systemctl --user start prometheus.service
admin@cernovada:~$ podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
16f96b483a18 docker.io/prom/prometheus:v2.46.0 --config.file=/et... 14 minutes ago Up 1 second ago 0.0.0.0:9090->9090/tcp prometheus
admin@cernovada:~$
So my guess is that the service itself works fine, and it only fails to auto-start when deployed through the Ansible playbook.
Goal
I want the Prometheus server to start automatically when the playbook runs. That's why I created a systemd service.
I'm open to other solutions for auto-starting the container, as long as it always comes back up when the host restarts.
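One detail relevant to the restart requirement: a systemd user instance normally starts at login and stops at logout, so user services only come up at boot if lingering is enabled for that user. A minimal sketch of a task that would enable it is shown below. This is an assumption about what the setup may need, not something the role currently does, and it requires privilege escalation for this one task:

```yaml
# Hypothetical task (not part of the role): keep the "admin" user manager running without a login.
- name: Enable lingering for admin
  ansible.builtin.command:
    cmd: loginctl enable-linger admin
    creates: /var/lib/systemd/linger/admin  # marker file created by enable-linger; skips reruns
  become: true
```

The `creates` argument keeps the task idempotent, since `loginctl enable-linger` records its state as a file under `/var/lib/systemd/linger/`.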
What I tried
I tried adding a `- meta: flush_handlers` task after the `reload systemd` notification, as seen here: https://lookonmyworks.co.uk/2015/06/24/ansible-systemctl-daemon-reload/
I tried removing the `notify: reload systemd`, but I get the same error.
I went through all the steps manually, without Ansible, to make sure the problem wasn't in my config files, and I got no errors.
If I try to access the Prometheus interface while the playbook is running, I get the web UI, but once Ansible tries to reload the systemd daemon I can't access it anymore.
meta
If I forgot to include any information, ask and I'll provide it as soon as possible.
If the question is too long or unreadable, thanks in advance for any edits.
Thank you for your kind help.