
Running a CentOS 8 Stream server with nfs-utils 2.3.3-57.el8 and ansible-core 2.11.12, a test playbook

- hosts: server-1
  tasks:
    - name: Collect status
      service_facts:
      register: services_state
    - name: Print service_facts
      debug:
        var: services_state

    - name: Collect systemd status
      ansible.builtin.systemd:
        name: "nfs-server"
      register: sysd_service_state

    - name: Print systemd state
      debug:
        var: sysd_service_state

will render the following results:

service_facts

...
"nfs-server.service": {
                    "name": "nfs-server.service",
                    "source": "systemd",
                    "state": "stopped",
                    "status": "disabled"
},
...

ansible.builtin.systemd

...
"name": "nfs-server",
        "status": {
            "ActiveEnterTimestamp": "Tue 2022-10-04 10:03:17 UTC",
            "ActiveEnterTimestampMonotonic": "7550614760",
            "ActiveExitTimestamp": "Tue 2022-10-04 09:05:43 UTC",
            "ActiveExitTimestampMonotonic": "4096596618",
            "ActiveState": "active",
...

The NFS server is very much running/active, but service_facts fails to report it as such.

Other services, such as httpd, report the correct state in service_facts.

Have I misunderstood or done something wrong here, or have I run into an anomaly?
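For context, the practical impact: a conditional task keyed on these facts (a sketch; the `when` expression assumes the service_facts output shown above) would take the wrong branch:

```yaml
- name: Start NFS server if service_facts reports it stopped
  ansible.builtin.service:
    name: nfs-server
    state: started
  # With the facts shown above this condition evaluates to true,
  # even though systemd itself reports the unit as active.
  when: ansible_facts.services['nfs-server.service'].state == 'stopped'
```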

Enok82
  • Since it could explain your observation, do you have [fact caching enabled](https://docs.ansible.com/ansible/latest/plugins/cache.html#enabling-fact-cache-plugins) in your environment and setup? If so, how is it configured? – U880D Oct 04 '22 at 15:31
  • I haven’t done any configuration at all, so I assume it’s set to the default. In a previous run I had the service enabled, which showed in service_facts, so it’s not completely “stuck”. – Enok82 Oct 04 '22 at 20:44
  • This question was flagged to "Close" with "A community-specific reason" and "Not about programming or software development", but currently and based on the current finding it seems to be about debugging the code of [`ansible/modules/service_facts.py`](https://github.com/ansible/ansible/blob/devel/lib/ansible/modules/service_facts.py). So it is about software development then? – U880D Oct 05 '22 at 10:56
  • @U880D Then he needs to show the code here. [ask] All I see is systemd related things. – Rob Oct 06 '22 at 09:47
  • @Rob The question did start in Ansible, and whether I used the syntax correctly or if there was a problem in Ansible. The fact that my issue would end up being a problem in systemd was unclear to me at the time of posting. At what Stackexchange instance would you have me post it? – Enok82 Oct 07 '22 at 12:20
  • If your question relates to systemd, hover over the tag and read that popup – Rob Oct 07 '22 at 15:02

1 Answer


Running RHEL 7.9 with nfs-utils 1.3.0-0.68.el7 and ansible 2.9.27, I was able to observe the same behavior and reproduce the issue.

It seems to be caused by two redundant service units: the service file and the symlink pointing to it.

ll /usr/lib/systemd/system/nfs*
...
-rw-r--r--. 1 root root 1044 Oct  1  2022 /usr/lib/systemd/system/nfs-server.service
lrwxrwxrwx. 1 root root   18 Oct  1  2022 /usr/lib/systemd/system/nfs.service -> nfs-server.service
...

diff /usr/lib/systemd/system/nfs.service /usr/lib/systemd/system/nfs-server.service; echo $?
0

Unsurprisingly, a status request to systemd directly produces the expected result for both names.

systemctl status nfs
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
  Drop-In: /run/systemd/generator/nfs-server.service.d
           └─order-with-mounts.conf
   Active: active (exited) since Sat 2022-10-01 22:00:00 CEST; 4 days ago
  Process: 1080 ExecStartPost=/bin/sh -c if systemctl -q is-active gssproxy; then systemctl reload gssproxy ; fi (code=exited, status=0/SUCCESS)
  Process: 1070 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=0/SUCCESS)
  Process: 1065 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 1070 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/nfs-server.service

systemctl status nfs-server
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
  Drop-In: /run/systemd/generator/nfs-server.service.d
           └─order-with-mounts.conf
   Active: active (exited) since Sat 2022-10-01 22:00:00 CEST; 4 days ago
  Process: 1080 ExecStartPost=/bin/sh -c if systemctl -q is-active gssproxy; then systemctl reload gssproxy ; fi (code=exited, status=0/SUCCESS)
  Process: 1070 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=0/SUCCESS)
  Process: 1065 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 1070 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/nfs-server.service

However, a test playbook

---
- hosts: nfs_server
  become: true
  gather_facts: false

  tasks:

  - name: Gather Service Facts
    service_facts:

  - name: Show Facts
    debug:
      var: ansible_facts

called via

sshpass -p ${PASSWORD} ansible-playbook --user ${ACCOUNT} --ask-pass service_facts.yml | grep -A4 nfs

will result in the following output:

PLAY [nfs_server] ***************

TASK [Gather Service Facts] *****
ok: [test.example.com]
--
      ...
      nfs-server.service:
        name: nfs-server.service
        source: systemd
        state: stopped
        status: enabled
      ...
      nfs.service:
        name: nfs.service
        source: systemd
        state: active
        status: enabled

and reports the correct state only for the first service file found(?), nfs.service.

Workaround

You could just check ansible_facts.services['nfs.service'], the alias name.

systemctl show nfs-server.service -p Names
Names=nfs-server.service nfs.service
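A sketch of the workaround as a task (the key name assumes the alias entry shown in the facts output above):

```yaml
- name: Check NFS state via the alias unit name
  ansible.builtin.assert:
    that:
      # nfs.service is the symlink alias of nfs-server.service and,
      # per the output above, is the entry that carries the correct state
      - ansible_facts.services['nfs.service'].state == 'active'
    fail_msg: "NFS server is not active"
```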

Further Investigation

  • Ansible Issue #73215 "ansible_facts.service returns incorrect state"
  • Ansible Issue #67262 "service_facts does not return correct state for services"

It might be that this is somehow related to What does status "active (exited)" mean for a systemd service? and def _list_rh(self, services), even though RemainAfterExit=yes is already set within the .service files.

systemctl list-units --no-pager --type service --all | grep nfs
  nfs-config.service           loaded    inactive dead    Preprocess NFS configuration
  nfs-idmapd.service           loaded    active   running NFSv4 ID-name mapping service
  nfs-mountd.service           loaded    active   running NFS Mount Daemon
● nfs-secure-server.service    not-found inactive dead    nfs-secure-server.service
  nfs-server.service           loaded    active   exited  NFS server and services
  nfs-utils.service            loaded    inactive dead    NFS server and client services
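To illustrate why "active (exited)" units can end up reported as stopped, here is a minimal, hypothetical parser in the spirit of _list_rh (it is not the actual module code; the column handling is an assumption based on the output above). Deriving the state from the SUB column alone marks "exited" units as stopped even though ACTIVE says active:

```python
# Hypothetical sketch: classify services from the output of
# `systemctl list-units --no-pager --type service --all`.
# NOT the actual service_facts code; column layout assumed from
# the sample output above.
def parse_list_units(output):
    services = {}
    for line in output.splitlines():
        fields = line.split()
        # a leading bullet marks a failed/not-found unit; drop it
        if fields and fields[0] in ("●", "*"):
            fields = fields[1:]
        if len(fields) < 4 or not fields[0].endswith(".service"):
            continue
        name, load, active, sub = fields[:4]
        # Deriving state from SUB alone: "exited" is treated like
        # "dead", so an "active (exited)" unit comes out as stopped.
        state = "running" if sub == "running" else "stopped"
        services[name] = {"state": state, "active": active, "sub": sub}
    return services
```

Fed the list-units output above, this sketch reports nfs-server.service with state "stopped" despite its ACTIVE column reading "active", which matches the discrepancy service_facts shows.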

For further tests

systemctl list-units --no-pager --type service --state=running
# versus
systemctl list-units --no-pager --type service --state=exited

one may also read about NFS server active (exited) and Service Active but (exited).

... please take note that I haven't investigated the issue further yet. It is also still not fully clear to me which part of the code in ansible/modules/service_facts.py might be causing this.

U880D
  • Wow, that’s some solid investigation! I did see the issue you linked but I think that case was a race condition resolved with a pause or an “until”. I’ll check that service file on Monday when I’m back at the office. – Enok82 Oct 05 '22 at 10:57