
I am doing a live migration test in an OpenStack (Mitaka) cluster. I have 3 VMs deployed on 2 compute hosts. The nova-compute state is up when I start the cluster, but after some time, although the status is still enabled, the state goes down, which disrupts my VMs.

root@ctl:/var/log/nova# openstack compute service list
+----+------------------+-------------------------------------------------------+----------+---------+-------+----------------------------+
| Id | Binary           | Host                                                  | Zone     | Status  | State | Updated At                 |
+----+------------------+-------------------------------------------------------+----------+---------+-------+----------------------------+
|  1 | nova-cert        | ctl.livemigration.kkprojects-pg0.clemson.cloudlab.us  | internal | enabled | up    | 2023-02-18T21:17:06.000000 |
|  2 | nova-consoleauth | ctl.livemigration.kkprojects-pg0.clemson.cloudlab.us  | internal | enabled | up    | 2023-02-18T21:17:06.000000 |
|  3 | nova-scheduler   | ctl.livemigration.kkprojects-pg0.clemson.cloudlab.us  | internal | enabled | up    | 2023-02-18T21:17:09.000000 |
|  7 | nova-conductor   | ctl.livemigration.kkprojects-pg0.clemson.cloudlab.us  | internal | enabled | up    | 2023-02-18T21:17:08.000000 |
| 11 | nova-compute     | cp-1.livemigration.kkprojects-pg0.clemson.cloudlab.us | nova     | enabled | down  | 2023-02-18T20:14:16.000000 |
| 12 | nova-compute     | cp-2.livemigration.kkprojects-pg0.clemson.cloudlab.us | nova     | enabled | down  | 2023-02-18T20:14:21.000000 |
+----+------------------+-------------------------------------------------------+----------+---------+-------+----------------------------+
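(For reference: the State column here comes from each service's heartbeat. Nova reports a service as down once its Updated At timestamp is older than service_down_time, even if the process itself is still running. The check below is only a sketch using the upstream default values; your nova.conf may set these options explicitly:)

root@ctl:~# grep -E 'report_interval|service_down_time' /etc/nova/nova.conf
# Upstream defaults when unset ([DEFAULT] section):
#   report_interval = 10     seconds between heartbeats sent by each service
#   service_down_time = 60   seconds without a heartbeat before the service is shown as down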

Checking the nova-compute services on the compute nodes, I see they are enabled and running:

root@cp-2:/etc/selinux# hostname
cp-2.livemigration.kkprojects-pg0.clemson.cloudlab.us
root@cp-2:/etc/selinux# service nova-compute status
● nova-compute.service - OpenStack Compute
   Loaded: loaded (/lib/systemd/system/nova-compute.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2023-02-18 14:20:22 EST; 1h 58min ago
 Main PID: 9926 (nova-compute)
   CGroup: /system.slice/nova-compute.service
           └─9926 /usr/bin/python /usr/bin/nova-compute --config-file=/etc/nova/nova.conf --config-file=/etc/nova/nova-compute.conf --log-file=/var/log/nova/nova-compute.log

Feb 18 14:59:07 cp-2.livemigration.kkprojects-pg0.clemson.cloudlab.us sudo[19697]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 18 14:59:07 cp-2.livemigration.kkprojects-pg0.clemson.cloudlab.us sudo[19697]: pam_unix(sudo:session): session closed for user root
Feb 18 14:59:07 cp-2.livemigration.kkprojects-pg0.clemson.cloudlab.us sudo[19703]:     nova : TTY=unknown ; PWD=/var/lib/nova ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf o
Feb 18 14:59:07 cp-2.livemigration.kkprojects-pg0.clemson.cloudlab.us sudo[19703]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 18 14:59:07 cp-2.livemigration.kkprojects-pg0.clemson.cloudlab.us ovs-vsctl[19706]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=120 -- --if-exists del-port qvo8e68056c-a0 -- ad
Feb 18 14:59:07 cp-2.livemigration.kkprojects-pg0.clemson.cloudlab.us sudo[19703]: pam_unix(sudo:session): session closed for user root
Feb 18 14:59:07 cp-2.livemigration.kkprojects-pg0.clemson.cloudlab.us sudo[19707]:     nova : TTY=unknown ; PWD=/var/lib/nova ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf i
Feb 18 14:59:07 cp-2.livemigration.kkprojects-pg0.clemson.cloudlab.us sudo[19707]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 18 15:00:25 cp-2.livemigration.kkprojects-pg0.clemson.cloudlab.us sudo[20242]:     nova : TTY=unknown ; PWD=/var/lib/nova ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf t
Feb 18 15:00:25 cp-2.livemigration.kkprojects-pg0.clemson.cloudlab.us sudo[20242]: pam_unix(sudo:session): session opened for user root by (uid=0)

root@cp-1:/var/log/nova# hostname
cp-1.livemigration.kkprojects-pg0.clemson.cloudlab.us
root@cp-1:/var/log/nova# service  nova-compute status
● nova-compute.service - OpenStack Compute
   Loaded: loaded (/lib/systemd/system/nova-compute.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2023-02-18 15:37:11 EST; 42min ago
  Process: 28530 ExecStartPre=/bin/chown nova:nova /var/lock/nova /var/log/nova /var/lib/nova (code=exited, status=0/SUCCESS)
  Process: 28527 ExecStartPre=/bin/mkdir -p /var/lock/nova /var/log/nova /var/lib/nova (code=exited, status=0/SUCCESS)
 Main PID: 28533 (nova-compute)
   CGroup: /system.slice/nova-compute.service
           └─28533 /usr/bin/python /usr/bin/nova-compute --config-file=/etc/nova/nova.conf --config-file=/etc/nova/nova-compute.conf --log-file=/var/log/nova/nova-compute.log

Feb 18 15:37:15 cp-1.livemigration.kkprojects-pg0.clemson.cloudlab.us nova-compute[28533]: 2023-02-18 15:37:15.281 28533 DEBUG nova.compute.manager [req-df86d8a0-a7ff-480d-b157-39ae745850fd - - -
Feb 18 15:37:15 cp-1.livemigration.kkprojects-pg0.clemson.cloudlab.us nova-compute[28533]: 2023-02-18 15:37:15.283 28533 DEBUG nova.compute.manager [req-df86d8a0-a7ff-480d-b157-39ae745850fd - - -
Feb 18 15:37:15 cp-1.livemigration.kkprojects-pg0.clemson.cloudlab.us nova-compute[28533]: 2023-02-18 15:37:15.283 28533 DEBUG nova.compute.manager [req-df86d8a0-a7ff-480d-b157-39ae745850fd - - -
Feb 18 15:37:15 cp-1.livemigration.kkprojects-pg0.clemson.cloudlab.us nova-compute[28533]: 2023-02-18 15:37:15.285 28533 DEBUG nova.compute.manager [req-df86d8a0-a7ff-480d-b157-39ae745850fd - - -
Feb 18 15:37:15 cp-1.livemigration.kkprojects-pg0.clemson.cloudlab.us nova-compute[28533]: 2023-02-18 15:37:15.287 28533 DEBUG nova.virt.libvirt.vif [req-df86d8a0-a7ff-480d-b157-39ae745850fd - - -
Feb 18 15:37:15 cp-1.livemigration.kkprojects-pg0.clemson.cloudlab.us nova-compute[28533]: s=<?>,shutdown_terminate=False,system_metadata=<?>,tags=<?>,task_state=None,terminated_at=None,updated_at
Feb 18 15:37:15 cp-1.livemigration.kkprojects-pg0.clemson.cloudlab.us nova-compute[28533]: 2023-02-18 15:37:15.289 28533 DEBUG nova.compute.manager [req-df86d8a0-a7ff-480d-b157-39ae745850fd - - -
Feb 18 15:37:15 cp-1.livemigration.kkprojects-pg0.clemson.cloudlab.us nova-compute[28533]: 2023-02-18 15:37:15.291 28533 DEBUG nova.compute.manager [req-df86d8a0-a7ff-480d-b157-39ae745850fd - - -
Feb 18 15:37:15 cp-1.livemigration.kkprojects-pg0.clemson.cloudlab.us nova-compute[28533]: 2023-02-18 15:37:15.346 28533 WARNING nova.compute.monitors [req-df86d8a0-a7ff-480d-b157-39ae745850fd - -
Feb 18 15:37:15 cp-1.livemigration.kkprojects-pg0.clemson.cloudlab.us nova-compute[28533]: 2023-02-18 15:37:15.347 28533 INFO nova.compute.resource_tracker [req-df86d8a0-a7ff-480d-b157-39ae745850f

Anvay
  • Mitaka is a very very old version, but apart from that, what's in the nova-compute.logs? Also check nova-scheduler and nova-conductor logs. If the services are reported as down it could be a rabbitmq issue as well. – eblock Feb 20 '23 at 11:36

1 Answer


I'm going to agree with eblock about a RabbitMQ issue or a general network issue. If nova-compute is enabled and running but can't communicate reliably with the controllers, it will be reported as down. Sometimes restarting the nova-compute service fixes the problem, and it won't affect any running VMs.
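For example, on an affected compute node (a sketch; the unit name matches the status output above, and it can take up to a minute for the state to refresh):

root@cp-1:~# systemctl restart nova-compute
root@cp-1:~# systemctl status nova-compute

Then, on the controller, re-run openstack compute service list and check whether the State column flips back to up.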

Check nova-compute.log on your compute hosts and look for signs of timeouts, lost connections and the like. Also look at your RabbitMQ logs on your controllers. Hope that helps.
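A rough starting point for that (only a sketch; the log paths are the Ubuntu package defaults and the grep patterns are just common messaging error strings, so adjust them to your deployment):

# On each compute node: look for messaging/heartbeat trouble in the nova-compute log
root@cp-1:~# grep -iE 'timeout|amqp|errno|unreachable|reconnect' /var/log/nova/nova-compute.log | tail -n 50

# On the controller: check RabbitMQ itself and whether the compute nodes still hold connections
root@ctl:~# rabbitmqctl status
root@ctl:~# rabbitmqctl list_connections user peer_host state
root@ctl:~# tail -n 100 /var/log/rabbitmq/rabbit@$(hostname -s).log   # path assumes the default rabbit@<shorthost> node name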

jimbob