Instance migration between two hosts is failing on OpenStack Ussuri. Both hosts have VMs running on them and can host new VMs.

Both hosts appear up and available in the compute service list:

darren@jacob:admin:~$ openstack compute service list
+--------------------------------------+----------------+--------+----------+---------+-------+----------------------------+
| ID                                   | Binary         | Host   | Zone     | Status  | State | Updated At                 |
+--------------------------------------+----------------+--------+----------+---------+-------+----------------------------+
| 65640c54-641f-4cbf-91ba-dac39764ac31 | nova-scheduler | jacob  | internal | enabled | up    | 2021-01-15T23:57:22.000000 |
| 0aa0b80b-09e6-4e61-b222-dbf62b43ddda | nova-conductor | jacob  | internal | enabled | up    | 2021-01-15T23:57:26.000000 |
| f4dce946-94cf-482a-83d2-b32f1c7f87b5 | nova-compute   | joseph | nova     | enabled | up    | 2021-01-15T23:57:19.000000 |
| 2b149fe0-9b9b-44b8-8d70-9fa5cf3b968b | nova-compute   | judah  | nova     | enabled | up    | 2021-01-15T23:57:27.000000 |
+--------------------------------------+----------------+--------+----------+---------+-------+----------------------------+

Here is an excerpt of the error from /var/log/nova/nova-conductor.log on the controller:

2021-01-15 15:39:43.263 30830 ERROR nova.conductor.tasks.migrate [req-a62d9ff4-be8b-4870-81a4-ebaf1c85ce37 993afae9dd9746b48f72fcafd974aef7 e98eaaf8e7ff403cb1180e9e29148890 - default default] [instance: 001f9cad-25ca-4f2d-b32c-01953d854dc5] Unable to find record for source node joseph.mcgrandle.com on joseph: nova.exception.ComputeHostNotFound: Compute host joseph could not be found.
2021-01-15 15:39:43.263 30830 WARNING nova.scheduler.utils [req-a62d9ff4-be8b-4870-81a4-ebaf1c85ce37 993afae9dd9746b48f72fcafd974aef7 e98eaaf8e7ff403cb1180e9e29148890 - default default] Failed to compute_task_migrate_server: Compute host joseph could not be found.: nova.exception.ComputeHostNotFound: Compute host joseph could not be found.
2021-01-15 15:39:43.264 30830 WARNING nova.scheduler.utils [req-a62d9ff4-be8b-4870-81a4-ebaf1c85ce37 993afae9dd9746b48f72fcafd974aef7 e98eaaf8e7ff403cb1180e9e29148890 - default default] [instance: 001f9cad-25ca-4f2d-b32c-01953d854dc5] Setting instance to ACTIVE state.: nova.exception.ComputeHostNotFound: Compute host joseph could not be found.
2021-01-15 15:39:43.318 30830 ERROR oslo_messaging.rpc.server [req-a62d9ff4-be8b-4870-81a4-ebaf1c85ce37 993afae9dd9746b48f72fcafd974aef7 e98eaaf8e7ff403cb1180e9e29148890 - default default] Exception during message handling: nova.exception.ComputeHostNotFound: Compute host joseph could not be found.
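The first error line names both a source node (joseph.mcgrandle.com) and a host (joseph). As a minimal sketch for pulling those pairs out of a longer log, the helper below may be useful. The function name `extract_source_nodes` is hypothetical, and the pattern assumes the message wording matches the excerpt above exactly.

```shell
# Hypothetical helper: print unique "<node> <host>" pairs from log lines of the
# form "Unable to find record for source node <node> on <host>: ...".
# Reads the files given as arguments, or stdin when no arguments are given.
extract_source_nodes() {
    sed -n 's/.*Unable to find record for source node \([^ ]*\) on \([^: ]*\):.*/\1 \2/p' "$@" | sort -u
}
```

It could be run as `extract_source_nodes /var/log/nova/nova-conductor.log` on the controller. If the node and host columns differ for a compute host, that difference may be worth comparing against what the host reports elsewhere.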

I've tried re-populating the nova database with:

# su -s /bin/sh -c "nova-manage db sync" nova

and also re-discovering the compute hosts with:

# su -s /bin/sh -c "nova-manage cell_v2 discover_hosts --verbose" nova

But nothing appears to be making any difference. Thanks for any pointers or help.
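One further check that might help, offered as a sketch rather than a known fix: compare the host names registered for nova-compute services with the hypervisor hostnames nova reports, since the error mentions both a short name and a fully qualified one. The function names `only_in_first` and `report_name_mismatches` are hypothetical, and the second function assumes admin credentials are already loaded in the shell.

```shell
# Hypothetical helper: print entries of the newline-separated list in $1
# that do not appear as a whole line in the list in $2.
only_in_first() {
    printf '%s\n' "$1" | while IFS= read -r name; do
        [ -n "$name" ] || continue
        printf '%s\n' "$2" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}

# Assumes a working openstack CLI with admin credentials; not run automatically.
report_name_mismatches() {
    services=$(openstack compute service list --service nova-compute -f value -c Host)
    hypervisors=$(openstack hypervisor list -f value -c "Hypervisor Hostname")
    echo "Service hosts with no identically named hypervisor:"
    only_in_first "$services" "$hypervisors"
    echo "Hypervisors with no identically named service host:"
    only_in_first "$hypervisors" "$services"
}
```

Calling `report_name_mismatches` prints names that exist on only one side. Differing names are not necessarily an error in themselves, so the output is only a starting point for further digging.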

Update: here is the output of the requested commands:

darren@jacob:admin:~$ sudo nova-manage cell_v2 list_hosts
+-----------+--------------------------------------+----------+
| Cell Name |              Cell UUID               | Hostname |
+-----------+--------------------------------------+----------+
|   cell1   | 9095885b-466f-41d4-9c85-45b5af7b5ce2 |  joseph  |
|   cell1   | 9095885b-466f-41d4-9c85-45b5af7b5ce2 |  judah   |
|   cell1   | 9095885b-466f-41d4-9c85-45b5af7b5ce2 |  reuben  |
+-----------+--------------------------------------+----------+
darren@jacob:admin:~$ sudo nova-manage cell_v2 list_cells
+-------+--------------------------------------+-------------------------------------+--------------------------------------------+----------+
|  Name |                 UUID                 |            Transport URL            |            Database Connection             | Disabled |
+-------+--------------------------------------+-------------------------------------+--------------------------------------------+----------+
| cell0 | 00000000-0000-0000-0000-000000000000 |                none:/               | mysql+pymysql://nova:****@jacob/nova_cell0 |  False   |
| cell1 | 9095885b-466f-41d4-9c85-45b5af7b5ce2 | rabbit://openstack:****@jacob:5672/ |    mysql+pymysql://nova:****@jacob/nova    |  False   |
+-------+--------------------------------------+-------------------------------------+--------------------------------------------+----------+

and here is the updated compute service list output after adding reuben:

darren@jacob:admin:~$ openstack compute service list
+--------------------------------------+----------------+--------+----------+---------+-------+----------------------------+
| ID                                   | Binary         | Host   | Zone     | Status  | State | Updated At                 |
+--------------------------------------+----------------+--------+----------+---------+-------+----------------------------+
| 65640c54-641f-4cbf-91ba-dac39764ac31 | nova-scheduler | jacob  | internal | enabled | up    | 2021-01-26T08:04:07.000000 |
| 0aa0b80b-09e6-4e61-b222-dbf62b43ddda | nova-conductor | jacob  | internal | enabled | up    | 2021-01-26T08:04:08.000000 |
| f4dce946-94cf-482a-83d2-b32f1c7f87b5 | nova-compute   | joseph | nova     | enabled | up    | 2021-01-26T08:04:09.000000 |
| 2b149fe0-9b9b-44b8-8d70-9fa5cf3b968b | nova-compute   | judah  | nova     | enabled | up    | 2021-01-26T08:04:08.000000 |
| d306fe4f-1d12-41b7-a2c9-8f856247268b | nova-compute   | reuben | nova     | enabled | up    | 2021-01-26T08:04:15.000000 |
+--------------------------------------+----------------+--------+----------+---------+-------+----------------------------+
dmcgrandle
  • Can you share `nova-manage cell_v2 list_hosts` and `nova-manage cell_v2 list_cells`? Please mask all sensitive information and add that output to the question. – eblock Jan 21 '21 at 07:08
  • @eblock - Thanks for the reply. Output added above. – dmcgrandle Jan 22 '21 at 21:33
  • Why is "reuben" not in the `compute service list` output? What command do you issue to migrate an instance? – eblock Jan 25 '21 at 07:23
  • Sorry - `reuben` was added between the time the two commands were run - I brought up a brand new host to see if error persisted. It did. Updated `compute service list` has been added to the question above. I have tried both live migration and not, both fail. Note - both were working for some time before they started to fail. `openstack server migrate --live-migration --host judah 001f9cad-25ca-4f2d-b32c-01953d854dc5` and `openstack server migrate --host judah.mcgrandle.com 001f9cad-25ca-4f2d-b32c-01953d854dc5` are two examples (both run with admin account credentials). – dmcgrandle Jan 26 '21 at 08:08
  • Well, so the main question is what changed between "it worked" and "it stopped working". Did you update any of the nodes? Check other log files on both control and compute nodes (nova, neutron, rabbitmq etc.) to find more hints. – eblock Jan 26 '21 at 08:15
  • Yes, that is the question - "what changed". Yes, I did update the nodes and the controller a number of times. Finding out what specific change caused this will be hard. I've been culling logs and trying to track down things that way from the start, will continue to do so. Any thoughts on where to look specifically would be helpful. Happy to provide any further info to shed light on this. – dmcgrandle Jan 27 '21 at 05:39
  • Did you save a copy of the previous config files so you could just compare them? A more detailed description of what was updated and how could also help. Did you start with Ussuri or did you upgrade to Ussuri? Did you encounter any issues/errors during the upgrade process? I'm not sure if it would make sense to try to roll back; maybe trying to figure out the root cause now is the better way. Have you tried it with debug mode and also checked the nova-compute logs? You could also try to run the migrate command with debug on, maybe that also reveals something. – eblock Jan 27 '21 at 07:52

1 Answer

I just ran into the same behavior on an Ussuri platform, but only on a few instances. What I noticed is that these instances do not show "N/A (booted from volume)" in the image field of `openstack server show`. In our case, all instances are configured to boot from volume (not ephemeral), yet these instances seem to be tagged as ephemeral. I tried --block-migration, but it does not work either.

Are you seeing the same thing?

Regards
