
I have deployed a multinode OpenStack cluster using Kolla Ansible (following the OpenStack deployment guide) on two Rocky Linux 9.1 machines. When I attempt to cold-migrate an instance between the nodes, the migration fails and the instance enters the ERROR state. I get the following error in the logs:

2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 11467, in migrate_disk_and_power_off
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server     self._cleanup_remote_migration(dest, inst_base,
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server     raise self.value
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 11445, in migrate_disk_and_power_off
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server     libvirt_utils.copy_image(from_path, img_path, host=dest,
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.9/site-packages/nova/virt/libvirt/utils.py", line 243, in copy_image
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server     remote_filesystem_driver.copy_file(src, dest,
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.9/site-packages/nova/virt/libvirt/volume/remotefs.py", line 104, in copy_file
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server     self.driver.copy_file(src, dst, on_execute=on_execute,
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.9/site-packages/nova/virt/libvirt/volume/remotefs.py", line 196, in copy_file
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server     processutils.execute(
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.9/site-packages/oslo_concurrency/processutils.py", line 438, in execute
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server     raise ProcessExecutionError(exit_code=_returncode,
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server Command: scp -r /var/lib/nova/instances/d043223f-ab70-4f55-bd2b-897768681094_resize/disk 10.0.102.1:/var/lib/nova/instances/d043223f-ab70-4f55-bd2b-897768681094/disk
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server Exit code: 255
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server Stdout: ''
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server Stderr: "Warning: Permanently added '[10.0.102.1]:8022' (ED25519) to the list of known hosts.\r\nsubsystem request failed on channel 0\r\nConnection closed\r\n"
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server

My hypothesis is that the client and server disagree somewhere, with one using the legacy SCP protocol and the other using SFTP. However, I'm not sure how to correct this.
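The stderr in the traceback supports this hypothesis. `subsystem request failed on channel 0` is what the OpenSSH client prints when the remote sshd refuses a subsystem request, and scp on RHEL/Rocky Linux 9 requests the `sftp` subsystem by default. The following minimal, heuristic Python sketch encodes that reasoning. The function name `diagnose_scp_failure` and its messages are hypothetical and are not part of nova or OpenSSH.

```python
def diagnose_scp_failure(exit_code: int, stderr: str) -> str:
    """Heuristically classify an scp failure from its exit code and stderr text."""
    if exit_code == 0:
        return "ok"
    if "subsystem request failed" in stderr:
        # The client asked the remote sshd for the "sftp" subsystem (scp's default
        # transfer protocol on RHEL/Rocky Linux 9) and sshd refused it, which usually
        # means its sshd_config has no "Subsystem sftp ..." line.
        return "remote sshd has no sftp subsystem: add one, or use legacy scp (-O)"
    return "unknown scp failure"


# stderr copied from the nova traceback above
stderr = (
    "Warning: Permanently added '[10.0.102.1]:8022' (ED25519) to the list of known hosts.\r\n"
    "subsystem request failed on channel 0\r\n"
    "Connection closed\r\n"
)
print(diagnose_scp_failure(255, stderr))
```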

N. Komodo
  • What errors (if any) do you see from `sshd`? – larsks Feb 02 '23 at 01:20
  • @larsks It looks like port 8022, mentioned in the logs, is served by the nova-ssh Docker container, which helpfully doesn't seem to keep SSH logs. – N. Komodo Feb 02 '23 at 02:58
  • The hypervisors require passwordless ssh access to live-migrate instances. – eblock Feb 03 '23 at 09:17
  • @eblock Live migration works fine. As the title says, the issue is with cold migration. – N. Komodo Feb 03 '23 at 17:18
  • It was not clear that live migration works. What happens if you try to scp (maybe a test file within that directory or an actual instance) between the nodes manually? – eblock Feb 03 '23 at 21:31
  • As nova user, of course. – eblock Feb 03 '23 at 21:39
  • @eblock Running the command manually fails with the same error. When I add -O to run it in legacy mode, the transfer works fine. – N. Komodo Feb 04 '23 at 01:56
  • Interesting, seems like this explains it a bit: https://www.redhat.com/en/blog/openssh-scp-deprecation-rhel-9-what-you-need-know. Not sure if there’s an option in nova.conf to add a legacy option, will check tomorrow. – eblock Feb 04 '23 at 09:14
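The manual test from the comments (the same copy with and without `-O`) can be scripted. The following minimal Python sketch assumes `scp` is on the PATH of the user running it. The helper names `scp_command` and `run_scp` are hypothetical, and the port is left to the SSH client config unless it is passed explicitly.

```python
import subprocess
from typing import List, Optional, Tuple


def scp_command(src: str, host: str, dest: str,
                legacy: bool = False, port: Optional[int] = None) -> List[str]:
    """Build an scp argv shaped like the one nova runs, optionally forcing legacy SCP."""
    cmd = ["scp", "-r"]
    if port is not None:
        cmd += ["-P", str(port)]  # scp takes the port with an upper-case -P
    if legacy:
        cmd.append("-O")  # -O: use the original SCP protocol instead of SFTP
    cmd += [src, f"{host}:{dest}"]
    return cmd


def run_scp(cmd: List[str]) -> Tuple[int, str]:
    """Run scp and return (exit code, stderr); does not raise on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr
```

Run `run_scp(scp_command(src, "10.0.102.1", dst))` as the nova user, then repeat it with `legacy=True`. If only the legacy variant succeeds, the remote sshd is missing its sftp subsystem rather than rejecting the key.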

1 Answer


Appending

Subsystem       sftp    /usr/libexec/openssh/sftp-server

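to /etc/kolla/nova-ssh/sshd_config and restarting the nova_ssh container solves this. On Rocky Linux 9, scp uses the SFTP protocol by default, so the sshd inside the nova_ssh container needs an sftp subsystem. Without one, it refuses the request, which produces the "subsystem request failed on channel 0" error in the log.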
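On several compute nodes, a small idempotent script avoids appending the line twice. The following minimal Python sketch makes two assumptions. First, it assumes the file has no trailing `Match` block, because a line appended after one would fall inside it. Second, it assumes the sftp-server path from this answer exists inside the container. `ensure_sftp_subsystem` and `has_sftp_subsystem` are hypothetical helper names.

```python
from pathlib import Path

# The line from this answer; the sftp-server path is assumed to exist in the container.
SFTP_LINE = "Subsystem       sftp    /usr/libexec/openssh/sftp-server"


def has_sftp_subsystem(text: str) -> bool:
    """Return True if any uncommented line already declares an sftp subsystem."""
    for line in text.splitlines():
        fields = line.split()
        # sshd_config keywords are case-insensitive; "#Subsystem" lines do not match.
        if len(fields) >= 2 and fields[0].lower() == "subsystem" and fields[1] == "sftp":
            return True
    return False


def ensure_sftp_subsystem(path: Path) -> bool:
    """Append SFTP_LINE if no sftp subsystem is declared; return True if the file changed.

    Assumes the file does not end inside a Match block.
    """
    text = path.read_text()
    if has_sftp_subsystem(text):
        return False
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + SFTP_LINE + "\n")
    return True
```

Call it as root on each node, for example with `ensure_sftp_subsystem(Path("/etc/kolla/nova-ssh/sshd_config"))`, then restart the container with `docker restart nova_ssh`. Kolla Ansible generates the files under /etc/kolla, so a later deploy or reconfigure may overwrite the change.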

N. Komodo