
My VM backup script fails while creating the snapshot.

virsh snapshot-create-as --domain machine_1 snap --diskspec vda,file=/srv/test/test-snap.qcow2 --disk-only --atomic --no-metadata --quiesce
error: Requested operation is not valid: domain is already quiesced

Even after rebooting the VM, it is still reported as quiesced and I get the same error.

I thought quiesce meant FS freeze, but that makes no sense, since I can still write to the filesystem when logged in to the faulty VMs. Besides, a freeze would not survive a reboot, right?

Could it be a communication issue that makes the host believe the guest agent reports the machine as quiesced while it is not?

In any case, is there a command to query the quiesce state (apart from attempting a snapshot and seeing whether I get an error)?
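For what it's worth, the guest agent can be asked for its own freeze status through virsh's agent passthrough command. This is a sketch assuming the agent channel is up and the domain is named `machine_1` as above:

```shell
# Ask qemu-guest-agent for its internal FSFreeze state.
# Expected reply is {"return":"thawed"} or {"return":"frozen"}.
virsh qemu-agent-command machine_1 '{"execute":"guest-fsfreeze-status"}'
```

Note this reports the agent's own bookkeeping, which (as the answer below explains) can disagree with what the host-side libvirt believes.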

Assuming the faulty VMs went quiesced after an unreproducible error, I could fix that by exiting the quiesced state, whatever that means. Is there a virsh command to unquiesce the VM?
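If the guest filesystems really were left frozen, virsh does have a direct thaw counterpart to domfsfreeze (again assuming the agent channel works; `machine_1` is the domain name used above):

```shell
# Thaw all guest filesystems via the guest agent.
virsh domfsthaw machine_1
```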

The whole backup procedure used to work and now it fails on 2 VMs but still works on 2 others and I can't think of any relevant difference between them.

Software versions:

  • Host is Debian Jessie with qemu-kvm 2.8+dfsg-3~bpo8+1 from backports.
  • Guests are Debian Stretch with qemu-guest-agent 2.8+dfsg-6+deb9u4.

(For the record, the backup script is here on GitHub. Basically, what it does is 1/ create snapshot, 2/ copy, 3/ commit snapshot.)
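The three steps can be sketched with stock virsh commands (a rough outline only; the paths and image names here are illustrative, not the actual ones from the script):

```shell
# 1/ external, disk-only snapshot: new writes go to the overlay file
virsh snapshot-create-as --domain machine_1 snap \
    --diskspec vda,file=/srv/test/test-snap.qcow2 \
    --disk-only --atomic --no-metadata --quiesce

# 2/ copy the now-quiescent base image (illustrative paths)
cp /srv/vm/machine_1.qcow2 /srv/backup/machine_1.qcow2

# 3/ merge the overlay back into the base and pivot the domain to it
virsh blockcommit machine_1 vda --active --pivot
rm /srv/test/test-snap.qcow2
```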

If I remove the --quiesce option from the snapshot command line, everything works smoothly. But obviously, that is not ideal.

Jérôme

1 Answer


The root cause is a libvirt bug in host-side tracking of the guest's freeze state; it was fixed in libvirt 1.2.11.

Fixed upstream by:

commit 6085d917d5c5839b7ed351e99fadbbb56f5178fe
Author: Michal Privoznik <mprivozn@redhat.com>
Date: Thu Nov 27 11:43:56 2014 +0100

qemu: Don't track quiesced state of FSs

https://bugzilla.redhat.com/show_bug.cgi?id=1160084

As of b6d4dad11b (1.2.5) we are trying to keep the status of FSFreeze
in the guest. Even though I've tried to fixed couple of corner cases
(6ea54769ba18), it occurred to me just recently, that the approach is
broken by design. Firstly, there are many other ways to talk to
qemu-ga (even through libvirt) that filesystems can be thawed (e.g.
qemu-agent-command) without libvirt noticing. Moreover, there are
plenty of ways to thaw filesystems without even qemu-ga noticing (yes,
qemu-ga keeps internal track of FSFreeze status). So, instead of
keeping the track ourselves, or asking qemu-ga for stale state, it's
the best to let qemu-ga deal with that (and possibly let guest kernel
propagate an error).

Moreover, there's one bug with the following approach, if fsfreeze
command failed, we've executed fsthaw subsequently. So issuing
domfsfreeze in virsh gave the following result:

virsh # domfsfreeze gentoo
Froze 1 filesystem(s)

virsh # domfsfreeze gentoo
error: Unable to freeze filesystems
error: internal error: unable to execute QEMU agent command 'guest-fsfreeze-freeze': The command guest-fsfreeze-freeze has been disabled for this instance

virsh # domfsfreeze gentoo
Froze 1 filesystem(s)

virsh # domfsfreeze gentoo
error: Unable to freeze filesystems
error: internal error: unable to execute QEMU agent command 'guest-fsfreeze-freeze': The command guest-fsfreeze-freeze has been disabled for this instance

Upgrading libvirt to 1.2.11 or newer fixes this.

Jérôme