2

I have a virtual Windows server running in a VMware Server 2.0 environment. When I create or remove a snapshot, it takes about 30 minutes, and the Windows server is completely unresponsive until the operation completes. Has anyone experienced the same issue, or does anyone know how to fix it? I want to make sure the server stays responsive while a snapshot is being removed.

galets

3 Answers

2

What type of disk(s) do you have backing this server? Removing snapshots is a very I/O-intensive operation (especially if you do it while the VM is running).

Do you have the option of switching to an ESXi host instead of VMware Server? You'll see much better performance with VMware running on bare metal.

EEAA
  • The disk subsystem is poor... are you saying ESXi is much better? I've been reluctant to use ESXi, since there's no good option for backup, at least not a free one – galets Oct 11 '09 at 23:55
  • Performance *is* going to be better on ESXi due to the fact that, by removing the extraneous host operating system, you're getting rid of one layer of abstraction between the guest VM and bare metal. Regarding backups, unless I'm mistaken, the situation isn't any better using VMware Server, correct? To get crash-consistent backups on either platform, you need to suspend the VMs, then copy the files off the vmfs. – EEAA Oct 12 '09 at 00:39
  • I guess what I don't understand is what exactly VMware Server is doing while removing a snapshot, and why it needs so much I/O and time. If ESXi is different, it cannot be just the host operating system: my server runs under Linux with very few additional subsystems enabled. There must be some fundamental differences. As for backup, it is easier on standard Linux: there is rsync, rdiff-backup and much more. I have no idea how to put those on ESXi – galets Oct 12 '09 at 04:32
  • 2
    I believe all the changes that have been logged to the snapshot file must be applied to the actual disk image. ESXi uses a more sophisticated snapshot technology and does not suffer in this way. – Roy Oct 13 '09 at 13:27
1

Regarding VMWare taking time to revert a snapshot:

As I've understood it, removing a snapshot (i.e. keeping the changes and removing the ability to revert) is simply a matter of removing the snapshot files.

This indicates to me that VMware keeps using the original disk image whether or not a snapshot exists. However, once a snapshot has been taken, it also writes "delta" data to the snapshot file.

This means the latest version of the files as they are seen in the VM resides in the original disk image, and the difference between these files and the snapshot resides in the snapshot files.

When reverting a snapshot, VMWare would then "undo" all changes as they are registered in the snapshot files, reaching the state of when the snapshot was taken when all changes have been undone.

I'm guessing this way the "real time" I/O can be prioritized over the "snapshotting" I/O.

I'm sure digging around in the VMware archives would give an even better answer, but I believe this is how it works.

/H

1

Old but interesting stuff.

When you create a snapshot, the original disk is left unchanged and all subsequent write operations are logged to a separate file.

When you discard that snapshot (an operation that should have been called "discard the ability to come back to an earlier version of this disk"), all the contents of that log or journal are applied (committed) to the actual disk, which had been held unchanged up to that point. That can take a very long time, especially if the snapshot is old (i.e. there are many operations to commit). It takes even longer if other virtual machines are doing disk operations, which will take precedence.
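To make the description above concrete, here is a toy Python model of a base disk plus a snapshot log. It is only a sketch under stated assumptions: the `SnapshotDisk` class and its method names are hypothetical, blocks are plain dictionary entries, and nothing here reflects VMware's real on-disk format. It only shows why removing a snapshot costs I/O proportional to the amount of logged changes, while reverting is cheap.

```python
class SnapshotDisk:
    """Toy model (hypothetical, not VMware's format): base image plus a delta log."""

    def __init__(self, blocks):
        self.base = dict(blocks)  # block number -> data, the "actual disk"
        self.delta = None         # None means no snapshot exists

    def take_snapshot(self):
        # From now on the base is frozen and writes go to the log.
        self.delta = {}

    def write(self, block, data):
        target = self.base if self.delta is None else self.delta
        target[block] = data

    def read(self, block):
        # The newest data wins: check the log first, then the base.
        if self.delta is not None and block in self.delta:
            return self.delta[block]
        return self.base[block]

    def remove_snapshot(self):
        """Keep the changes: commit every logged block to the base.

        Returns the number of blocks copied, a stand-in for the I/O cost.
        """
        if self.delta is None:
            return 0
        committed = 0
        for block, data in self.delta.items():
            self.base[block] = data
            committed += 1
        self.delta = None
        return committed

    def revert_snapshot(self):
        # Throw the log away; the base was never touched, so this is cheap.
        if self.delta is not None:
            self.delta = {}


disk = SnapshotDisk({0: "a", 1: "b", 2: "c"})
disk.take_snapshot()
disk.write(1, "B")
disk.write(2, "C")
print(disk.base[1], disk.read(1))  # base still holds the old data
print(disk.remove_snapshot())      # number of blocks that had to be copied
```

In this model the longer a snapshot lives and the more blocks are rewritten, the more work `remove_snapshot` has to do, which matches the behaviour described in the question.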

motobói