
I have a VirtualBox image using a VDI with dynamically allocated (flexible) size. Right now the physical VDI file is 94 GB. The host is an Ubuntu server and the guest is CentOS 6.4 with an ext4 partition. The host uses a normal 1 TB SATA disk.

Disk read speed is:

 sudo hdparm -tT /dev/sda

 /dev/sda:
 Timing cached reads:   23330 MB in  2.00 seconds = 11679.09 MB/sec
 Timing buffered disk reads: 420 MB in  3.01 seconds = 139.49 MB/sec

Disk write speed:

 sudo dd if=/dev/zero of=output bs=8k count=128k; sudo rm -f output
 131072+0 records in
 131072+0 records out
 1073741824 bytes (1.1 GB) copied, 4.91353 s, 219 MB/s

So I guess copying 100 GB should take much less time. In fact, a simple cp does take a lot less: in my case it takes 30 minutes, compared to the 2 hours of clonehd.

I know that clonehd also does a compact, so that might explain a big part of the difference. Now, say I want to compact only once explicitly and then just clone. Is there a faster alternative? I read somewhere that I can simply cp the image and then change the UUID of the copy (something like the sketch below). Has anyone done this? Is it safe? Since I am doing this for backups, I need the process to be safe.
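For reference, this is roughly what I have in mind; the paths are just placeholders and the VM would be fully shut down first:

 # copy the image while the VM is powered off (paths are placeholders)
 cp /vms/centos.vdi /backups/centos-backup.vdi
 # give the copy a new UUID so VirtualBox does not complain about a duplicate
 VBoxManage internalcommands sethduuid /backups/centos-backup.vdi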

Note that I have already zeroed the free space in the guest CentOS using dd or zerofree, but that is not the topic of this question.
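Just for context, the zeroing and the one-off compact I mean are along these lines (the device and paths are placeholders):

 # inside the guest, with the filesystem unmounted or mounted read-only:
 zerofree /dev/sda1
 # or, with the filesystem mounted read-write:
 dd if=/dev/zero of=/zerofile bs=1M; rm -f /zerofile; sync
 # on the host, with the VM powered off, reclaim the zeroed blocks once:
 VBoxManage modifyhd /vms/centos.vdi --compact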

Actually, this would not be much of a problem if I could run clonehd while the VM is still running, but I have read that this is not possible/recommended, since the cloned/copied VDI file could end up corrupted if the source is modified concurrently.

Thanks in advance,

  • I don't know your clonehd command, but I once did a test between CloneZilla and dd... CZ was WAY faster, like 3-4 minutes vs 5 hours. CloneZilla being free, maybe you want to try it. – vn. Aug 22 '14 at 19:56
  • Clonezilla being faster is not a surprise. Clonezilla only copies the in-use blocks, while dd copies everything. – devicenull Aug 23 '14 at 00:54
  • Remember that dd is a SYNCHRONOUS process: a READ must finish before a WRITE is allowed to happen and vice versa. In the dd test using /dev/zero, you are not hitting any disk for input, and thus your results will be misleading. Furthermore, you are probably getting the input buffer directly from memory. – mdpc Aug 23 '14 at 01:14
  • Remember that sparse file copies MAY expand and take far longer to copy, so again your real-world experience may be far slower than your testing suggests. – mdpc Aug 23 '14 at 01:15

1 Answer


You wrote that "the host is using a sata normal disk of 1TB". This leads me to believe that you are talking about a rotational HDD, possibly even a desktop-grade drive (like a 7200 rpm SATA instead of a 10k or 15k rpm SAS drive, let alone SSDs).

Keep in mind that 7200 rpm drives commonly top out at about 100-120 MB/s of sequential throughput. This puts an upper bound on what you should expect to get when not relying on caching. (Note that the dd command in the question most likely relies heavily on caching, and thus gives an inaccurate picture of the I/O performance. Adding conv=fdatasync, or using oflag=direct, forces the data to actually reach the disk and gives a more realistic figure. Also, as pointed out in the comments, by reading from /dev/zero you are eliminating one side of the equation.)
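For example, a write test along these lines (the file name is just a placeholder) keeps the page cache from inflating the number:

 # flush the data to disk before dd reports its timing
 dd if=/dev/zero of=testfile bs=1M count=1024 conv=fdatasync
 rm -f testfile
 # or bypass the page cache entirely
 dd if=/dev/zero of=testfile bs=1M count=1024 oflag=direct
 rm -f testfile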

It's quite possible that, since your VM's disk image file is thinly provisioned, it is fragmented on disk. Depending on how heavily fragmented it is, you may even be IOPS-bound; a 7200 rpm drive has a theoretical maximum of roughly 120 IOPS (one I/O per revolution).
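If you want to check, filefrag from e2fsprogs will show how fragmented the image file is on the host filesystem (the path below is a placeholder):

 # the number of extents a 94 GB file is split into; thousands suggest heavy fragmentation
 sudo filefrag /vms/centos.vdi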

A single pass (read or write) over 100 GB at 110 MB/s takes about 900 seconds, i.e. roughly 15 minutes. Double that, because you are both reading from and writing to the same disk, and you are looking at 1800-2000 seconds, or about half an hour.

Add fragmentation on top of that, and two hours certainly sounds like it is in the ballpark.
