
Using FreeBSD and a mounted NetApp NFS Share.

I'm trying to copy a file

FROM: localdisk/something.vmdk (size 527776 kilobytes)
TO: nfsmount/copy-something.vmdk (size 533168 kilobytes)

But, as the sizes above show, the file somehow grew during the copy, even though I'm only duplicating it.

Any ideas how this could happen? I'm simply running

cp localdisk/something.vmdk nfsmount/copy-something.vmdk

Then I run `du` on each to check the size, and the copy has mysteriously grown.

uname -rs
FreeBSD 8.1-RELEASE-p2
    This can happen if the block sizes of the filesystems are different. Are the source and destination formatted differently? – Nathan C May 09 '14 at 14:30
  • If you want to confirm that you have an exact copy of the file, it's usually better to run `md5sum` on both and compare the hash. If it's the same you can be sure the file is exactly the same. – faker May 09 '14 at 14:34
  • `du` reports disk space used not file size. For sparse files these are not the same. Does `ls -ls` show them the same size? (first column will be blocks used - may differ, latter size should be the same). – Brian May 09 '14 at 14:46
  • @Brian ls -ls still shows a difference. Source: 527776 Dst: 533168 – cheesesticksricepuck May 09 '14 at 14:58
  • @faker md5 reports a match: src: 224067840895a0499f2a0d8d33ecd185 dst: 224067840895a0499f2a0d8d33ecd185 – cheesesticksricepuck May 09 '14 at 14:59
  • @NathanC Do you know how I would check block sizes on NetApp? Can I do that over NFS? I don't have access to the NetApp admin. Here's what I got for the BSD side: `dumpfs /d2/ | grep bsize` → `bsize 16384 shift 14 mask 0xffffc000 maxbsize 16384 maxbpg 2048 maxcontig 8 contigsumsize 8 sbsize 2048 cgsize 16384 csaddr 1680 cssize 497664` – cheesesticksricepuck May 09 '14 at 15:01
  • @NathanC Looks like netapp default block size is 4K on WAFL filesystem. So is there any way to make this play nicely? I'm trying to test dedupe but these differences are preventing it from happening. – cheesesticksricepuck May 09 '14 at 15:11
  • Sure looks like a sparse file getting filled in with blocks or partial blocks filled with zeros on getting copied to another file system. That would be why the md5 for each is the same. Can you post the actual output from `ls -ls filename` for each? – Brian May 09 '14 at 17:31
  • @Brian I see now that one of the columns gives the same size but the initial number on the far left is different. What does each mean? `ls -ls /images/something.vmdk` → `527776 -rw-r--r-- 1 user domusers 543817728 Mar 24 16:20 /images/something.vmdk`; `ls -ls /netapp/images/copy-something.vmdk` → `533168 -rw-r--r-- 1 root wheel 543817728 May 9 09:32 /netapp/images/copy-something.vmdk` – cheesesticksricepuck May 09 '14 at 17:36
  • The number on the far left is the number of blocks used on disk, same as what comes from `du`. The other number is the file size. Sparse files have holes in them where blocks that are all zeros aren't actually stored and don't take up any disk space. Google `sparse files` for why they exist and some of the drawbacks. – Brian May 09 '14 at 18:04
  • @Brian okay I understand now why one file may be larger than the other. What I'm still unsure about is why deduplication isn't working. It should still be able to dedupe the majority of the blocks of this file. – cheesesticksricepuck May 09 '14 at 21:04
  • Dedupe is invisible to the client. On the filer - run: `sis start /vol/volname`. Monitor it with `sis status`. Then run `df -gs /vol/volname` which will show you used vs. saved on the filesystem. You cannot do this client side - the filer hides this information from you. – Sobrique May 19 '14 at 09:16
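
The sparse-file effect described in the comments is easy to reproduce locally. A minimal sketch (the `dd` trick below is portable; the filenames are made up for illustration):

```shell
# Create a file that is one big 10 MB hole: seek past the end
# without writing any data, so no blocks are allocated.
dd if=/dev/zero of=sparse.img bs=1 count=0 seek=10485760 2>/dev/null

# Apparent size vs. blocks actually allocated:
ls -ls sparse.img   # size column shows 10485760; leading block count is near zero
du -k sparse.img    # disk usage is (almost) nothing

# Copying with a tool that doesn't detect holes writes real zero
# blocks, so the copy occupies the full size on disk.
cat sparse.img > filled.img
du -k filled.img    # now roughly 10240 KB on disk
```

This is exactly the `du`-grows-but-`md5`-matches symptom from the question: the byte contents are identical, only the allocated-block counts differ.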

1 Answer


Based on your comments, here's what I understand:

  • You have a VM saved on a local disk
  • You are trying to copy it to a Netapp NFS share with dedupe enabled to test dedupe

If this is true, the reason you're not seeing an immediate gain is probably that deduplication on NetApp is post-process. The NetApp does a bit-for-bit comparison of candidate blocks as a background task before deduplicating (replacing each duplicate block with a pointer to the original block). This process is scheduled centrally, so only your storage admin can tell you when it runs. It generates a lot of reads, so people tend not to schedule it during, for example, backup windows.
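
If you can get console access to the filer, the dedupe run can be kicked off and verified there. A sketch based on the 7-Mode commands mentioned in the comments (`volname` is a placeholder for your actual volume name):

```
sis start /vol/volname     # kick off a post-process dedupe scan now
sis status                 # monitor the scan's progress
df -gs /vol/volname        # used vs. saved space once the scan completes
```

Note these run on the filer itself, not on the NFS client; the savings are invisible from the client side.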

Basil
  • You also won't see the dedupe saving with `ls` - you'll see the original file size. You may not even see it with `df`, as that depends on whether the exported filesystem is a volume or a (quotaed) qtree, and on whether snapshots are enabled. (The first-pass dedupe saving probably moves into snapshots, and only actually frees space once the snaps expire.) – Sobrique May 15 '14 at 14:27
  • The asker specified that they're using `du` :) – Basil May 15 '18 at 18:07