2

My disks are 10x 1TB 7200 RPM SAS drives in a RAID 10 behind a MegaRAID 9260 hardware controller with cache/BBU, giving a 4.6TB RAID 10 volume. hdparm -t (run while the device was empty) reports about 500MB/s.

The RAID chunk size is 64KB and the filesystem block size is 2KB (I'm going to change to the minimum chunk size and a 4KB block size).
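
For reference, if I stay on ext4, the realignment I have in mind would look roughly like this (the device name is a placeholder, and the numbers assume I keep the current 64KB chunk with 5 data-bearing disks in the RAID 10; they'd change with a different chunk size):

```
# Hypothetical ext4 re-creation aligned to the array:
# stride       = chunk / block       = 64KB / 4KB = 16
# stripe-width = stride * data disks = 16 * 5     = 80 (10-disk RAID 10)
mkfs.ext4 -b 4096 -E stride=16,stripe-width=80 /dev/sdb1
mount -o noatime /dev/sdb1 /data
```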

The directory pattern is /data/x/yz/zyxabc.gz

I'm using EXT4 with plans to move to XFS. The OS is RHEL 6.


As of now it works great. The workload is 99% reads, and it can serve up to 300 files/second under normal conditions. The problem is backups: a full backup takes 6 days with scp, rsync is even slower, and dd runs at about 2MB/s. LVM snapshots could be an option if I take the snapshot, back it up, and then delete it. Data consistency is very important to me.
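
The snapshot approach I have in mind would look roughly like this (the volume group, LV and mount-point names are placeholders):

```
# Rough sketch: snapshot, back up from the snapshot, then drop it
lvcreate --snapshot --size 20G --name data-snap /dev/vg0/data
mkdir -p /mnt/data-snap
mount -o ro /dev/vg0/data-snap /mnt/data-snap
rsync -a /mnt/data-snap/ backuphost:/backups/data/
umount /mnt/data-snap
lvremove -f /dev/vg0/data-snap
```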

Files are about 0.5-4KB each. Would I see increased backup performance if I stored all of the files in a database instead? What other alternatives are there for me to tackle the problem of backing up this many small files in a reasonable window?

MDMarra
cedivad

4 Answers

3

Have you considered solutions like AMANDA or Bacula?
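
For instance, a minimal Bacula FileSet/Job that backs up /data in full once and then only picks up changed files might look roughly like this (a sketch only; the resource names, client and schedule are placeholders, untested):

```
# bacula-dir.conf sketch (names are placeholders)
FileSet {
  Name = "DataFiles"
  Include {
    Options {
      signature = MD5
    }
    File = /data
  }
}

Job {
  Name = "BackupData"
  Type = Backup
  Level = Incremental      # full once, then only changed files
  Client = fileserver-fd
  FileSet = "DataFiles"
  Schedule = "WeeklyCycle"
  Storage = File
  Messages = Standard
  Pool = Default
}
```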

2

I plan to move to XFS

You'd better pre-order tons of Prozac in that case. :-) XFS sucks a lot on that pattern (lots of tiny files), alas.

If you're considering an FS change, Reiser3 is the only option worth trying in that case, IMO. With notail you get less CPU overhead; without notail, less disk-space overhead.
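
For what it's worth, tail packing is just a mount option on reiserfs, so it's easy to try both ways (the device name is a placeholder):

```
mkfs.reiserfs /dev/sdb1
# notail: file tails aren't packed into the tree -> less CPU, more space used
mount -o notail /dev/sdb1 /data
# default (tail packing on): better space efficiency for tiny files, more CPU
mount /dev/sdb1 /data
```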

A RAID chunk of 64 KB is also beyond sanity: why overflow the disk I/O queues with such tiny requests? Increase it instead of decreasing it! With lots of simultaneous I/O it won't hurt.

Now, when it comes to backing up, it's worth mentioning COW filesystems such as Btrfs or NILFS. LVM2 snapshots may be OK as well, so you could try combining them with a migration to Reiser3. But I guess the COW filesystems have a better chance of giving you what you need.
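
As a rough illustration of the Btrfs variant of that workflow (this assumes /data is a Btrfs subvolume and /snapshots lives on the same Btrfs filesystem; the paths and destination are placeholders):

```
# Read-only snapshot, back it up at leisure, then drop it
btrfs subvolume snapshot -r /data /snapshots/data-$(date +%F)
rsync -a /snapshots/data-$(date +%F)/ backuphost:/backups/data/
btrfs subvolume delete /snapshots/data-$(date +%F)
```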

Iterator
poige
  • I see two ways to fix my problem: (1) keep the disks remotely synced with a remote backup server, and take backups on that server using snapshots, so the overhead on the production server is minimal; (2) use XFS under OpenSolaris. I know NOTHING about Solaris, but it seems to be rock solid. I'd have working snapshots for backups and another important feature: I could use a 500GB SSD drive I have as a pool cache, so half of the files would load in no time (200 million files = 1TB). Or maybe a mix of the two options: XFS on Solaris used for caching, with the data living on the remote server to back up. – cedivad Nov 09 '11 at 13:13
  • 2
    @cedivad, I think you're messing up XFS and ZFS. Don't. :) – poige Nov 09 '11 at 15:42
  • Yes, sorry =) I will use zfs under solaris ;) – cedivad Nov 09 '11 at 16:02
  • With lots of small files, wouldn't he want to DECREASE stripe size? – Bigbio2002 Nov 10 '11 at 18:32
  • @Bigbio2002, you might, but not me: a small stripe size means involving several disks even for relatively small I/O. You can get increased bandwidth for a single operation, but busier disks under simultaneous I/O threads. That's worse, especially nowadays when the throughput of a typical SATA disk is quite enough to serve a single file request by itself. – poige Nov 11 '11 at 09:32
  • Interesting, didn't know that. RAID optimization is a complex topic. – Bigbio2002 Nov 11 '11 at 16:00
  • @Bigbio2002, it's mostly just a matter of logical thinking. People often tend to "cache" knowledge/logical reasoning and don't realize the cached data has become obsolete. – poige Nov 11 '11 at 19:02
  • XFS has made huge progress in the past years. I successfully tested a filesystem with 1 billion files, without any problem. Recent options like "lazy-count" and "inode64" allow much better journal management. – wazoox Nov 17 '11 at 17:51
  • 1
    @wazoox, those options are nothing compared to the long-awaited `delaylog`, but its stability is still in question: http://comments.gmane.org/gmane.comp.file-systems.xfs.general/35886 Also, it's unclear what you count as a problem, since waiting 4 hours instead of 1 could be no problem for some people as well. – poige Nov 18 '11 at 01:54
  • @poige, as you can see I've posted in the thread you pointed to, but this was a year ago and most probably a 3Ware related problem more than an XFS bug, AFAIK. – wazoox Nov 18 '11 at 11:53
  • @wazoox, nope I haven't noticed your posting there. – poige Nov 18 '11 at 12:07
1

Either use a backup solution that supports incremental backups, such as those already mentioned, or perhaps use a script that traverses the tree and only copies files newer than a certain modification time?
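
A minimal sketch of that idea (the marker-file path and the backup destination are placeholders):

```
#!/bin/sh
# Copy only files changed since the last run, tracked via a timestamp file
STAMP=/var/backups/.last-backup
[ -f "$STAMP" ] || touch -d '1970-01-01' "$STAMP"   # first run copies everything
touch /var/backups/.this-run                        # mark the start so nothing is missed
find /data -type f -newer "$STAMP" -print0 \
    | rsync -a --from0 --files-from=- / backuphost:/backups/
mv /var/backups/.this-run "$STAMP"
```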

I'm not sure what you mean by "I need consistency", though. Do you mean all files need to be backed up at the same point in time (i.e. a snapshot)? In that case I'm not sure any sort of tar, copy, rsync or similar will work; you'll HAVE to use something that can create filesystem snapshots, or pause whatever process is creating these files in the first place.

Cylindric
0

"DD goes at about 2MB/s"

I'm confused: doesn't dd do a sequential read of the device (or at least attempt to)? Is it competing with the online use of these files? If so, I think more or faster disks are in order. 1TB SAS is still 7,200 RPM if I'm not mistaken; you can pick up 600GB 15K SAS drives, which will cut your seek times drastically.

Are you dumping it to a RAM disk, so that the destination can't be the bottleneck of the dd test (and you're not writing it right back to the local disks, again causing heavy seeking)?
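
For example, something like this reads the raw device, bypasses the page cache, and discards the output, so the destination can't be the bottleneck (the device name is a placeholder):

```
# Raw sequential read of the array, output discarded (reads 4GiB)
dd if=/dev/sda of=/dev/null bs=1M count=4096 iflag=direct
```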

If 2MB/s is the best you're going to get out of the fastest possible read pattern, you need faster disks.

However, dd won't give you a consistent snapshot without combining it with something else.

StrangeWill