3

I need to set up a redundant configuration of two Linux machines so that the files in server1:/dir stay in sync with the files in server2:/dir.
I managed to configure GlusterFS to do this, but while it works fine with large files, it is awfully slow when many small files are involved.
To illustrate: a 150MB archive with 50K files normally unpacks in 3-4 seconds on the regular file system, but takes more than 15 minutes on the GlusterFS partition!
After much reading and testing I couldn't significantly improve this.
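For anyone who wants to reproduce this kind of test without my archive, here is a rough sketch of a small-file write benchmark. It is not the exact test I ran: the function name, the directory layout (100 subdirectories) and the roughly 3 KB file size (150MB divided by 50K files) are assumptions chosen to approximate the workload.

```python
import os
import time


def time_small_file_writes(target_dir, count=50_000, size=3_000):
    """Write `count` files of `size` bytes under target_dir.

    Returns the elapsed wall-clock time in seconds.
    The defaults approximate a 150MB archive holding 50K files.
    """
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        # Spread the files over 100 subdirectories, loosely like an
        # unpacked source tree (an assumption, not the real layout).
        subdir = os.path.join(target_dir, f"d{i % 100:03d}")
        os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, f"f{i:06d}"), "wb") as fh:
            fh.write(payload)
    return time.perf_counter() - start
```

Calling `time_small_file_writes()` once on an empty directory on the local disk and once on an empty directory inside the replicated mount gives two numbers that can be compared directly. It only measures file creation and writes, so it will not match a real `tar` extraction exactly.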

I was wondering whether anyone has experience with another clustered file system that handles many small files better than GlusterFS, or has another suggestion on what I should try for keeping a directory on 2 servers in sync.

Alex Flo
  • How far apart are these machines, and what does the network look like? You have to take latency into account, every write or meta-data operation has to wait for both the local disk and the remote network acknowledgement to complete. Any "synchronous" file-system replication technology (GlusterFS, DRBD, or whatever) will have this same requirement. – rmalayter Sep 25 '12 at 13:05
  • They are in the same datacenter, on a 100MB connection with 0.1ms latency. – Alex Flo Sep 25 '12 at 13:17
  • What are you using as storage backend (SAN, iSCSI, DRBD, ...)? – Nils Oct 06 '12 at 21:08
  • @Nils - it's a gigabit connection, I was wrong. Still slow nevertheless. Storage: commodity SATA. – Alex Flo Oct 08 '12 at 08:25
  • Did you try using GlusterFS via NFS instead of the FUSE client? It usually works a lot faster for smaller files. Still, if you need something faster, maybe DRBD would be better. – Jure1873 Dec 11 '12 at 16:51
  • @Jure1873 thanks, but I eventually chose to use an sshfs-mounted file system and it works very well this way. I'll keep your suggestions in mind should I need them in the future. – Alex Flo Dec 14 '12 at 12:39

2 Answers

3

DRBD turned out to be a workable alternative to GlusterFS: my 50K-file test completed in 40 seconds instead of 15+ minutes.
My conclusion is that GlusterFS seems suited to a modest number of large files, while DRBD works better when many small files are involved.
I know it's an "apples vs. pears" comparison, but it may save someone some hours of work.
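To put the three timings on the same scale, the snippet below converts them to an average cost per file. The totals come from my tests above; 3.5 seconds is just the midpoint of the 3-4 second range, and 15 minutes is a lower bound for GlusterFS, so treat the results as rough figures.

```python
FILES = 50_000

# Total unpack times in seconds, taken from the tests described above.
timings_s = {
    "local filesystem": 3.5,   # midpoint of the 3-4 s range (assumption)
    "DRBD": 40.0,
    "GlusterFS": 15 * 60.0,    # "more than 15 minutes", so a lower bound
}


def per_file_ms(total_seconds, files=FILES):
    """Average time per file in milliseconds."""
    return total_seconds * 1000.0 / files


for name, seconds in timings_s.items():
    print(f"{name}: {per_file_ms(seconds):.2f} ms per file")
```

That works out to roughly 0.07 ms per file locally, 0.8 ms with DRBD and at least 18 ms with GlusterFS.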

Alex Flo
2

DRBD does a RAID-1 over the network which might be closer to what you want. I still have not found it to be terribly fast though.

David Mackintosh
  • Do you have it installed on some servers? I'm curious how long it would take, for example, to unpack a kernel archive. – Alex Flo Sep 25 '12 at 12:48
  • But if you run it in dual-primary configuration then you still need a distributed lock manager to avoid write conflicts. AFS might be a better solution to support writes at both ends. – symcbean Sep 25 '12 at 14:27