
We have a distributed application that uses large amounts of content (all kinds of files). Several servers need to access that content. Right now the content is stored redundantly on each server, but this is getting ugly.

We want to store the content on a single storage instance with large hard disks, and then mount that instance's file system from each of our servers.

I thought about using NFS, but the security scheme doesn't seem to fit. Right now I'm looking at Samba, but I'm not sure it is the right choice. All our servers are Linux, and Samba's main purpose is mixed Windows/Linux environments. What makes Samba interesting to me is its user-level security.

Aside from security, the other major requirement is performance. Our servers need fast access to the content, as fast as possible over a LAN.

Is Samba a good choice? What other options are there? What about WebDAV?

EDIT: What I need to do: We have a varying number of servers that need to access a growing number of files; we expect this to reach several TB. We call these files the 'content'. All servers have to use the same version of the content, and they need concurrent read-only access to it. The content is updated relatively seldom, somewhere between once a week and once a month, but updates will likely become much more frequent. Right now it would be possible to sync the content to each server, but that will become a pain in the near future. The update has to be quite snappy. We think it would be convenient to update/sync the content on only one server (the storage server) and let all other servers mount the content as a remote filesystem.
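For illustration, the sync we run on every server today is roughly the following (host and paths are placeholders):

    # current approach, run on each server -- this is what we want to replace
    rsync -av --delete storage:/srv/content/ /srv/content/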

All the best

Jan

Jan Deinhard
  • What security scheme makes you confused about NFS? – quanta Jul 25 '11 at 08:05
  • As far as I understand NFS, I need to configure the server with client addresses to grant them access to the filesystem. We don't know the addresses of the servers in advance, so we can't use them to configure the NFS server. – Jan Deinhard Jul 25 '11 at 08:10

4 Answers


Samba will almost certainly do what you want, and with fairly reasonable performance. It should have the necessary security controls to handle whatever use cases you've got in mind (your question is a bit short on details there).
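For illustration, a minimal read-only share with user-level security might look something like the following; the share name, path, and group are invented, so treat it as a sketch rather than a drop-in config (see man smb.conf for the details):

    # /etc/samba/smb.conf -- hypothetical read-only content share
    [global]
        # user-level security (username/password), not host-based
        security = user
        map to guest = Never

    [content]
        path = /srv/content
        read only = yes
        # only members of this (invented) group may connect
        valid users = @content-readers

A client would then mount it with something like: mount -t cifs //storage/content /mnt/content -o username=appuser,ro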

It's hard to provide other recommendations, since you don't give a really good description of what you need to do and what your constraints are. WebDAV probably isn't useful: it's nothing like a POSIX filesystem, and if you think you need blazingly fast performance, then you probably want something that acts like a full filesystem (arbitrary seeks, that sort of thing), which is going to be painful over WebDAV.

You also haven't talked about concurrent access to individual files, which has a strong bearing on your possible solution space. If only one client is accessing a given file at once, and especially if only one client will ever be updating a given file, then don't necessarily give up on periodic sync solutions -- they can do a good job, in the right conditions.

Finally, if it's mostly (or all) read-only, then consider making your data access higher-level. Rather than thinking that you have to have files, why not think in terms of useful application-specific abstractions? A common example of this is the humble SQL database -- rather than storing data in flat files and grovelling through it with custom code, some clever clods came up with the idea of a more specialised storage engine and the necessary verbiage to intelligently access it. It's not as flexible as a filesystem, but (in its narrow niche) it's a damned sight quicker. Perhaps with a bit of imagination, you can come up with a similar abstraction for your data, which could save quite a lot of trouble?
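For illustration only, here is a toy sketch of that idea using SQLite from the shell; the table, column, and file names are invented:

    # Hypothetical: keep the content in one database file instead of many
    # flat files, and query it rather than walking a directory tree.
    sqlite3 content.db <<'EOF'
    CREATE TABLE IF NOT EXISTS content (path TEXT PRIMARY KEY, body BLOB);
    INSERT INTO content (path, body) VALUES ('docs/readme.txt', 'hello world');
    SELECT path, length(body) FROM content;
    EOF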

womble

NFS - Your storage instance is the NFS server, and the servers that want to mount file systems off it are your NFS clients. The storage instance does have to know the IP addresses of its NFS clients, i.e. your servers, but you do know those addresses already. You can also allow a whole subnet at a time, and you will at least know what subnet your servers live on. Note that companies such as NetApp sell exactly this sort of thing, and they work pretty well.
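For example, a subnet-wide, read-only export is a one-liner in /etc/exports; the paths and subnet below are placeholders (see man exports):

    # /etc/exports on the storage instance
    /srv/content  10.0.0.0/24(ro,sync,no_subtree_check)

    # on the storage instance, after editing /etc/exports:
    exportfs -ra

    # on each server (NFS client):
    mount -t nfs storage:/srv/content /mnt/content -o ro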

Samba - My experience is slightly different from how you want to use it (i.e., users supply username/password combos when they mount the shares), so I can't comment on your proposed use.

Both should work fine, and both should be able to saturate a 1 Gb Ethernet interface with no problem. I suspect that will be your upper bound on how much data you can get off your storage instance. You can, of course, use multiple Ethernet interfaces to work around that, and then you will probably be limited by how fast you can move data off whatever you buy for disks.
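If you do go the multiple-interface route, here is a rough sketch of a bonded pair on a Debian-style system; it needs the ifenslave package, the names and addresses are placeholders, and 802.3ad (LACP) requires switch support:

    # /etc/network/interfaces -- hypothetical LACP bond of two NICs
    auto bond0
    iface bond0 inet static
        address 10.0.0.10
        netmask 255.255.255.0
        bond-slaves eth0 eth1
        bond-mode 802.3ad
        bond-miimon 100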

I think one of the key numbers you need to know before you start is how much data each of your varying number of servers needs to read per second. Then you need to know the maximum number of servers you will have. Can your proposed centralized solution supply that much data per second? Right now you have solved this by having each server be independent, with its own copy of the data.
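A rough way to get those numbers before committing to hardware (hostnames and paths are placeholders):

    # raw network throughput between one server and the storage instance
    iperf -s                  # run on the storage instance
    iperf -c storage -t 30    # run on a content server

    # sequential read speed of the storage itself (beware of cache effects;
    # use a file larger than RAM for an honest number)
    dd if=/srv/content/bigfile of=/dev/null bs=1M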

Bruce ONeel
  • We don't know the addresses of the servers. They are launched in a cloud and get arbitrary addresses. There is also an arbitrary number of servers: we need to launch more when the load goes up and terminate them when it decreases. The subnet is shared, so we can't grant access to a specific subnet like 10.0.0.0/24 or something. Changing the cloud provider or using a private cloud are not options. – Jan Deinhard Jul 25 '11 at 14:32
  • Ok, you could work around that but it might be fragile. – Bruce ONeel Jul 26 '11 at 07:26
  • Regardless of which file sharing protocol you use, I would now be concerned about your file store being able to deal with the I/O from lots of servers. I think you would be happier in the end to set up some sort of test environment, spin up the maximum number of servers you ever expect plus 10%, and see if your file store can keep up. – Bruce ONeel Jul 26 '11 at 10:35
  • Thanks Bruce, we already have a testing environment next to our live deployment. – Jan Deinhard Jul 26 '11 at 11:36

Regarding your specific requirements of performance and speed, SSH could fit well. SSH exchanges files via the SFTP subsystem and uses the native Linux user permission controls. You can use a machine with large storage volumes (possibly with hardware- or software-level RAID or encryption; the ext4 file system is a good fit, as it is remarkably quick) as direct-attached storage (DAS). Set up an SSH server on it and define different user access levels on your data, just the way you do on each system you currently have. Accessing the content on this server is then almost as easy as accessing local data. Setting up keyrings on each machine is essential to make authentication to the server secure.
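As a sketch of the setup being suggested here (hostnames and paths are placeholders; sshfs is a separate FUSE-based package, and note womble's comment below about its speed):

    ssh-keygen -t rsa                # generate a key pair on each server
    ssh-copy-id appuser@storage      # install the public key on the storage box
    sshfs -o ro appuser@storage:/srv/content /mnt/content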

Ragowa
  • Thanks for the input, Hessam. I already thought about SSH, but I rejected the idea because I thought SSH would cost too much in performance. The access has to be very quick. Am I wrong about SSH? – Jan Deinhard Jul 25 '11 at 08:15
  • I don't think there exists any quicker way than SSH. Only the authentication is a little slow on SSH and that you can surely handle using keyrings. – Ragowa Jul 25 '11 at 08:18
  • Secondly, Linux-based Network Attached Storage (NAS) devices, which are designed exactly to be accessed over the network on the fly, are all using SSH. – Ragowa Jul 25 '11 at 08:19
  • And more importantly, Samba uses Linux's built-in functions to handle permissions and schedule read/write commands as its backend, so it is surely slower than Linux itself; accessing files over SSH only adds network authentication on top of local access. – Ragowa Jul 25 '11 at 08:21
  • Do you mean adding the public keys to authorized_keys by using 'keyrings'? Pardon me, I'm quite new to system administration. – Jan Deinhard Jul 25 '11 at 08:24
  • So Samba would be slower than SSH? – Jan Deinhard Jul 25 '11 at 08:25
  • Adding public keys and encrypted passwords so the clients can authenticate quickly, passwordless yet securely, on the server. The procedure is described well on these three sites: – Ragowa Jul 25 '11 at 08:37
  • http://linuxproblem.org/art_9.html – Ragowa Jul 25 '11 at 08:37
  • http://www.debian-administration.org/articles/152 – Ragowa Jul 25 '11 at 08:37
  • http://www.mtu.net/~engstrom/ssh-agent.php – Ragowa Jul 25 '11 at 08:37
  • Yes, my experience shows that Samba is slower. Moreover, the only benefit of a service like Samba is granting Windows clients access. How could it possibly add more security or speed to the native Linux filesystem and its security? – Ragowa Jul 25 '11 at 08:41
  • How would the OP go about actually accessing the data over SSH? SSHFS is slow as buggery, and direct SFTP isn't exactly equivalent to a filesystem. – womble Jul 25 '11 at 08:42
  • Once a tunnel is established, the overhead for transfers via SSH is only around 15%, but the handshake introduces massive amounts of latency, and there are no protocols I'm aware of that sit on top of an SSH tunnel, let alone implement a full virtual filesystem. – symcbean Jul 25 '11 at 14:17
  • @Hessame R: "Linux-based Network Attached Storage (NAS) devices ... are all using SSH" - references please? – symcbean Jul 25 '11 at 14:18

Consolidating onto one server can make life easier, but how do you assure availability? Duplicate network cards? RAID? Sometimes replication can be a good thing.

Since we're talking about server-to-server communications, is user-level security really such a requirement? Certainly user authentication in NFS is weak, but what about using NFS plus authentication at a lower level in the network, such as IPsec? Or a shared filesystem on top of iSCSI on top of a VPN?

Depending on the pattern of access, and if availability of local storage is not a problem, the fastest solution might be something like AFS, where you effectively get a very large local cache, with the added advantage of remaining usable when the server goes down.

symcbean