I am writing an abstraction layer that will hide the back-end implementation of a (yet to be decided) distributed file system.
Candidate file systems include HDFS, GlusterFS, Ceph, and others.
The front end will be SOAP/REST services.
The abstraction layer will receive a stream of data from the web services and send it to the back-end distributed file system.
The files will be multiple gigabytes in size.
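To make the intended shape concrete, here is a minimal sketch of the kind of layer I have in mind. All names in it (`StorageBackend`, `write_stream`, `LocalDirBackend`, `CHUNK_SIZE`) are hypothetical, and the local-directory back end is only a stand-in for whichever distributed file system is eventually chosen. It assumes the web service hands over a file-like byte stream, and it copies that stream in fixed-size chunks so a multi-gigabyte upload is never held in memory.

```python
import hashlib
import os
import tempfile
from abc import ABC, abstractmethod
from typing import BinaryIO, Iterator

CHUNK_SIZE = 4 * 1024 * 1024  # assumed value; would need tuning per back end


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield chunks of at most chunk_size bytes until the stream is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


class StorageBackend(ABC):
    """Hypothetical interface; each candidate file system would get a subclass."""

    @abstractmethod
    def write_stream(self, path: str, stream: BinaryIO) -> str:
        """Store the stream at path and return its SHA-256 hex digest."""


class LocalDirBackend(StorageBackend):
    """Stand-in back end that writes under a local directory (not a real DFS)."""

    def __init__(self, root: str) -> None:
        self.root = root

    def write_stream(self, path: str, stream: BinaryIO) -> str:
        target = os.path.join(self.root, path)
        target_dir = os.path.dirname(target)
        os.makedirs(target_dir, exist_ok=True)
        digest = hashlib.sha256()
        # Write to a temporary file and rename at the end, so a failed
        # upload never leaves a partial file under the final name.
        fd, tmp = tempfile.mkstemp(dir=target_dir)
        try:
            with os.fdopen(fd, "wb") as out:
                for chunk in iter_chunks(stream):
                    digest.update(chunk)
                    out.write(chunk)
                out.flush()
                os.fsync(out.fileno())  # force data to disk before the rename
            os.replace(tmp, target)
        except BaseException:
            os.unlink(tmp)
            raise
        return digest.hexdigest()
```

The returned digest is there so the caller can compare it with a checksum supplied by the client, which is one way to detect data loss in transit. Whether a single sequential stream like this can exploit the distributed back end is exactly what I am unsure about.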
My question:
What is the best approach for pushing data into a distributed file system when we need maximum throughput and no data loss, and want to take advantage of the distributed nature of the back-end file system?