Simply put, our system consists of a Server and an Agent. The Agent generates a huge binary file, which may need to be transferred to the Server.
Given:
- The system must cope with files up to 1 GB now, a limit that is likely to grow to 10 GB within 2 years.
- The transfer must be over HTTP, because other ports may be closed.
- This is not a file-sharing system: the Agent just needs to push the file to the Server.
- Both the Agent and the Server are written in Java.
- The binary file may contain sensitive information, so the transfer must be secure.
I am looking for techniques and libraries to help me with transferring huge files. Some of the topics I am aware of are:
- Compression: Which one to choose? We do not limit ourselves to gzip or deflate just because they are the most popular for HTTP traffic. If some unusual compression scheme yields better results for our task, so be it.
- Splitting: Obviously, the file needs to be split and transferred in several parallel sessions.
- Background: Transferring a huge file takes a long time. Does this affect the solution, and if so, how?
- Security: Is HTTPS the way to go, or should we take another approach, given the volume of data?
- Off-the-shelf: I am fully prepared to code it myself (it should be fun), but I cannot avoid asking whether there are any off-the-shelf solutions that satisfy my requirements.
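On compression: the JDK itself only ships Deflate (the algorithm behind gzip) in `java.util.zip`; codecs such as LZ4, Zstandard or XZ would need third-party libraries, and which one wins depends entirely on what the Agent's files look like, so it is worth benchmarking on real data. Below is a minimal streaming sketch using only the built-in `Deflater`, assuming the compression level is tuned by measurement. The class name `StreamCompression` is hypothetical. Note that compression has to happen before encryption (TLS or otherwise), because well-encrypted data is essentially incompressible.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

public final class StreamCompression {
    private static final int BUFFER_SIZE = 64 * 1024;

    private StreamCompression() {}

    /**
     * Deflates 'in' into 'out' as a stream, so memory use does not grow with file size.
     * 'level' ranges from Deflater.BEST_SPEED (1) to Deflater.BEST_COMPRESSION (9).
     * Neither stream is closed; the caller owns them.
     */
    public static void compress(InputStream in, OutputStream out, int level) throws IOException {
        Deflater deflater = new Deflater(level);
        try {
            DeflaterOutputStream deflating = new DeflaterOutputStream(out, deflater, BUFFER_SIZE);
            in.transferTo(deflating);
            deflating.finish(); // write the remaining compressed data without closing 'out'
        } finally {
            deflater.end(); // free native memory; not automatic for a caller-supplied Deflater
        }
    }

    /** Inverse of compress(); also leaves both streams open. */
    public static void decompress(InputStream in, OutputStream out) throws IOException {
        Inflater inflater = new Inflater();
        try {
            InflaterInputStream inflating = new InflaterInputStream(in, inflater, BUFFER_SIZE);
            inflating.transferTo(out);
        } finally {
            inflater.end();
        }
    }
}
```

Swapping in another codec would only change the wrapping stream; the streaming shape (never holding the whole file in memory) is the part that matters at 10 GB.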
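On splitting: one way to keep it independent of the transport is to plan fixed-size byte ranges up front and read each range with the positional `FileChannel.read(ByteBuffer, long)`, which does not move the channel's shared position, so parallel sender threads can share one open channel. This is a sketch under assumptions: the class `ChunkPlanner` is hypothetical, chunk sizes are limited to `int`, and the `record` needs Java 16+.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

public final class ChunkPlanner {

    /** A contiguous byte range [offset, offset + length) of the source file. */
    public record Chunk(int index, long offset, int length) {}

    private ChunkPlanner() {}

    /** Splits a file of 'fileSize' bytes into chunks of 'chunkSize' bytes; the last one may be shorter. */
    public static List<Chunk> plan(long fileSize, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        List<Chunk> chunks = new ArrayList<>();
        int index = 0;
        for (long offset = 0; offset < fileSize; offset += chunkSize) {
            int length = (int) Math.min(chunkSize, fileSize - offset);
            chunks.add(new Chunk(index++, offset, length));
        }
        return chunks;
    }

    /** Reads one chunk with positional reads, leaving the channel's own position untouched. */
    public static byte[] read(FileChannel channel, Chunk chunk) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(chunk.length());
        long position = chunk.offset();
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer, position);
            if (n < 0) throw new IOException("Unexpected end of file at byte " + position);
            position += n;
        }
        return buffer.array();
    }
}
```

For example, with an (arbitrary) 64 MiB chunk size a 10 GiB file yields 10 × 1024 / 64 = 160 chunks, each of which can be compressed, checksummed and sent on its own connection.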
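On long-running transfers: the main consequence is that connections will eventually drop mid-transfer, so the upload should be resumable rather than restarted from zero. A minimal sketch, assuming a hypothetical protocol in which the Server reports which chunk indices it has already stored and verified: the Agent resends only the missing ones and attaches a SHA-256 digest of each chunk so the Server can reject corrupted data. `ResumableUpload` and its methods are illustrative names, not an existing library.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public final class ResumableUpload {
    private ResumableUpload() {}

    /** Indices of chunks the Server has not acknowledged yet, in ascending order. */
    public static List<Integer> pendingChunks(int totalChunks, Set<Integer> acknowledged) {
        List<Integer> pending = new ArrayList<>();
        for (int i = 0; i < totalChunks; i++) {
            if (!acknowledged.contains(i)) {
                pending.add(i);
            }
        }
        return pending;
    }

    /** Lower-case hex SHA-256 of a chunk, sent alongside it so the Server can verify it. */
    public static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

Because the state lives in "which chunks has the Server confirmed", the Agent can run the transfer as a background job and simply pick up where it left off after a restart or a network outage.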
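On security: HTTPS already encrypts and integrity-protects the data in transit. If, in addition, the data must stay encrypted at rest on the Server, or pass through proxies that terminate TLS, each chunk could also be sealed at the application level. The sketch below uses AES-GCM from the standard `javax.crypto` API. It assumes key distribution is solved elsewhere; the `IV || ciphertext` layout and binding the chunk index as associated data are illustrative choices, and `ChunkCipher` is a hypothetical name. Random 96-bit IVs are assumed to be acceptable for the modest number of chunks encrypted per key. Encrypting per chunk also keeps memory bounded, since the JDK's GCM decryption typically buffers a whole message until its authentication tag has been verified.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public final class ChunkCipher {
    private static final int IV_BYTES = 12;   // 96-bit nonce, the recommended size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    private ChunkCipher() {}

    /** Returns IV || ciphertext || tag; the chunk index is authenticated so chunks cannot be swapped. */
    public static byte[] encrypt(SecretKey key, int chunkIndex, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // fresh nonce per chunk; never reuse one with the same key
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        cipher.updateAAD(indexBytes(chunkIndex));
        byte[] sealed = cipher.doFinal(plain);
        return ByteBuffer.allocate(IV_BYTES + sealed.length).put(iv).put(sealed).array();
    }

    /** Reverses encrypt(); throws AEADBadTagException if the data, IV or chunk index was altered. */
    public static byte[] decrypt(SecretKey key, int chunkIndex, byte[] message) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
        cipher.updateAAD(indexBytes(chunkIndex));
        return cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
    }

    private static byte[] indexBytes(int chunkIndex) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(chunkIndex).array();
    }
}
```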
Has anyone encountered this problem in their products, and how was it dealt with?
Edit 1
Some may question the choice of HTTP as the transfer protocol. The thing is that the Server and the Agent may be quite remote from each other, even if located in the same corporate network. We have already faced numerous issues caused by customers keeping only HTTP ports open on the nodes in their corporate networks, which leaves us little choice but to use HTTP. Using FTP would be fine, but it would have to be tunneled through HTTP. Does that mean we still get all the benefits of FTP, or will tunneling cripple it to the point where other alternatives are more viable? I do not know.
Edit 2
Correction: HTTPS is always open, and sometimes (but not always) HTTP is open as well. No other ports are open.
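Given that HTTPS is the one port that is always available, the simplest transport is probably one HTTPS `PUT` per chunk, which gets TLS encryption without tunneling anything. Below is a sketch using `java.net.http.HttpClient` (JDK 11+, no third-party dependency). The URL layout `…/upload/{uploadId}/chunks/{index}` and the `X-Chunk-Sha256` header are a hypothetical protocol the Server would have to implement, not a standard; `uploadId` is assumed to be URI-safe, and the base URI must end with `/` for `resolve` to append to it.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public final class ChunkUploader {
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(30))
            .build();
    private final URI baseUri; // hypothetical, e.g. https://server.example/upload/

    public ChunkUploader(URI baseUri) {
        this.baseUri = baseUri;
    }

    /** Builds the PUT for one chunk; the path layout and X-Chunk-Sha256 header are assumptions. */
    public HttpRequest buildRequest(String uploadId, int chunkIndex, byte[] body, String sha256Hex) {
        return HttpRequest.newBuilder(baseUri.resolve(uploadId + "/chunks/" + chunkIndex))
                .timeout(Duration.ofMinutes(5))
                .header("Content-Type", "application/octet-stream")
                .header("X-Chunk-Sha256", sha256Hex)
                .PUT(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }

    /** Sends one chunk over HTTPS; returns true if the Server answered with a 2xx status. */
    public boolean upload(String uploadId, int chunkIndex, byte[] body, String sha256Hex)
            throws IOException, InterruptedException {
        HttpResponse<Void> response = client.send(
                buildRequest(uploadId, chunkIndex, body, sha256Hex),
                HttpResponse.BodyHandlers.discarding());
        return response.statusCode() / 100 == 2;
    }
}
```

Several `upload` calls can run in parallel from a thread pool; since each request carries a single chunk, a failed one can be retried on its own instead of restarting the whole file.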