Both mechanisms are valid. In fact, we have tried both in two of our products. Each has its pros and cons.
Assumption: the main server serves the Web UI.
Let's name the two servers main.com and remote.com.
The main factors that you need to consider are:
Cross-Origin Resource Sharing (CORS): The most straightforward way to check whether this is a big issue is to ask whether you need to support IE <= 9. Legacy IE does not support XHR2, so AJAX file upload is not possible. There are popular fallbacks (e.g. iframe transport), but each comes with its own problems.
Bandwidth: If the file is uploaded to main.com first and then to remote.com, the required bandwidth is doubled. The exact requirement varies depending on whether your servers are behind load balancers. Uploading directly to remote.com uses bandwidth efficiently.
API response time: Unless your API does optimistic updates, it has to wait until the file is fully re-uploaded to remote.com before it can respond reliably. The response time also depends on the RTT between main.com and remote.com.
Number of API domains: This is a small issue. The web client has to specify which domain to use for which API. However, it does relate to the CORS issue above.
Performance: If the web server handles file uploads, its performance may suffer while it processes large files (or a large number of files), which can affect other users.
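To make the CORS factor concrete: for a direct upload to work from a page served by main.com, remote.com has to answer with CORS headers that allow that origin. The following is only a minimal sketch. The allow-list, the function name `cors_headers`, and the use of `https` are assumptions for illustration, not something taken from either of our products.

```python
from typing import Dict, Optional, Set

# Hypothetical allow-list: only the Web UI served from main.com may upload directly.
ALLOWED_ORIGINS: Set[str] = {"https://main.com"}


def cors_headers(origin: Optional[str],
                 allowed: Set[str] = ALLOWED_ORIGINS) -> Dict[str, str]:
    """Return the CORS response headers remote.com would attach for this Origin.

    An empty dict means the origin is not allowed, so the browser will
    block the cross-origin upload.
    """
    if origin is None or origin not in allowed:
        return {}
    return {
        # Echo the allowed origin back instead of using a wildcard.
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": "POST, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type",
        # Tell caches that the response depends on the Origin request header.
        "Vary": "Origin",
    }
```

remote.com would attach these headers both to the upload response and to the preflight OPTIONS response when the browser sends one. Browsers without XHR2 never get this far, which is why the fallbacks mentioned above exist.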
Mechanisms
1. Client uploads to main.com. main.com re-uploads to remote.com.
- CORS: No issue
- Bandwidth: Double
- Response time: Double (round trip); normal (optimistic)
- API domain: One
- Performance: Might affect web services
2. Client uploads to remote.com. remote.com sends info to main.com.
- CORS: Has issues
- Bandwidth: Efficient
- Response time: Efficient
- API domain: Two
- Performance: No issue
3. [EXTRA] Client uploads to main.com. main.com streams the file to remote.com. remote.com sends file information back.
- CORS: No issue
- Bandwidth: Double
- Response time: Less than double (streamed)
- API domain: One
- Performance: Better than method 1 because of its smaller memory footprint
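The "Response time" entries above can be illustrated with a toy timing model. This is a rough sketch, not a measurement: it assumes the file is split into equal chunks, that each leg takes a fixed number of time units per chunk, and that nothing else (latency, processing, retries) costs time. All function and parameter names are made up for the illustration.

```python
def reupload_time(n_chunks: int, t_in: int, t_out: int) -> int:
    """Method 1: main.com receives the whole file, then sends the whole file."""
    return n_chunks * t_in + n_chunks * t_out


def direct_time(n_chunks: int, t_direct: int) -> int:
    """Method 2: the client sends the file straight to remote.com."""
    return n_chunks * t_direct


def streaming_time(n_chunks: int, t_in: int, t_out: int) -> int:
    """Method 3: main.com forwards each chunk as soon as it has arrived."""
    finished = 0  # time at which the previous chunk finished reaching remote.com
    for i in range(n_chunks):
        arrived = (i + 1) * t_in  # chunk i is fully received by main.com
        # Forwarding starts once the chunk has arrived and the link is free.
        finished = max(arrived, finished) + t_out
    return finished
```

With 100 chunks and one time unit per chunk on every leg, `reupload_time(100, 1, 1)` gives 200, `streaming_time(100, 1, 1)` gives 101, and `direct_time(100, 1)` gives 100. In this model streaming approaches the cost of the slower leg alone, while re-uploading always pays for both legs in full.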
Conclusion
Which mechanism to use depends on your use case. In our case, we use method 2 (direct upload) for our legacy product because we need to support legacy browsers (IE 7, FF 3). Cross-domain issues bite us all the time in many different situations (e.g. when customers are behind proxies).
We use method 1 for our new product. Bandwidth and response time are still acceptable in normal cases, but when the web server and the remote server are deployed on different continents, performance is inferior. We have made many optimizations to make it acceptable, but it is still worse than method 2.
I use method 3 in a side project. It is included here because I think it is a good candidate too.
Edit
The difference between streaming (method 3) and re-uploading (method 1) is mainly in how the file is stored on main.com. This affects resource allocation.
For re-uploading, an uploaded 2GB file is first stored on main.com, then re-uploaded to remote.com. main.com has to allocate resources to store the file temporarily (disk space, memory, CPU for I/O). Also, because the process is serial, the total time needed to complete the upload to remote.com is doubled (assuming the time to upload to main.com equals the time to upload to remote.com).
For streaming, a file being uploaded to main.com is simultaneously uploaded to remote.com. Since main.com uploads each chunk of the file to remote.com as soon as it receives the chunk, the two uploads overlap, resulting in a shorter total upload time (less than double). In other words, if no processing is needed at main.com, main.com is effectively a proxy to remote.com. Also, since the file is not stored as a whole on main.com (chunks are normally held in memory), it consumes fewer resources than re-uploading. However, if main.com needs to process the file as a whole, streaming does not bring much benefit.
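The storage difference between the two approaches can be sketched with plain file-like objects. This is a simplified illustration under stated assumptions: `src` stands in for the incoming upload on main.com, `dst` for the outgoing connection to remote.com, the re-upload variant buffers in memory rather than on disk, and the names `reupload`, `stream`, and `CHUNK_SIZE` are hypothetical.

```python
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size; tune for your workload


def reupload(src: BinaryIO, dst: BinaryIO) -> int:
    """Method 1: hold the whole upload on main.com, then send it on.

    Returns the peak number of bytes main.com holds at once.
    """
    whole_file = src.read()  # the entire file is held before forwarding starts
    dst.write(whole_file)
    return len(whole_file)


def stream(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Method 3: forward each chunk as soon as it has been read.

    Returns the peak number of bytes main.com holds at once.
    """
    peak = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read means the upload is complete
            break
        dst.write(chunk)
        peak = max(peak, len(chunk))
    return peak
```

For a 1 MiB payload wrapped in `io.BytesIO`, `reupload` reports a peak of 1048576 bytes while `stream` reports 65536, and both deliver identical bytes to `dst`. The same shape applies to a 2GB file: the streaming variant never holds more than one chunk, which is where the smaller footprint comes from.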