I have two servers. The first (Main) hosts the script and the database.
The second (Remote) only stores the uploaded files.

I'm unsure how to upload the files. I have two ideas, but I don't know whether they are the best approaches.

The ideas:

  1. Upload the file via AJAX to the main server first and perform all security checks there (size, type, and so on), then upload it again to the second (remote) server.

  2. Upload the file via AJAX directly to the second (remote) server and perform all security checks there, then send the file information to the first (main) server to store it in the database (not recommended).

I want to know how the big upload sites store the files uploaded by their users. How do they get the files onto remote servers?

Lion King

1 Answer

Both mechanisms are valid. In fact, we tried both in two of our products. Each comes with its own pros and cons.

Assumption: the main server serves the web UI. Let's name the two servers main.com and remote.com.

The main factors that you need to consider are:

  1. Cross-Origin Resource Sharing (CORS): The quickest way to tell whether this is a big issue is to ask whether you need to support IE <= 9. Legacy IE does not support XHR2, so AJAX file upload is not possible there. There are popular fallbacks (e.g. iframe transport), but each comes with its own problems.

  2. Bandwidth: If the file is uploaded to main.com first and then to remote.com, the required bandwidth is doubled. The exact requirement varies depending on whether your servers are behind load balancers. Uploading directly to remote.com uses bandwidth efficiently.

  3. API response time: Unless your API does optimistic updates, it has to wait until the file is fully re-uploaded to remote.com before it can respond reliably. This also depends on the RTT between main.com and remote.com.

  4. Number of API domains: This is a small issue. The web client has to specify which domain to use for which API. However, it does relate to the CORS issue above.

  5. Performance: If the web server handles file uploads, its performance might suffer while it processes large files (or a large number of files), which might affect other users.

Mechanisms

1. Client uploads to main.com. main.com re-uploads to remote.com

  • CORS: No issue
  • Bandwidth: Double
  • Response time: Double (round trip); normal (optimistic)
  • API domain: One
  • Performance: Might affect web services
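
With method 1, main.com is the natural place for the checks mentioned in the question (size, type, and so on) before anything is forwarded to remote.com. A minimal sketch of such a check, assuming a hypothetical size limit and extension whitelist (`MAX_BYTES`, `ALLOWED_EXTENSIONS`, and `validate_upload` are illustrative names, not part of either product):

```python
import os

# Hypothetical limits; choose values that fit your application.
MAX_BYTES = 50 * 1024 * 1024
ALLOWED_EXTENSIONS = {".jpg", ".png", ".pdf"}


def validate_upload(filename: str, size: int) -> list[str]:
    """Return a list of problems; an empty list means the file may be forwarded."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"extension {ext or '(none)'} not allowed")
    if size <= 0:
        problems.append("file is empty")
    elif size > MAX_BYTES:
        problems.append("file exceeds size limit")
    return problems
```

An extension check alone is weak; a real deployment would probably also inspect the file content, which is omitted here.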

2. Client uploads to remote.com. remote.com sends info to main.com

  • CORS: Has issues
  • Bandwidth: Efficient
  • Response time: Efficient
  • API domain: Two
  • Performance: No issue
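
The CORS issue in method 2 arises because the page served from main.com sends its upload request to remote.com, so remote.com has to opt in through response headers. A rough sketch of the headers remote.com might return, assuming the UI is served from `https://main.com` (`ALLOWED_ORIGINS` and `cors_headers` are hypothetical names; the header names themselves are the standard CORS ones):

```python
# Origins allowed to upload directly to remote.com (assumption for illustration).
ALLOWED_ORIGINS = {"https://main.com"}


def cors_headers(origin: str) -> dict[str, str]:
    """Build CORS response headers for an upload request or its preflight."""
    if origin not in ALLOWED_ORIGINS:
        # No CORS headers: the browser blocks the cross-origin response.
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": "POST, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type",
        # Tell caches that the response differs per requesting origin.
        "Vary": "Origin",
    }
```

This only helps browsers that implement CORS; legacy browsers still need a fallback such as the iframe transport mentioned above.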

3. [EXTRA] Client uploads to main.com. main.com streams the file to remote.com. remote.com sends file information back.

  • CORS: No issue
  • Bandwidth: Double
  • Response time: Less than double (streamed)
  • API domain: One
  • Performance: Better than method 1 for its better memory footprint
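
The core of method 3 is a relay loop: main.com forwards each chunk as soon as it has read it, so only one chunk is held in memory at a time. A minimal sketch using in-memory file objects as stand-ins for the incoming request body and the outgoing connection to remote.com (`relay` and `CHUNK_SIZE` are illustrative names, and the chunk size is an arbitrary choice):

```python
import io

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size for illustration


def relay(source, destination, chunk_size=CHUNK_SIZE):
    """Copy source to destination chunk by chunk; return the bytes forwarded."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read means the upload has ended
            break
        destination.write(chunk)  # forward immediately, never buffer the whole file
        total += len(chunk)
    return total


# Stand-ins: in practice these would be the request stream and a socket or HTTP body.
incoming = io.BytesIO(b"x" * 200_000)
outgoing = io.BytesIO()
sent = relay(incoming, outgoing)
```

In a real server, `source` would be the request body stream exposed by the web framework and `destination` a connection to remote.com; both are assumptions here.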

Conclusion

Which mechanism to use depends on your use case. In our case, we use method 2 (direct upload) for our legacy product because we need to support legacy browsers (IE 7, FF 3). Cross-domain issues bite us all the time in many different situations (e.g. when customers are behind proxies).

We use method 1 for our new product. Bandwidth and response time are still acceptable in normal cases, but when the web server and the remote server are deployed on different continents, performance is inferior. We have made many optimizations to make it acceptable, but it is still worse than method 2.

I use method 3 myself in a side project. It is included here because I think it is a good candidate too.

Edit

The difference between streaming (method 3) and re-uploading (method 1) is mainly how the file is stored on main.com. This affects resource allocation.

For re-uploading, an uploaded 2 GB file is first stored on main.com, then re-uploaded to remote.com. main.com has to allocate resources to store the file temporarily (disk space, memory, CPU for I/O). Also, because the process is serial, the total time needed to complete the upload to remote.com is doubled (assuming the upload to main.com takes as long as the upload to remote.com).

For streaming, a file being uploaded to main.com is simultaneously uploaded to remote.com. Since main.com uploads each chunk of the file to remote.com as soon as it receives that chunk, the two uploads overlap, resulting in a shorter total upload time (less than double). In other words, if no processing is needed on main.com, main.com is effectively a proxy to remote.com. Also, since the file is not stored as a whole on main.com (chunks are normally kept in memory), streaming consumes fewer resources than re-uploading. However, if main.com needs to process the file as a whole, streaming does not bring much benefit.
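
To put rough numbers on "double" versus "less than double", here is a simplified timing model. It assumes constant link speeds, ignores latency and protocol overhead, and treats streaming as a two-stage pipeline of fixed-size chunks; the function names and the example link speeds are made up for illustration.

```python
def reupload_seconds(size_mb, client_to_main_mbps, main_to_remote_mbps):
    """Serial transfer: the second hop starts only after the first has finished."""
    return size_mb * 8 / client_to_main_mbps + size_mb * 8 / main_to_remote_mbps


def streaming_seconds(size_mb, client_to_main_mbps, main_to_remote_mbps, chunk_mb=1.0):
    """Pipelined transfer: the slower hop dominates, plus one chunk on the faster hop."""
    slowest = min(client_to_main_mbps, main_to_remote_mbps)
    fastest = max(client_to_main_mbps, main_to_remote_mbps)
    return size_mb * 8 / slowest + chunk_mb * 8 / fastest


# A 2 GB (2048 MB) file over two hypothetical 100 Mbit/s links.
serial = reupload_seconds(2048, 100, 100)
pipelined = streaming_seconds(2048, 100, 100)
```

Under these assumptions the pipelined time is close to that of a single hop, which is why main.com behaves almost like a proxy when it does no processing of its own.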

PSWai
  • Thank you for the detailed answer, but I still need help deciding which is the best way, whether one of these ideas or another. I still wonder how the big upload sites store files on remote servers. – Lion King Dec 07 '15 at 03:27
  • @LionKing I am in the middle of editing the answer to provide a clearer picture. However the decision of which mechanism to use depends largely on your requirement and your environment. If you have load-balancers (the large sites have them), then the factor to consider will be different. By the way our products require handling of very large uploads (size and volumes) with many servers. So I believe the factors above are still valid :) – PSWai Dec 07 '15 at 03:42
  • Very good explanation, thank you. Unfortunately, I don't understand the third case, specifically `streams the file to`. What is the difference between `streams the file` and `re-uploads the file`? – Lion King Dec 07 '15 at 16:38
  • @LionKing Thanks. I have edited the answer to include the difference between streaming and re-uploading. – PSWai Dec 08 '15 at 03:19