For our project we are using Azure File Storage, to which large files (up to 500 MB) can be uploaded. The files must be processed by Java microservices (based on Spring Boot) that use the Azure SDK for Java and periodically poll the directory to see whether new files have been uploaded. Is there a way to determine when a file has been completely uploaded, other than obvious approaches such as monitoring its size?
-
Is there a reason you’re using File Storage and not Blob Storage? – Gaurav Mantri May 04 '20 at 13:44
-
You could use a hashing algorithm and hashes (SHA-1, MD5, etc.) to determine file completeness. – ControlAltDel May 04 '20 at 14:48
-
@GauravMantri-AIS legacy access, I am obliged to use it because another system drops files there. – apetrelli May 05 '20 at 09:42
-
@ControlAltDel unfortunately this is not something I can control; the file is placed there by another system. – apetrelli May 05 '20 at 09:43
1 Answer
Unfortunately it is not directly possible to monitor when a file upload has been completed (including monitoring the size). This is because the file upload happens in two stages:
- First, an empty file of a certain size is created. This maps to the Create File REST API operation.
- Next, content is written to that file. This maps to the Put Range REST API operation. This is where the actual data is written to the file.
Assuming data is written to the file in sequential order (i.e. from byte 0 to the file size), one possibility would be to keep checking the last "n" bytes of the file and see whether all of them are non-zero. That would indicate that some data has been written at the end of the file. Again, this is not a foolproof solution, as there may be cases where the last "n" bytes are genuinely zero.
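The tail check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `TailCheck`, `tailRange` and `tailLooksWritten` are hypothetical names, and the actual ranged read of the last "n" bytes (via the SDK or the REST API) is deliberately left out so the snippet stays self-contained. The caller is assumed to already hold the tail bytes in memory.

```java
/** Hypothetical helper for the "last n bytes are non-zero" heuristic. */
final class TailCheck {

    /**
     * Computes the inclusive [start, end] byte offsets of the last n bytes
     * of a file of the given size, clamped to the start of the file.
     */
    public static long[] tailRange(long fileSize, int n) {
        if (fileSize <= 0 || n <= 0) {
            throw new IllegalArgumentException("fileSize and n must be positive");
        }
        long start = Math.max(0L, fileSize - n);
        return new long[] {start, fileSize - 1};
    }

    /**
     * Returns true only if every byte of the tail is non-zero.
     * Heuristic only: a finished file whose tail genuinely contains
     * zero bytes will never pass this check.
     */
    public static boolean tailLooksWritten(byte[] tail) {
        if (tail == null || tail.length == 0) {
            return false;
        }
        for (byte b : tail) {
            if (b == 0) {
                return false;
            }
        }
        return true;
    }
}
```

A poller would compute `tailRange(size, n)`, read just that range from the share, and pass the bytes to `tailLooksWritten`. Because of the false-negative case noted above, it is probably wise to combine this with a timeout or another signal.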

Gaurav Mantri
-
Thanks, this was what I supposed initially. For the moment I upvote your solution, I will accept it later this week. – apetrelli May 05 '20 at 10:15
-
You're welcome. No rush in accepting the answer :). Someone might come up with a better solution. – Gaurav Mantri May 05 '20 at 10:18
-
@apetrelli How can we know whether the file is still being written through the REST API at that point? Could we check some sort of lock status? One way would be to check whether size and lastModifiedAt have not changed for 3-5 seconds, but I'm not sure that is correct all the time. – Gautam Kumar Samal Jun 10 '21 at 17:40
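The "unchanged for a few seconds" idea from this comment could look something like the sketch below. `QuietPeriodTracker` and `observe` are hypothetical names, and the size and last-modified values are assumed to come from whatever properties call the poller already makes. Two caveats: since the file is created at its full size up front, the size is unlikely to change during the upload, so the last-modified timestamp is the more useful signal (assuming each write updates it); and an uploader that stalls for longer than the quiet period would still produce a false positive.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tracker: a file is "stable" once its metadata stops changing. */
final class QuietPeriodTracker {

    /** Last metadata seen for a file and the poll time it was first seen at. */
    private static final class Seen {
        final long size;
        final Instant lastModified;
        final Instant since;

        Seen(long size, Instant lastModified, Instant since) {
            this.size = size;
            this.lastModified = lastModified;
            this.since = since;
        }
    }

    private final Duration quietPeriod;
    private final Map<String, Seen> seen = new HashMap<>();

    public QuietPeriodTracker(Duration quietPeriod) {
        this.quietPeriod = quietPeriod;
    }

    /**
     * Records one polling observation. Returns true once size and
     * lastModified have stayed the same for at least the quiet period.
     */
    public boolean observe(String name, long size, Instant lastModified, Instant now) {
        Seen prev = seen.get(name);
        boolean changed = prev == null
                || prev.size != size
                || !prev.lastModified.equals(lastModified);
        if (changed) {
            // First sighting or metadata moved: restart the quiet period.
            seen.put(name, new Seen(size, lastModified, now));
            return false;
        }
        return Duration.between(prev.since, now).compareTo(quietPeriod) >= 0;
    }
}
```

Passing `now` explicitly instead of calling `Instant.now()` inside the tracker keeps the logic easy to test with fixed timestamps.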
-
I see what you meant by checking the last "n" bytes for zero. Is there any way to do that without streaming the whole content, something like ReadRange(start, end)? – Gautam Kumar Samal Jun 10 '21 at 19:12
-
@GautamKumarSamal the system is opaque to us. In the end, however, we noticed that it transfers the file under a random name and renames it to its final name only after the whole file has been transferred, so we used this mechanism to tell when the file is complete. However, this behaviour is specific to that uploader and is not a generic solution. – apetrelli Jun 15 '21 at 15:25
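For an uploader that behaves this way, the poller only needs to ignore names that do not yet match the final naming convention. A minimal sketch, assuming the final names follow a pattern that can be expressed as a regular expression: `FinalNameFilter` is a hypothetical name, and the pattern in the usage note is invented for illustration, since the thread does not say what the real names look like.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical filter: only names matching the final convention count as complete. */
final class FinalNameFilter {

    private final Pattern finalName;

    /** @param finalNameRegex regex that a fully uploaded (renamed) file must match */
    public FinalNameFilter(String finalNameRegex) {
        this.finalName = Pattern.compile(finalNameRegex);
    }

    /** Returns the listed names that match the final pattern, in listing order. */
    public List<String> completed(List<String> listedNames) {
        return listedNames.stream()
                .filter(name -> finalName.matcher(name).matches())
                .collect(Collectors.toList());
    }
}
```

For example, with a made-up convention such as `report-\d{8}\.csv`, a directory listing containing a random temporary name and `report-20210615.csv` would yield only the latter.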