I've been toying with Azure Data Lake Store, and the Microsoft documentation claims the system is optimized for low-latency, small writes to files. To test this, I tried performing a large number of writes from parallel tasks to a single file, but in most cases this fails with a Bad Request. This link https://issues.apache.org/jira/secure/attachment/12445209/appendDesign3.pdf shows that HDFS isn't designed to handle concurrent appends to a single file, so I tried again using the ConcurrentAppendAsync method from the API. With that method nothing crashes, but my file is never modified on the store.
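For reference, this is roughly what I tried with the SDK (a minimal sketch; I'm assuming the Microsoft.Azure.Management.DataLake.Store .NET package here, and the account name, file path, and client setup are placeholders):

    // Sketch: many parallel tasks appending to one file via ConcurrentAppendAsync.
    // Assumes an already-authenticated DataLakeStoreFileSystemManagementClient.
    using System.IO;
    using System.Text;
    using System.Threading.Tasks;
    using Microsoft.Azure.Management.DataLake.Store;
    using Microsoft.Azure.Management.DataLake.Store.Models;

    class ConcurrentAppendTest
    {
        static async Task RunAsync(DataLakeStoreFileSystemManagementClient client)
        {
            const string account = "myadlsaccount";     // placeholder account name
            const string path = "/test/concurrent.txt"; // placeholder file path
            var tasks = new Task[50];
            for (int i = 0; i < tasks.Length; i++)
            {
                int n = i; // capture the loop variable for the closure
                tasks[n] = Task.Run(async () =>
                {
                    var bytes = Encoding.UTF8.GetBytes($"record {n}\n");
                    using (var stream = new MemoryStream(bytes))
                    {
                        // Autocreate: the service creates the file on the first append.
                        await client.FileSystem.ConcurrentAppendAsync(
                            account, path, stream, AppendModeType.Autocreate);
                    }
                });
            }
            await Task.WhenAll(tasks); // completes without errors, yet the file never changes
        }
    }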
What you have found is correct about how parallel writes work. I am assuming you have already read the documentation for ConcurrentAppendAsync.
So, in your case, did you use the same file for both the WebHDFS write test and ConcurrentAppendAsync? If so, ConcurrentAppendAsync will not work, as mentioned in the documentation, but you should have received an error in that case.
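One way to rule that out is to point ConcurrentAppendAsync at a brand-new path that the regular WebHDFS create/append calls have never touched, something like this (a sketch reusing the client and usings from your snippet; the account name and path are placeholders, and the exact signature may vary by SDK version):

    // Append only via ConcurrentAppendAsync to a file that normal
    // create/append (WebHDFS) has never written. Autocreate makes the
    // service create the file on the first concurrent append.
    using (var stream = new MemoryStream(Encoding.UTF8.GetBytes("hello\n")))
    {
        await client.FileSystem.ConcurrentAppendAsync(
            "myadlsaccount",              // placeholder account name
            "/test/fresh-concurrent.txt", // fresh path, never written via WebHDFS
            stream,
            AppendModeType.Autocreate);
    }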
In any case, let us know what happened and we can investigate further.
Thanks,
Sachin Sheth
Program Manager - Azure Data Lake


- I don't know what I'm doing wrong, but ConcurrentAppendAsync doesn't append anything to my file, and it doesn't fail either. – evilpilaf Mar 15 '16 at 16:06
- Can you please get in touch with me via email - sachinsatmicrosoftdotcom? I would like to see what is happening in detail. Thanks. – Sachin Sheth Mar 21 '16 at 07:54