I've been toying with Azure Data Lake Store, and the Microsoft documentation claims the system is optimized for low-latency, small writes to files. To test this, I tried performing a large number of writes from parallel tasks to a single file, but in most cases this fails with a Bad Request. This link https://issues.apache.org/jira/secure/attachment/12445209/appendDesign3.pdf shows that HDFS isn't designed to handle concurrent appends to a single file, so I tried again using the ConcurrentAppendAsync method from the API. With that method nothing crashes, but my file is never modified on the store.
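For reference, this is roughly what I tried with the SDK (a minimal sketch; I'm assuming the Microsoft.Azure.Management.DataLake.Store .NET package here, and the account name, file path, and client setup are placeholders):

    // Sketch: many parallel tasks appending to one file via ConcurrentAppendAsync.
    // Assumes an already-authenticated DataLakeStoreFileSystemManagementClient.
    using System.IO;
    using System.Text;
    using System.Threading.Tasks;
    using Microsoft.Azure.Management.DataLake.Store;
    using Microsoft.Azure.Management.DataLake.Store.Models;

    class ConcurrentAppendTest
    {
        static async Task RunAsync(DataLakeStoreFileSystemManagementClient client)
        {
            const string account = "myadlsaccount";     // placeholder account name
            const string path = "/test/concurrent.txt"; // placeholder file path
            var tasks = new Task[50];
            for (int i = 0; i < tasks.Length; i++)
            {
                int n = i; // capture the loop variable for the closure
                tasks[n] = Task.Run(async () =>
                {
                    var bytes = Encoding.UTF8.GetBytes($"record {n}\n");
                    using (var stream = new MemoryStream(bytes))
                    {
                        // Autocreate: the service creates the file on the first append.
                        await client.FileSystem.ConcurrentAppendAsync(
                            account, path, stream, AppendModeType.Autocreate);
                    }
                });
            }
            await Task.WhenAll(tasks); // completes without errors, yet the file never changes
        }
    }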
What you have found is correct about how parallel writes work. I am assuming you have already read the documentation for ConcurrentAppendAsync.
So, in your case, did you use the same file for both the WebHDFS write test and ConcurrentAppendAsync? If so, ConcurrentAppendAsync will not work, as mentioned in the documentation, but you should have received an error in that case.
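One way to rule that out is to point ConcurrentAppendAsync at a brand-new path that the regular WebHDFS create/append calls have never touched, something like this (a sketch reusing the client and usings from your snippet; the account name and path are placeholders, and the exact signature may vary by SDK version):

    // Append only via ConcurrentAppendAsync to a file that normal
    // create/append (WebHDFS) has never written. Autocreate makes the
    // service create the file on the first concurrent append.
    using (var stream = new MemoryStream(Encoding.UTF8.GetBytes("hello\n")))
    {
        await client.FileSystem.ConcurrentAppendAsync(
            "myadlsaccount",              // placeholder account name
            "/test/fresh-concurrent.txt", // fresh path, never written via WebHDFS
            stream,
            AppendModeType.Autocreate);
    }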
In any case, let us know what happened and we can investigate further.
Thanks,
Sachin Sheth
Program Manager - Azure Data Lake


- I don't know what I'm doing wrong, but ConcurrentAppendAsync doesn't append anything to my file, and it doesn't fail either. – evilpilaf Mar 15 '16 at 16:06
- Can you please get in touch with me via email - sachinsatmicrosoftdotcom? I would like to see what is happening in detail. Thanks. – Sachin Sheth Mar 21 '16 at 07:54