ArrayPool create method giving error in C#

Question

Basically, i want to read data from source file to target file in azure data lake parallelly using ConcurrentAppend API.

Also, i dont want to read the data from files all at once but in chunks , i am using buffers for that. i want to create 5 buffers of 1 MB , 5 buffers of 2 MB, and 5 buffers of 4 Mb. whenever a source file arrives , it will use the appropriate buffer according to its size and i will append to target using that buffer. I dont want buffers to exceed 5 in each case/configuration.

I was using a shared ArrayPool for renting buffers. But since i have this condition that allocation should not exceed beyond 5 arrays in each case ( 1, 2 and 4 MB) -> i had to use some conditions to limit that.

I would rather like to use a custom pool which i can create like :

ArrayPool<byte> pool = ArrayPool<byte>.Create( One_mb , 5)

this will take care that my allocations dont go beyond 5 arrays and max size of array will be 1 MB. Similarly i can create two more buffer pool for 2 and 4 mb case. This way i wont need to put those conditions to limit it to 5 .

Problem :

when i use this custom pool , i get corrupted data in my target file. Moreover , target file size gets doubled, like if sum of input is 10 mb -> target file shows 20 mb .

If i use the same code and rent from single shared ArrayPool rather than these custom pools, i get correct result.

What am i doing wrong ?

My code : https://github.com/ChahatKumar/ADLS/blob/master/CreatePool/Program.cs

Not sure why this was down voted, the question is seemingly ok — TheGeneral, Jun 26 '20 at 22:12
I assume because the code is in github and not in the question. It likely has nothing to do with the custom pool but rather the way they read into a byte array and assume it to be full, which happens to be in code hosted off-site — pinkfloydx33, Jun 26 '20 at 22:15
yep , i thought putting the code will make the questions very large.. — kchahat20, Jun 27 '20 at 11:33

pinkfloydx33 · Accepted Answer · 2020-06-26T22:10:24.883

FileStream.Read returns the number of bytes read. This will not necessarily be the size of your array and could very well be smaller (or zero if no byes were read). The code in your github example is ignoring the value of Read and making the incorrect assumption that the buffer was filled by telling the next method to use the entire buffer. Because your arrays are so large, it is possible (and perhaps likely) that you will not read them entirely with a single call to Read (even if the files are actually that large, FileStream has its own internal buffer and buffer size).

Your method should likely look like the following. Note I pass the actual number of bytes read to ConcurrentAppend (which I assume to be well conforming in that it respects the length argument):

int read;
while ((read = file.Read(buffer1, 0, buffer1.Length) > 0)
{
   c.ConcurrentAppend(filename, true, buffer1, 0, read);
}

ArrayPool create method giving error in C#

1 Answers1