I have the following scenario.
I am implementing split functionality that reads a huge CSV file line by line.
Each line has a categoryId.
Based on that ID, I need to write the line into a separate file.
To do this I am doing the following:
- Reading the huge file line by line.
- After reading each line, I open a new stream based on the categoryId (only if a stream for it is not already open). I write the line into the stream and then keep the stream open, because there might be more lines for that category in the huge file.
- At the end, after all lines from the huge file are processed, I close all open streams. This forces a flush and closes the connections.
My question is: do I need to manually invoke Flush(), say after every 100 lines written, or is this handled by StreamWriter itself? I read on the web that there is a buffer that flushes automatically when it is full, but I am not sure if this is true. My concern is that if it doesn't flush and instead waits for the end of the big file, I might end up with the whole file loaded in memory.
Here is part of the code to show what I am talking about:
    try
    {
        while (!reader.EndOfStream)
        {
            var line = await reader.ReadLineAsync();
            var locationId = line.Split(',')[0];
            var gdProjectId = GetGDProjectId(locationId);

            // Open a writer only the first time this project id is seen.
            if (!openWriters.TryGetValue(gdProjectId, out var writer))
            {
                // The blob name is only needed when a new blob is created.
                var now = DateTime.UtcNow;
                var blobName = $"{gdProjectId}/{now:dd-MM-yyyy}/{now:HH-mm-ss}-{Guid.NewGuid()}.csv";
                var blockBlobClient = containerClient.GetBlockBlobClient(blobName);
                var newWriteStream = await blockBlobClient.OpenWriteAsync(true);
                writer = new StreamWriter(newWriteStream, Encoding.UTF8);
                openWriters.Add(gdProjectId, writer);
            }

            await writer.WriteLineAsync(line);
            // SHOULD I MANUALLY INVOKE FLUSH ON EVERY {X} lines processed ?
            // TODO: Check if we need to manually flush or the StreamWriter does it for us when the buffer is full.
            // await writer.FlushAsync();
        }
    }
    finally
    {
        // We always close the writers, whether or not the operation succeeded.
        // Exceptions still propagate to the caller; the rethrow-only catch block was redundant.
        foreach (var oStream in openWriters)
        {
            oStream.Value.Close();
        }
    }
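One way to check the buffering question in isolation is a small probe like the one below. This is only a sketch and not part of my real code: `FlushProbe` and `BytesInSinkBeforeFlush` are made-up names, and it assumes a `MemoryStream` is a fair stand-in for the blob write stream as far as StreamWriter's own buffer is concerned. It says nothing about any buffering the blob stream does internally. It writes lines without ever calling Flush() and reports how many bytes have already reached the underlying stream.

```csharp
using System;
using System.IO;
using System.Text;

public static class FlushProbe
{
    // Writes lineCount lines of 100 characters each and returns how many bytes
    // reached the underlying stream WITHOUT any explicit Flush() call.
    public static long BytesInSinkBeforeFlush(int lineCount)
    {
        using (var sink = new MemoryStream())
        // UTF8Encoding(false) suppresses the BOM; 1024 is the writer's buffer size in chars.
        using (var writer = new StreamWriter(sink, new UTF8Encoding(false), 1024))
        {
            var line = new string('x', 100);
            for (var i = 0; i < lineCount; i++)
            {
                writer.WriteLine(line);
            }

            // Read the length before the writer is disposed, since disposing flushes.
            return sink.Length;
        }
    }

    public static void Main()
    {
        Console.WriteLine($"1 line:   {BytesInSinkBeforeFlush(1)} bytes in sink before flush");
        Console.WriteLine($"50 lines: {BytesInSinkBeforeFlush(50)} bytes in sink before flush");
    }
}
```

If the value for 50 lines is greater than zero, the writer has pushed full buffers to the underlying stream on its own, and only the tail that is still in the buffer waits for Flush() or Close().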