1

I want to read lines from a CSV file, use Rx.NET to do some transformations, and send batched updates every 250 milliseconds.

public static IEnumerable<string> ReadCSV(string filePath)
{
    // using ensures the reader (and underlying file handle) is disposed
    using (var reader = new StreamReader(File.OpenRead(filePath)))
    {
        while (!reader.EndOfStream)
        {
            yield return reader.ReadLine();
        }
    }
}

var rows = ReadCSV("filePath").ToObservable();

rows
    .Buffer(50)
    .Zip(Observable.Interval(
        TimeSpan.FromMilliseconds(250)), (res, _) => res)
    .Subscribe(lines =>
        {
            //do something
        });

I am using a CSV file of around 80 MB, but the console project's memory usage climbs to 1 GB.

What is happening here is that Zip waits for both sequences to signal. The CSV sequence produces data very fast, so Zip keeps accumulating the buffered batches in memory while it waits for the interval sequence.

What makes it even worse is that the memory is not released even after all the updates have been processed. If I remove the Zip, memory looks fine: it appears to be released as each batch is processed, and the whole app stays at around 20 MB the entire time.

Two questions:

  1. Is there a way to tell the observable to pause reading until the previous item (in my case, the buffered lines) has been processed?

  2. Why is the memory not released after all the updates have been processed, and is there a way to avoid this?

Will
  • 155
  • 7

2 Answers

0

I managed to find a solution for question 1.

rows
    .Buffer(50)
    .Select(lines =>
    {
        Thread.Sleep(250);
        return lines;
    })
    .Subscribe(lines =>
        {
            //do something
        });

The whole pipeline is synchronous, so while Thread.Sleep blocks, the observable also stops reading data.

It may not be a good answer, though.
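An alternative sketch (not from the answer above, names are illustrative): instead of sleeping inside Select, pace the enumeration itself with Observable.Generate, whose timeSelector overload schedules each element a given delay apart without blocking inside an operator.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;

class PacedRead
{
    static void Main()
    {
        // ReadBatches() stands in for ReadCSV("filePath").Buffer(50)
        IEnumerator<IList<string>> batches = ReadBatches().GetEnumerator();

        // Pull one batch from the enumerator every 250 ms; nothing is
        // pre-fetched, so unprocessed batches never pile up in memory.
        IObservable<IList<string>> paced = Observable.Generate(
            initialState: batches.MoveNext(),
            condition: hasNext => hasNext,
            iterate: _ => batches.MoveNext(),
            resultSelector: _ => batches.Current,
            timeSelector: _ => TimeSpan.FromMilliseconds(250));

        paced.Subscribe(lines =>
        {
            //do something
        });

        Console.ReadLine(); // keep the console app alive while the timer runs
    }

    static IEnumerable<IList<string>> ReadBatches()
    {
        // hypothetical stand-in for the buffered CSV source
        yield return new List<string> { "row1", "row2" };
    }
}
```

Because Generate advances the enumerator only when the timer fires, the file is read at the consumption rate rather than as fast as the disk allows.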

Will
  • 155
  • 7
0

I was not able to reproduce your memory issue. I used a 50 MB file. However, I suspect part of your trouble is that .ToObservable() pulls data from the IEnumerable as fast as possible.

So why not just slow down the IEnumerable (the rate at which you pull data from disk) with an extension method?

(The .Buffer() operator for IEnumerable used in the example is available in Ix.NET.)

Like so:

ReadCSV("filePath")
    .Buffer(50)
    .SlowDown(250)
    .ToObservable()
    ...

public static IEnumerable<IList<string>> SlowDown(this IEnumerable<IList<string>> source, int milliSeconds)
{
    foreach(var item in source)
    {
        yield return item;
        Thread.Sleep(milliSeconds);
    }
}

(In C# 8, it will be possible to make this method async and use Task.Delay instead of Thread.Sleep so you don't block the thread).

This way your data is read from disk at a slower rate. Whether it will fix your memory issue, I don't know.
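A sketch of the C# 8 variant mentioned above, assuming async streams (IAsyncEnumerable<T>) are available: the same SlowDown shape, but with Task.Delay so the pause does not block a thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class AsyncPacing
{
    // Async counterpart of SlowDown: yields each batch, then waits
    // asynchronously instead of blocking the thread with Thread.Sleep.
    public static async IAsyncEnumerable<IList<string>> SlowDown(
        this IAsyncEnumerable<IList<string>> source, int milliSeconds)
    {
        await foreach (var item in source)
        {
            yield return item;
            await Task.Delay(milliSeconds);
        }
    }
}
```

If you want to feed the result back into Rx, the System.Linq.Async package provides a ToObservable() for IAsyncEnumerable<T>.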

Magnus
  • 353
  • 3
  • 8