
I am downloading some JSON periodically, say every 10 seconds. When the data arrives, an event is fired, and the event handler simply adds the JSON to a BlockingCollection<string> to be processed.

I'm trying to process the JSON as fast as possible (as soon as it arrives):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class Engine
{
    private BlockingCollection<string> Queue = new BlockingCollection<string>();
    private DataDownloader dataDownloader;

    public void Init(string url, int interval)
    {
        dataDownloader = new DataDownloader(url, interval);
        // Subscribe before starting, so the first download cannot be missed
        dataDownloader.DataReceivedEvent += DataArrived;
        dataDownloader.StartCollecting();

        // Kick off a new task to process the incoming JSON
        Task.Factory.StartNew(Process, TaskCreationOptions.LongRunning);
    }

    /// <summary>
    /// Processes the JSON in parallel
    /// </summary>
    private void Process()
    {
       Parallel.ForEach(Queue.GetConsumingEnumerable(), ProcessJson);
    }

    /// <summary>
    /// Deserializes JSON and adds result to database
    /// </summary>
    /// <param name="json"></param>
    private void ProcessJson(string json)
    {
        using (var db = new MyDataContext())
        {
            var items = Extensions.DeserializeData(json);
            foreach (var item in items)
            {
                db.Items.Add(item);
                db.SaveChanges();
            }
        }
    }

    private void DataArrived(object sender, string json)
    {
        Queue.Add(json);
        Console.WriteLine("Queue length: " + Queue.Count);
    }
}

When I run the program it works and data gets added to the database, but when I watch the output of Console.WriteLine("Queue length: " + Queue.Count);, I get something like this:

1
1
1
1
1
1
1
1
2
3
4
5
6
7
...

I've tried modifying my Process to look like this:

/// <summary>
/// Processes the JSON one item at a time
/// </summary>
private void Process()
{
    foreach (var json in Queue.GetConsumingEnumerable())
    {
        ProcessJson(json);
    }
}

I then started several consumers by calling Task.Factory.StartNew(Process, TaskCreationOptions.LongRunning); multiple times, but I get the same problem.
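For reference, "several consumers over one BlockingCollection" can be sketched as below. This is a minimal, self-contained sketch, not the original code: MultiConsumerSketch and Run are hypothetical names, a counter stands in for ProcessJson, and a plain loop stands in for DataDownloader. The producer calls CompleteAdding() at the end so the consumer loops have a defined point at which they can finish.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class MultiConsumerSketch
{
    // Starts `consumerCount` long-running consumers over one shared queue and
    // returns how many items they processed in total.
    public static int Run(int itemCount, int consumerCount)
    {
        int processed = 0;
        using (var queue = new BlockingCollection<string>())
        {
            // Each consumer takes items from the shared queue via GetConsumingEnumerable();
            // the loops finish once CompleteAdding() has been called and the queue is drained.
            Task[] consumers = Enumerable.Range(0, consumerCount)
                .Select(_ => Task.Factory.StartNew(() =>
                {
                    foreach (var json in queue.GetConsumingEnumerable())
                    {
                        // Stand-in for ProcessJson(json)
                        Interlocked.Increment(ref processed);
                    }
                }, TaskCreationOptions.LongRunning))
                .ToArray();

            // Producer: stand-in for DataArrived() adding downloaded JSON
            for (int i = 0; i < itemCount; i++)
            {
                queue.Add("{\"id\":" + i + "}");
            }
            queue.CompleteAdding();

            // WaitAll rethrows (wrapped in an AggregateException) anything a consumer threw
            Task.WaitAll(consumers);
        }
        return processed;
    }
}
```

Waiting on the consumer tasks also surfaces exceptions thrown inside ProcessJson; with a fire-and-forget Task.Factory.StartNew, a consumer that throws simply stops without any visible error.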

Does anyone have any idea what is going wrong here?

pookie
  • Doesn't the length of processing depend on your input JSON data? If the first few are simple, they will get executed quickly, while if your 8th is huge it will stack up. Also, do you have to save changes to the db after every single item is added? (I have no experience with DataContext, but I guess that has to be expensive.) – CrudaLilium Feb 05 '17 at 12:16
  • It is already as fast as possible; that's an operating system duty. If your code produces faster than it consumes, and that's quite likely because processing JSON is never cheap, then there is no option but to grow the collection. Which is why the "blocking" in BlockingCollection is important: it ensures that your program will not crash with OOM. – Hans Passant Feb 05 '17 at 14:10
  • @CrudaLilium Nah, the JSON is generally the same size, so that's not it... but good thinking. – pookie Feb 05 '17 at 14:13
  • @HansPassant The data is being downloaded every 10 seconds using WebClient. Surely adding more consumers would reduce the number of items in the queue? I would expect there to be 0 items in the queue most of the time (I've tried with up to 10 consumer tasks). – pookie Feb 05 '17 at 14:16
  • Google "Amdahl's law" and use a concurrency analyzer. It gives you facts instead of guesses. – Hans Passant Feb 05 '17 at 14:21
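A note on the "blocking" point: BlockingCollection<T>.Add() only blocks when the collection was created with a bounded capacity. The Engine class above uses the parameterless constructor, so its queue is unbounded and simply keeps growing. A minimal sketch of the bounded case follows; BoundedQueueSketch and TryFill are hypothetical names, and nothing consumes the queue, so it fills up immediately.

```csharp
using System;
using System.Collections.Concurrent;

public static class BoundedQueueSketch
{
    // Returns which of `attempts` non-blocking adds succeeded on a queue
    // that can hold at most `capacity` items (nothing is consuming it).
    public static bool[] TryFill(int capacity, int attempts)
    {
        // With a bounded capacity, Add() would block (and TryAdd() returns false)
        // once `capacity` items are waiting; an unbounded queue never refuses an item.
        using (var queue = new BlockingCollection<string>(boundedCapacity: capacity))
        {
            var results = new bool[attempts];
            for (int i = 0; i < attempts; i++)
            {
                results[i] = queue.TryAdd("json " + i);
            }
            return results;
        }
    }
}
```

For example, TryFill(2, 3) reports that the first two adds succeed and the third is refused, which is the back-pressure that protects against running out of memory.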

1 Answer


The queue initially fills up before processing starts, probably because Entity Framework has to be loaded and a database connection has to be established, which takes a while.

Then GetConsumingEnumerable() starts to catch up with the download process and depletes the queue between downloads. Once the collection is empty, MoveNext() returns false, Parallel.ForEach() exits and Process() finishes.

From then on you'll see the queue fill up, because nothing is consuming it anymore.

You need to keep trying to read from the BlockingCollection until the download process finishes.
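A minimal sketch of "keep reading until the download process finishes", assuming the downloader can signal that it has stopped by calling CompleteAdding() on the queue (the original DataDownloader may not expose such a hook). The consumer loops on TryTake until IsCompleted is true, so an empty queue between downloads only makes it wait. DrainUntilCompleteSketch, Consume and Run are hypothetical names, and a Thread.Sleep loop stands in for the periodic download.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class DrainUntilCompleteSketch
{
    // Consumer: keeps taking items until the producer has called CompleteAdding()
    // and the queue is empty. TryTake with Timeout.Infinite waits for the next item
    // and returns false only once the collection is marked complete and empty.
    public static List<string> Consume(BlockingCollection<string> queue)
    {
        var processed = new List<string>();
        while (!queue.IsCompleted)
        {
            string json;
            if (queue.TryTake(out json, Timeout.Infinite))
            {
                processed.Add(json); // stand-in for ProcessJson(json)
            }
        }
        return processed;
    }

    public static List<string> Run(int downloads)
    {
        using (var queue = new BlockingCollection<string>())
        {
            Task<List<string>> consumer = Task.Factory.StartNew(
                () => Consume(queue), TaskCreationOptions.LongRunning);

            // Producer: stand-in for the downloader raising DataReceivedEvent
            for (int i = 0; i < downloads; i++)
            {
                queue.Add("download " + i);
                Thread.Sleep(10); // gap between downloads, during which the queue may be empty
            }

            // Only when downloading has finished does the producer signal completion
            queue.CompleteAdding();
            return consumer.Result;
        }
    }
}
```

The important design choice is that only the producer decides when consumption is over; the consumer never treats a temporarily empty queue as the end of the data.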

CodeCaster