
I have a high-speed stream of stock prices coming from a vendor... roughly 5000 per second, across about 8000 different symbols.

I have a table (SymbolPrice) in my database that needs to be updated with the most recent last price.

I don't seem to be able to keep the database updates fast enough to process the queue of last prices.

I am on an Azure SQL Database, so I was able to upgrade to a premium tier that supports In-Memory OLTP and made my SymbolPrice table a memory-optimized table... but that still isn't fast enough.

Skipping a price is not a problem, as long as the most recent price gets in there as quickly as possible... so if I get blasted with 10 updates in a row, only the last one needs to be written. This sounds easy, except the 10 in a row might be intermixed with updates for other symbols.

So, my current solution uses a ConcurrentDictionary to hold only the most recent price per symbol, plus a queue of symbols for pushing updates to the database (see code below)... but this still isn't fast enough.

One way to solve this would be to repeatedly sweep the whole dictionary and write the most recent price of every symbol to the database... but that is a bit wasteful, since symbols that only change every few minutes would be rewritten at the same rate as symbols that change many times a second.

Any thoughts on how this can be done better?

Thanks!

  • Brian

    public ConcurrentDictionary<string, QuoddLastPriceCache.PriceData> _lastPrices = new ConcurrentDictionary<string, QuoddLastPriceCache.PriceData>();
    public ConcurrentQueue<string> _lastPriceSymbolsToUpdate = new ConcurrentQueue<string>();
    public bool _lastPriceUpdatesRunning;
    public DateTime _lastScopeCreate = DateTime.MinValue;
    public IServiceScope _lastPriceScope = null;

    public void Start()
    {
        Task.Run(() => { UpdateLastPricesTask(services); });

        lastPriceCache.PriceReceived += (symbol, priceData) =>
        {
            // keep only the newest price per symbol; enqueue the symbol for the writer
            _lastPrices.AddOrUpdate(symbol, priceData, (key, value) => priceData);
            _lastPriceSymbolsToUpdate.Enqueue(symbol);
        };
    }

    private void UpdateLastPricesTask(IServiceProvider services)
    {
        _lastPriceUpdatesRunning = true;

        while (_lastPriceUpdatesRunning)
        {
            if (_lastPriceSymbolsToUpdate.TryDequeue(out string symbol))
            {
                if (_lastPrices.TryRemove(symbol, out QuoddLastPriceCache.PriceData priceData))
                {
                    // recreate the DI scope every 5 minutes
                    if (_lastPriceScope == null || _lastScopeCreate + TimeSpan.FromMinutes(5) < DateTime.UtcNow)
                    {
                        _lastPriceScope?.Dispose();
                        _lastPriceScope = services.CreateScope();
                        _lastScopeCreate = DateTime.UtcNow;
                    }

                    // write to database (one round trip per symbol)
                    var unitOfWork = _lastPriceScope.ServiceProvider.GetRequiredService<IUnitOfWork>();
                    unitOfWork.SymbolPrice.UpdateLastPrice(symbol, priceData.Price, priceData.Timestamp);
                }
            }
            else
                Thread.Sleep(1);
        }
    }
    
Brian Rice
    How fast is it, and how fast does it need to be? Exactly how you interact with the database is likely to be critical here, and you don't show any of that code. For instance you could batch up changes and send one batch of changes per second or so using a Table-Valued Parameter or JSON. – David Browne - Microsoft Apr 23 '21 at 18:45
  • Is it a continuous stream of 5000 items per second all day? How is the data arriving: a set of HTTP posts, a UDP blast, or something in between? Writing something that deals with a continuous stream of data requires a holistic approach – Flydog57 Apr 23 '21 at 18:51
  • It's all day from 8:30am CST to 3:00pm CST... on a TCP socket connection (using the SoupBinTCP protocol https://www.nasdaqtrader.com/content/technicalsupport/specifications/dataproducts/soupbintcp.pdf) – Brian Rice Apr 23 '21 at 18:55
  • Writing batch changes to the database might be a good idea... right now I seem to be getting about 5000 per second and able to write about a few hundred per second... not 100% sure on these numbers... but maybe working on the database aspect will be the most fruitful. – Brian Rice Apr 23 '21 at 19:02
  • How is the data in your SQL database being used/consumed? Do you really need a SQL data store? Can you serve the most recent price from an in-memory cache? – Eric J. Apr 23 '21 at 19:26
  • One thing to note: if you are doing 5000 operations per second, you need to think about your memory management. If you are generating a lot of garbage per operation, your app will end up spending a lot of time in the GC, reducing your app's responsiveness and its real-time-ed-ness. Look into doing some of your memory management manually (buffer pools, etc.) to reduce the garbage you create per operation – Flydog57 Apr 23 '21 at 20:40
  • I have, in fact, paid a lot of attention to minimizing the allocations... mostly by using unsafe code and structures. – Brian Rice Apr 24 '21 at 06:52
  • I need to join tables with the prices... so I can implement sorting and paging by position value... I actually wrote a CLR that creates a "table" from an in-memory cache that I can do a join with... but Azure won't let me publish CLRs that do web requests. – Brian Rice Apr 24 '21 at 06:54
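Picking up David Browne's batching suggestion above, here is one hedged sketch: once per second, drain the dictionary into a `DataTable` and send all changed rows in a single Table-Valued Parameter call, so the database sees one round trip per batch rather than one per symbol. The type and procedure names (`dbo.SymbolPriceType`, `dbo.UpdateLastPrices`), the column shapes, and the `PriceData` class here are all assumptions, not taken from the original code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Data;
using Microsoft.Data.SqlClient;

// Sketch only. Assumes server-side objects along these lines:
//
//   CREATE TYPE dbo.SymbolPriceType AS TABLE
//       (Symbol varchar(16) PRIMARY KEY, Price decimal(18,6), Ts datetime2);
//   CREATE PROCEDURE dbo.UpdateLastPrices @prices dbo.SymbolPriceType READONLY AS
//       UPDATE sp SET sp.LastPrice = p.Price, sp.LastTimestamp = p.Ts
//       FROM dbo.SymbolPrice sp JOIN @prices p ON p.Symbol = sp.Symbol;
public static class PriceBatcher
{
    public class PriceData { public decimal Price; public DateTime Timestamp; }

    // Drain every pending entry into a DataTable matching dbo.SymbolPriceType.
    // TryRemove means a price arriving after the drain is kept for the next batch.
    public static DataTable BuildBatch(ConcurrentDictionary<string, PriceData> pending)
    {
        var table = new DataTable();
        table.Columns.Add("Symbol", typeof(string));
        table.Columns.Add("Price", typeof(decimal));
        table.Columns.Add("Ts", typeof(DateTime));

        foreach (var symbol in pending.Keys)
            if (pending.TryRemove(symbol, out var data))
                table.Rows.Add(symbol, data.Price, data.Timestamp);

        return table;
    }

    // One round trip for the whole batch instead of one UPDATE per symbol.
    public static void Flush(string connectionString, DataTable batch)
    {
        if (batch.Rows.Count == 0) return;
        using var conn = new SqlConnection(connectionString);
        conn.Open();
        using var cmd = new SqlCommand("dbo.UpdateLastPrices", conn)
            { CommandType = CommandType.StoredProcedure };
        var p = cmd.Parameters.AddWithValue("@prices", batch);
        p.SqlDbType = SqlDbType.Structured;
        p.TypeName = "dbo.SymbolPriceType";
        cmd.ExecuteNonQuery();
    }
}
```

A writer loop would then just call `BuildBatch` + `Flush` on a one-second timer; at 5000 updates/sec across 8000 symbols, each batch is at most ~5000 rows, which a single TVP call handles comfortably.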

2 Answers


You need to use something that enables you to query the stream; SQL is not the best tool for this. Search for Complex Event Processing, and look at Kafka / Event Hubs + Stream Analytics.

Thiago Custodio
  • Problem is, I need to JOIN this information with other tables in a SQL query in order to do sorting and paging on the value of positions (quantity * lastPrice) – Brian Rice Apr 24 '21 at 06:57

The best I could do was the following approach: keep the last values in a dictionary, with a flag on each entry recording whether it has been written to the database, then repeatedly sweep the dictionary and write only the entries whose flag is clear. That way I only update the most recently changed values. It works pretty well... though it seems like there should be a better way.

    public void Start()
    {
        Task.Run(() => { UpdateLastPricesTask(services); });

        lastPriceCache.PriceReceived += (symbol, priceData) =>
        {
            // newest price wins; its WrittenToDatabase flag starts out false
            _lastPrices.AddOrUpdate(symbol, priceData, (key, value) => priceData);
        };
    }

    public ConcurrentDictionary<string, QuoddLastPriceCache.PriceData> _lastPrices = new ConcurrentDictionary<string, QuoddLastPriceCache.PriceData>();
    public bool _lastPriceUpdatesRunning;
    public DateTime _lastScopeCreate = DateTime.MinValue;
    public IServiceScope _lastPriceScope = null;

    private void UpdateLastPricesTask(IServiceProvider services)
    {
        _lastPriceUpdatesRunning = true;

        while (_lastPriceUpdatesRunning)
        {
            var processed = 0;

            foreach (var symbol in _lastPrices.Keys)
            {
                if (_lastPrices.TryGetValue(symbol, out QuoddLastPriceCache.PriceData priceData))
                {
                    if (priceData.WrittenToDatabase == false)
                    {
                        // create a new scope every 5 minutes
                        if (_lastPriceScope == null || _lastScopeCreate + TimeSpan.FromMinutes(5) < DateTime.UtcNow)
                        {
                            _lastPriceScope?.Dispose();
                            _lastPriceScope = services.CreateScope();
                            _lastScopeCreate = DateTime.UtcNow;
                        }

                        // write to database; the flag is set on this snapshot object, so if
                        // a newer PriceData replaces it meanwhile, the new one is still unflagged
                        var unitOfWork = _lastPriceScope.ServiceProvider.GetRequiredService<IUnitOfWork>();
                        unitOfWork.SymbolPrice.UpdateLastPrice(symbol, priceData.Price, priceData.Timestamp);
                        priceData.WrittenToDatabase = true;

                        processed++;
                    }
                }
            }

            // sweep again quickly while there is work; back off for a second when idle
            if (processed > 0)
                Thread.Sleep(1);
            else
                Thread.Sleep(1000);
        }
    }
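On the "there should be a better way" point, one possible alternative to the per-entry flag (an editor's sketch, not part of the original answer): atomically swap in a fresh, empty dictionary each sweep with `Interlocked.Exchange`. The writer then works through a private snapshot containing exactly the symbols that changed since the last sweep, so quiet symbols are never rescanned and no flag is needed. The `PriceData` shape here is assumed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Sketch: instead of flagging entries, detach the whole dictionary each sweep.
// The feed handler keeps filling a fresh dictionary; the writer drains the
// detached snapshot. A price arriving mid-swap lands in one or the other,
// never lost (though its write may slip to the following sweep).
public class SwappingPriceBuffer
{
    public class PriceData { public decimal Price; public DateTime Timestamp; }

    private ConcurrentDictionary<string, PriceData> _pending =
        new ConcurrentDictionary<string, PriceData>();

    // Called from the feed handler: newest price per symbol wins.
    public void OnPrice(string symbol, PriceData data) =>
        _pending.AddOrUpdate(symbol, data, (key, old) => data);

    // Called from the writer loop: take everything buffered since the last call.
    public ConcurrentDictionary<string, PriceData> TakeSnapshot() =>
        Interlocked.Exchange(ref _pending, new ConcurrentDictionary<string, PriceData>());
}
```

Each `TakeSnapshot` call yields only the symbols updated since the previous call, which also makes the snapshot a natural input for a batched write (e.g. one TVP call per snapshot) instead of per-symbol updates.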
Brian Rice