
I'm receiving financial data for approximately 5000 instruments every 5 seconds, and need to update the respective entries in the database. The model looks as follows:

from django.db import models
from django.utils import timezone

class Market(models.Model):
    market = models.CharField(max_length=200)
    exchange = models.ForeignKey(Exchange, on_delete=models.CASCADE)
    ask = models.FloatField()
    bid = models.FloatField()
    lastUpdate = models.DateTimeField(default=timezone.now)

What needs to happen is the following:

  • After new financial data is received, check if an entry exists in the database.
  • If the entry exists, update the ask, bid and lastUpdate fields
  • If the entry does not exist, create a new entry
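For a single instrument this is essentially `update_or_create`, roughly:

Market.objects.update_or_create(
    exchange=item.exchange,
    market=item.market,
    defaults={'ask': item.ask, 'bid': item.bid, 'lastUpdate': timezone.now()},
)

but that is still at least one query per instrument, which does not scale to ~5000 instruments every 5 seconds.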

My code looks as follows:

bi_markets = []
for item in dbMarkets:
    eItem = Market.objects.filter(exchange=item.exchange, market=item.market)
    if len(eItem) > 0:
        eItem.update(ask=item.ask, bid=item.bid)
    else:
        bi_markets.append(item)

# Bulk insert items that do not exist
Market.objects.bulk_create(bi_markets)

However, executing this takes way too long: approximately 30 seconds. I need to reduce that to under 1 second. I know this can be done, as I do the same with custom SQL code in .NET in under 100 ms. Any ideas on how to improve the performance in Django?

ceds
  • Try adding `db_index=True` for the `market` & `exchange` fields. – art May 01 '18 at 07:17
  • 2
    why not use `exists = Model.objects.filter(category='django').exists()` will help to reduce one check and try to fetch dbmatrkets using value.i mean like this `dbmarkets = Model.objects.value('value1','value2')`.This willl reduce your query time. – Abi Waqas May 01 '18 at 07:18
  • adding indexes did not increase performance – ceds May 01 '18 at 07:26
  • using exists() will add an additional layer of running time as 99.9% of the items will exist in the table – ceds May 01 '18 at 07:26
  • 1
    No it won't. It will be a (slight) optimisation over `len()` which calls `COUNT()`. – Daniel Roseman May 01 '18 at 08:11

2 Answers


If it’s this kind of performance you’re going for, I don’t see why you wouldn’t just break out into raw SQL. Bulk creating things that don’t exist yet sounds like the advanced SQL querying that Django isn’t really made for.

https://docs.djangoproject.com/en/2.0/topics/db/sql/
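For example, on PostgreSQL (or SQLite 3.24+) the whole "update what exists, insert what doesn't" step can be a single parameterized upsert per batch. A rough sketch only, assuming the default table name app_market and a unique constraint on (market, exchange), which the model in the question does not declare yet:

from django.db import connection
from django.utils import timezone

# One parameter row per incoming instrument (assumes exchange_id is set on each item)
params = [(m.market, m.exchange_id, m.ask, m.bid, timezone.now()) for m in dbMarkets]

sql = (
    'INSERT INTO app_market (market, exchange_id, ask, bid, "lastUpdate") '
    'VALUES (%s, %s, %s, %s, %s) '
    'ON CONFLICT (market, exchange_id) DO UPDATE '
    'SET ask = EXCLUDED.ask, bid = EXCLUDED.bid, "lastUpdate" = EXCLUDED."lastUpdate"'
)

with connection.cursor() as cursor:
    cursor.executemany(sql, params)  # %s placeholders: the driver handles quoting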

You can also do (sorry on mobile):

bi_markets = []
for item in dbMarkets:
  rows = Market.objects.filter(exchange=item.exchange, market=item.market).update(ask=item.ask, bid=item.bid)
  if rows == 0:
    bi_markets.append(item)

Market.objects.bulk_create(bi_markets)

Maybe that combination will generate some better SQL and it sidesteps the exists() call as well (update returns how many rows it changed).
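If a lot of the remaining time is autocommit overhead (one commit per UPDATE), wrapping the whole loop in a single transaction might also help. A sketch of that, untested:

from django.db import transaction

bi_markets = []
with transaction.atomic():  # one commit for the whole batch instead of one per row
    for item in dbMarkets:
        rows = Market.objects.filter(exchange=item.exchange, market=item.market) \
                             .update(ask=item.ask, bid=item.bid)
        if rows == 0:
            bi_markets.append(item)
    Market.objects.bulk_create(bi_markets)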

Alper
  • Thanks. The problem is not the bulk inserting, it is checking whether the items exist and updating them. If I delete all items and bulk insert, it is fast enough, but then the PK obviously changes every time. I would need 'bulk update' functionality where a different update can be given for each item. – ceds May 01 '18 at 07:27
  • I expanded it and maybe this'll perform? At least it is much tidier. – Alper May 01 '18 at 09:43

I've decided to split the update and create functionality. The create only happens when the app starts; from there on I do updates using a custom SQL script. See below. Working great.

updateQ = []
updateQ.append("BEGIN TRANSACTION;")

for dbItem in dbMarkets:
    eItem = tickers[dbItem.market]
    qStr = "UPDATE app_market SET ask = " + str(eItem['ask']) + ",bid = " + str(eItem['bid']) + " WHERE exchange_id = " + str(e.dbExchange.pk) + " AND market = " + '"' + dbItem.market + '";'
    updateQ.append(qStr)

updateQ.append("COMMIT;")

updateQFinal = ''.join(map(str, updateQ))

with connection.cursor() as cursor:
    cursor.executescript(updateQFinal)
ceds
  • 9
    Is this for real? At least read through the Django documentation on raw SQL and how they use %s to protect against injection attacks. – Alper May 01 '18 at 17:51
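For reference, a parameterized version of the same update loop, along the lines that comment suggests. Just a sketch: it reuses the names from the answer above and lets the driver handle the quoting:

from django.db import connection, transaction

sql = ('UPDATE app_market SET ask = %s, bid = %s '
       'WHERE exchange_id = %s AND market = %s')

params = [
    (tickers[dbItem.market]['ask'], tickers[dbItem.market]['bid'],
     e.dbExchange.pk, dbItem.market)
    for dbItem in dbMarkets
]

with transaction.atomic(), connection.cursor() as cursor:  # replaces the manual BEGIN/COMMIT
    cursor.executemany(sql, params)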