I'm writing a web application that shows player statistics for an online game, using Django 1.6 and PostgreSQL 9.1. I've created a script using django-extensions' "runscript" which fetches all players that are online and inserts/updates them into my table. This script is executed 4 times per hour using cron. I need to either insert or update, since a player may already be in the table (and should then be updated) or may not be in it yet.
On to my problem: there are around 25,000 players online at peak hours and I'm not really sure how I should optimize this (i.e. minimize disk I/O). This is what I have so far:
from datetime import date

from django.db import transaction

# Player and FetchPlayers are imported from my app

@transaction.commit_manually
def run():
    for fetched_player in FetchPlayers():
        defaults = {
            'level': fetched_player['level'],
            'server': fetched_player['server'],
            'last_seen': date.today(),
        }
        # defaults has to be passed as a keyword argument
        player, created = Player.objects.get_or_create(
            name=fetched_player['name'], defaults=defaults)
        if not created:
            player.level = fetched_player['level']
            if player.server != fetched_player['server']:
                # I save this info to another table
                player.server = fetched_player['server']
            player.last_seen = date.today()
            player.save()
    transaction.commit()
Would it be (considerably) faster to bypass Django and access the database using psycopg2 or similar? Would Django be confused when 'someone else' is modifying the database? Note that Django only reads the database; all writes are done by this script.
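To illustrate what I mean by bypassing Django, here is an untested sketch of the writes in raw psycopg2. The table and column names are taken from my model above, but the DSN is made up. Since PostgreSQL 9.1 has no native upsert (ON CONFLICT only arrived in 9.5), I'd update first and insert only the rows that were missed, which should be safe here given that this script is the only writer:

import psycopg2
from datetime import date

def upsert_players(fetched_players, dsn='dbname=stats user=stats'):
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    today = date.today()
    rows = [(p['name'], p['level'], p['server'], today)
            for p in fetched_players]
    # First pass: update every player that already exists
    cur.executemany(
        """UPDATE player
           SET level = %s, server = %s, last_seen = %s
           WHERE name = %s""",
        [(level, server, seen, name) for name, level, server, seen in rows])
    # Second pass: insert only the players the UPDATE didn't touch
    cur.executemany(
        """INSERT INTO player (name, level, server, last_seen)
           SELECT %s, %s, %s, %s
           WHERE NOT EXISTS (SELECT 1 FROM player WHERE name = %s)""",
        [(name, level, server, seen, name) for name, level, server, seen in rows])
    conn.commit()
    cur.close()
    conn.close()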
What about (using either Django or psycopg2) bulk fetching the players from the database, updating those that were found, and then inserting those that were not found, if that is even possible? The query would get huge: 'SELECT * FROM player WHERE name = name[0] OR name = name[1] OR ... OR name = name[25000]'. :)
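To make that concrete, here is an untested sketch of the bulk approach with the ORM, using a name__in lookup (which I believe collapses to a single WHERE name IN (...) query rather than that OR chain) and bulk_create() for the missing players. Both are available in Django 1.6; as far as I know there is no bulk update, so the existing players still cost one UPDATE each, though all inside one transaction:

from datetime import date

from django.db import transaction

def run():
    fetched = {p['name']: p for p in FetchPlayers()}
    with transaction.atomic():
        # One SELECT for all players that already exist
        for player in Player.objects.filter(name__in=fetched.keys()):
            data = fetched.pop(player.name)
            player.level = data['level']
            player.server = data['server']
            player.last_seen = date.today()
            player.save()
        # Whatever is left in 'fetched' was not in the table: one bulk INSERT
        Player.objects.bulk_create([
            Player(name=name, level=data['level'], server=data['server'],
                   last_seen=date.today())
            for name, data in fetched.items()
        ])

Would something along these lines be noticeably cheaper than 25,000 get_or_create() calls?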