0

This should be easy, so maybe I've fried my brain, but I can't get this to work. I have a database with over a million rows. I want to add about 15k more rows, but some of the data might already exist in the database.

My basic process is this:

async def process_data(object_list):
    # Load pre-existing data from db:
    pks = await copy_pks()
    # Remove duplicates:
    filtered_data = [obj for obj in object_list if obj.id not in pks]
    # Add unique values:
    await copy_in(filtered_data)

My copy_pks method:

@with_connection:
async def copy_pks(self, connection):
    pks = set()
    async with connection.cursor().copy(sql) as cursor:
        async for row in cursor:
            res = await cursor.read_row()
            pks.add(int(res[0]))
    return pks

My copy_in method:

@with_connection
async def copy_in(self, connection, data):
    async with connection.cursor().copy(sql) as cursor:
        for obj in data:
            await cursor.write_row(data)

The problem is that I keep hitting duplicate key value violates unique constraint despite my attempt to filter. I've checked the length of object_list and the length of filtered_data and the latter is about half the size... so clearly something is getting filtered.

I then checked specifically for the offending object:

for obj in object_list:
    if obj.id == 14856714938:
        print("Found")
        if obj.id in pks:
            print("In pks")
        if obj.id in filtered_data:
            print("In filtered")

I receive "Found", but I don't receive "In pks" and I don't receive "In filtered."

Unless I'm blind to some syntax or logic error here, my best shot in the dark is that it's a RAM issue, given the size of the data?

  • 1
    And you don't think all those `async` and `await` calls are part of the problem. Look at the Postgres logs, I'm betting things are being done out of order. – Adrian Klaver Feb 01 '22 at 22:57
  • So I wrote a non-async version of copy_pks method and still the same problem. Unless the function tries to execute the `await copy_in()` line before it runs the prior two syncornous lines, I don't see how that could be the problem. – AfterVirtue Feb 01 '22 at 23:20
  • Either way the answer is probably in the logs. There is what you think you are doing and what is actually happening. The logs will help you resolve the two. – Adrian Klaver Feb 01 '22 at 23:37

0 Answers0