0

If I have two tables:

items
    Id VARCHAR(26)
    CreateAt bigint(20)
    Type VARCHAR(26)

expiry
    Id VARCHAR(26)
    Expiry bigint(20)

The items table contains when the item was created, and what type it is. Then another table, expiry, is a lookup table to say how long certain types should last for. A query is run every day to make sure that items that have expired are removed.

At the moment this query is written in our app, as programming code:

for item in items {
    expiry = expiry.get(item.Type)
    if (currentDate() - expiry.Expiry > item.CreateAt) {
        item.delete()
    }
}

This was fine when we only had a few thousand items, but now we have tens of millions it takes a significant amount of time to run. Is there a way to put this into just an SQL statement?

Salman A
  • 262,204
  • 82
  • 430
  • 521
GCHQ77703
  • 23
  • 5
  • @SalmanA Yes, they do. The `Id` field on expiry is the primary key, which is referenced within the `Type` field on `items. – GCHQ77703 Oct 09 '19 at 11:48
  • Wait, but in terms of the tables themselves, they aren't one-to-one. There will only be one expiry per item, but there may be thousands of items per type. – GCHQ77703 Oct 09 '19 at 11:49
  • 2
    MySQL and postgresql are two different RDBMS products with different implementations of SQL. I removed the conflicting product tags. Pls add the one back that you use. – Shadow Oct 09 '19 at 11:53
  • @Shadow: it's probably MySQL, because `bigint(20)` is invalid for Postgres –  Oct 09 '19 at 12:08

3 Answers3

1

Assuming all date values are actually UNIX timestamps, you could write a query such as:

SELECT * -- DELETE
FROM items
WHERE EXISTS (
    SELECT 1
    FROM expiry
    WHERE expiry.id = items.type
    AND items.CreateAt + expiry.Expiry < UNIX_TIMESTAMP()
)

Replace SELECT with DELETE once you're sure that the query selects the correct rows.

Salman A
  • 262,204
  • 82
  • 430
  • 521
0

If the dates stored are in seconds since the UNIX epoch, you could use this PostgreSQL query:

DELETE FROM items
USING expiry
WHERE items.type = expiry.id
  AND items.createat < EXTRACT(epoch FROM current_timestamp) - expiry.expiry;

A standard SQL solution that should work anywhere would be

DELETE FROM items
WHERE items.createat < EXTRACT(epoch FROM current_timestamp)
                       - (SELECT expiry.expiry FROM expiry
                          WHERE expiry.id = items.type);

That can be less efficient in PostgreSQL.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
0

Your code is getting slow because you do the join between the tables outside the database.

Second slowing aspect is that you delete the items 1 by 1. So using the compact delete statements which were provided is the correct solution.

It seems that you are using something like python-sqlalchemy. There the code would be something like:

items.delete().\
    where(items.c.type==\
        select([expiry.c.id]).\
             where(currentDate() - expiry.Expiry > item.c.CreateAt ))
Thomas Strub
  • 1,275
  • 7
  • 20