3

I am trying to make a subquery in a DELETE statement in order to improve performance. The plain DELETE statement works but the subquery ones are either deleting all rows indiscriminately, or only one row per invocation. I am confused as to why the statements are not equivalent.

Jump down to "What doesn't work" to see the problem statements.

Upfront info

I am using sqlite3 from python2 to manage a database of pictures with associated tags.

The schema for the table is:

CREATE VIRTUAL TABLE "files" USING fts3(fname TEXT, orig_name TEXT, tags TEXT, md5sum TEXT);

Tags are organized as a comma separated list so direct string comparison in sqlite doesn't (easily) work, so I've add a helper function TAGMATCH

def tag_match(tags, m):
    i = int(m in [i.strip() for i in tags.split(',')])
    return i
db.create_function('TAGMATCH', 2, tag_match)

What works

This does what I want/expect. It deletes all rows where the column tags contains the tag 'DELETE'. Downside is, as far as I understand, this requires a linear scan of the table. Because of the "dangers" of deleting something from the table I did want to use MATCH in case, in some hypothetical situation, a match occurs with another unintended tag ie. 'DO NOT DELETE THIS'.

DELETE FROM files WHERE TAGMATCH(tags, 'DELETE')


What doesn't work

To speed things up I tried a trick I read in another stackoverflow post where MATCH is used to narrow down the search, then a direct string comparison is done on those results ie

SELECT * FROM (SELECT * FROM table WHERE words MATCH keyword) WHERE words = keyword

I tried using this trick here, but instead it deletes every row in the table.

DELETE FROM files WHERE TAGMATCH((
        SELECT tags FROM files WHERE tags MATCH 'DELETE'), 'DELETE')

This is what I first came up with. I realize now it's not a particularly good solution, but since its effect puzzles me I'm including it. This statement only deletes one row containing the tag 'DELETE'. If invoked again, it deletes another row, and so on until all the rows with 'DELETE' are removed:

DELETE FROM files WHERE rowid = (
        SELECT rowid FROM (
            SELECT rowid, tags FROM files WHERE tags MATCH 'DELETE')
    WHERE TAGMATCH(tags, 'DELETE'))
Community
  • 1
  • 1
chew socks
  • 1,406
  • 2
  • 17
  • 37

1 Answers1

4

Following query deletes everything because the WHERE clause evaluates to a number which if by itself evaluates to TRUE:

DELETE FROM files WHERE TAGMATCH((
        SELECT tags FROM files WHERE tags MATCH 'DELETE'), 'DELETE')

Equivalent to

DELETE FROM files WHERE 1  -- or whatever ##

Instead, consider using EXISTS with subquery that correlates to main query:

DELETE FROM files WHERE EXISTS
    (SELECT 1 
     FROM (SELECT rowid, tags FROM files WHERE tags MATCH 'DELETE') sub 
     WHERE TAGMATCH(sub.tags, 'DELETE') AND sub.rowid = files.rowid)

Alternatively, using your attempt, turn = into an IN as former only uses the first record found.

DELETE FROM files WHERE rowid IN
    (SELECT rowid 
     FROM (SELECT rowid, tags FROM files WHERE tags MATCH 'DELETE') sub
     WHERE TAGMATCH(sub.tags, 'DELETE'))
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • Thank! That makes sense. What does the `sub` marker do on the third line of each block?. – chew socks Nov 22 '17 at 21:10
  • It is a table alias (can be anything) needed to differentiate between main query and sub query since they both use same underlying table, *files*, really needed for `EXISTS` query. – Parfait Nov 23 '17 at 01:22
  • Oh! I missed the part where you used `sub.tags` instead of just `tags`. I noticed it works without it in the `IN` query, would it be a performance thing there? – chew socks Nov 24 '17 at 22:57
  • Probably not. It's alway good practice to alias subqueries as some RDMBS require it. SQLite's engine may internally alias the virtual table if user does not give it. – Parfait Nov 25 '17 at 01:47