It seems like there is a strange performance hit when running a query that includes both NOT 'some string' = ANY(array_column)
as well as an ORDER BY
statement in the same query.
The following is a simplified table structure illustrating the behavior where tagger
is an array of UUIDs (v4):
CREATE TABLE IF NOT EXISTS "doc"."test" (
"id" STRING,
"last_active" TIMESTAMP,
"taggers" ARRAY(STRING)
)
The taggers
array can grow somewhat large with maybe hundreds and in some case thousands of individual strings.
The following queries are all very performant and resolve within .03 seconds:
SELECT id FROM test ORDER BY last_active DESC LIMIT 10;
SELECT id FROM test WHERE NOT ('da10187a-408d-4dfc-ae46-857fd23a574a' = ANY(taggers)) LIMIT 10;
SELECT id FROM test WHERE ('da10187a-408d-4dfc-ae46-857fd23a574a' = ANY(taggers)) ORDER BY last_active DESC LIMIT 10;
However including both parts in the query jumps to around 2 - 3 seconds:
SELECT id FROM test WHERE NOT ('da10187a-408d-4dfc-ae46-857fd23a574a' = ANY(taggers)) ORDER BY last_active LIMIT 10;
What's very strange is that of the previous list of queries that run fast the last one is almost the exact same as the slow one, just without the negation. Negation of the ANY
is also very fast. It's only when negation of ANY in a combination of a limit is added that things slow down. Any help would be greatly appreciated.