0

It seems like there is a strange performance hit when running a query that includes both NOT 'some string' = ANY(array_column) as well as an ORDER BY statement in the same query.

The following is a simplified table structure illustrating the behavior where tagger is an array of UUIDs (v4):

CREATE TABLE IF NOT EXISTS "doc"."test" (
   "id" STRING,
   "last_active" TIMESTAMP,
   "taggers" ARRAY(STRING)
)

The taggers array can grow somewhat large with maybe hundreds and in some case thousands of individual strings.

The following queries are all very performant and resolve within .03 seconds:

SELECT id FROM test ORDER BY last_active DESC LIMIT 10;


SELECT id FROM test WHERE NOT ('da10187a-408d-4dfc-ae46-857fd23a574a' = ANY(taggers)) LIMIT 10;


SELECT id FROM test WHERE ('da10187a-408d-4dfc-ae46-857fd23a574a' = ANY(taggers)) ORDER BY last_active DESC LIMIT 10;

However including both parts in the query jumps to around 2 - 3 seconds:

 SELECT id FROM test WHERE NOT ('da10187a-408d-4dfc-ae46-857fd23a574a' = ANY(taggers)) ORDER BY last_active LIMIT 10;

What's very strange is that of the previous list of queries that run fast the last one is almost the exact same as the slow one, just without the negation. Negation of the ANY is also very fast. It's only when negation of ANY in a combination of a limit is added that things slow down. Any help would be greatly appreciated.

Cœur
  • 37,241
  • 25
  • 195
  • 267

1 Answers1

1

The query with only ORDER BY doesn't apply any filtering and it's much faster of course.

The query that only has the filtering NOT ...ANY() without ORDER BY applies the filter only to a short number of records until the LIMIT number (10 in this case) is reached.

The last query (filtering with NOT & ANY and ORDER BY) is significantly slower because it has to do much more work: It has to apply the filter on all records of the table, then sort them and finally return the first 10 (LIMIT).