4

So I've got a query for a search endpoint in my Django (postgres) backend like so:

widgets = Widgets.objects.\
    filter(user=request.user).\
    filter(title__icontains="marketing director").\
    distinct('url')[:250]

title is a CHAR field (models.CharField(max_length=255, blank=True) in django).

contains here of course amounts to a '%LIKE%' query in Postgres. I want specifically this (ie: NOT an infix 'LIKE%' search).

I'd like to speed up that query.

It seems like most optimizations for postgres text search are only for TEXT fields - is there any way to speed up exact string searches on a CHAR column?

I could use Postgres trigram indices, but I don't actually need the fuzzy/mispelling-type search. Though I'd happily use it if it's actually just faster for some reason.

Or would I be better off converting those columns to TEXT, taking the storage increase hit, and better indexing them somehow?

lollercoaster
  • 15,969
  • 35
  • 115
  • 173
  • You should read https://www.postgresql.org/docs/current/static/indexes-types.html and https://stackoverflow.com/questions/1566717/postgresql-like-query-performance-variations. – Ilja Everilä Jun 05 '17 at 06:13
  • Those are great, but I didn't see a complete performance breakdown, nor what was strictly the fastest. – lollercoaster Jun 05 '17 at 06:24
  • You should include some examples of the data you're performing your queries against and what `data['query']` actually may contain. This is because for example the mentioned text search features work on documents of words/lexemes, but you're performing pattern matching against a string currently. – Ilja Everilä Jun 05 '17 at 06:35
  • @IljaEverilä done – lollercoaster Jun 05 '17 at 06:48

1 Answers1

2

The only way so use an index with a LIKE pattern that starts with % (is not anchored at the beginning) is to use a trigram index.

It does not matter if you define the attributes as character or text, since they will be converted to text anyway.

Do your data contain a lot of trailing blanks? Other than that, I cannot see how text could waste storage space.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • OK, interesting. And no, not very many trailing blanks. What is the performance speed up (generally speaking) for a trigram index over a vanilla LIKE query? Does the speed-up vary linearly with the number of rows added in the database? – lollercoaster Jun 05 '17 at 07:04
  • The cost of index access usually grows logarithmically with the number of rows. With a trigram index, you get bad results with short search strings, because they result in a great number of false positive hits. With strings the size of your example it should work fine. – Laurenz Albe Jun 05 '17 at 07:09
  • So in your estimation, a trigram index will be faster than a vanilla LIKE query? – lollercoaster Jun 06 '17 at 21:01
  • Yes, unless the search pattern is very short. – Laurenz Albe Jun 07 '17 at 07:35