1

I'm currently trying to use the pg_trgm operations % and <->. The GIN Indices on the columns are already available, but I can't find the sqlalchemy equivalent to the previously mentioned operators.

What would be the best approach to solve this problem, except writing a pure text query.

A simple example query would be:

tag = test
tag_subq = session.query(sticker_tag.file_id, f'sticker_tag.name <-> {tag}'.label(distance)) \
    .filter(f'sticker_tag.name % {tag}')) \
    .filter('distance' < 0.3) \
    .subquery("tag_subq")

The query above is obviously not working, and the select and filter string are just placeholder to visualize what I intend to do.

Nukesor
  • 945
  • 1
  • 7
  • 13

3 Answers3

7

you can use the Operators.op() method; this generates whatever operator you need:

sticker_tag.name.op('<->')(tag)
sticker_tag.name.op('%%')(tag)

The percent operator is doubled up to escape it because %foo or %(foo) syntax is used by python dbapi to insert parameters into the query.

cmc
  • 4,294
  • 2
  • 35
  • 34
r-m-n
  • 14,192
  • 4
  • 69
  • 68
6

For people who are using Postgres, it's possible to use similarity to do so instead.

NOTE: Do remember to install the pg_trgm extension in your Postgres first: CREATE EXTENSION pg_trgm;

Here's an example in using SQLAlchemy:

# ... other imports
from sqlalchemy import and_, func, or_

def search_store_product(search_string: str) -> Optional[list[Product]]:
    try:
        return session.query(Product).filter(
            or_(
                func.similarity(Product.name, search_string) > 0.6,
                func.similarity(Product.brand, search_string) > 0.4,
            ),
            and_(Product.updated_on >= datetime.utcnow() - timedelta(days=5)),
        ).order_by(Product.created_on).limit(20).all()

    except ProgrammingError as exception:
        logger.exception(exception)
        raise

    finally:
        session.close()
Jerry
  • 186
  • 1
  • 6
  • 13
  • Postgres doesn't use indexes for functions. It uses indexes only for operators and `similarity(name, str)` will be very slow. it is better using `Product.name.op('%>')(word)` – Gosha null May 11 '22 at 10:26
  • @Goshanull your example works good, I'm not sure what kind of operator is "%>". – Mark Mishyn Mar 21 '23 at 15:15
  • @MarkMishyn There is a list of pg_trgm operators https://www.postgresql.org/docs/current/pgtrgm.html – Gosha null Mar 22 '23 at 16:22
1

If anyone is interested, I did some tests comparing the % method and the similarity(...) > x method and there is a significant speed up using %. Over 10x in some cases.

SELECT * FROM X WHERE name % 'foo';

is way faster than

SELECT name FROM x WHERE similarity(name, 'foo') > 0.7;

So I recommend only using the similarity(..) function in the SELECT statement if it's relevant to your query. Like this:

SELECT name, similarity(name, 'foo') FROM X WHERE name % 'foo';

But you need to set pg_trgm.similarity_threshold before using % because the default is 0.3 which in my opinion is way too fuzzy and slow for most applications. So r-m-n's answer is preferable, just remember to set similarity_threshold every session!

In SQL Alchemy that'll be something like this:

db.session.execute('SET pg_trgm.similarity_threshold = 0.7;')
items = Model.query.filter(Model.name.op("%")(name)).all()
Cooper Moss
  • 107
  • 1
  • 6