Trigram distance operations in sqlalchemy

Question

I'm currently trying to use the pg_trgm operations % and <->. The GIN Indices on the columns are already available, but I can't find the sqlalchemy equivalent to the previously mentioned operators.

What would be the best approach to solve this problem, except writing a pure text query.

A simple example query would be:

tag = test
tag_subq = session.query(sticker_tag.file_id, f'sticker_tag.name <-> {tag}'.label(distance)) \
    .filter(f'sticker_tag.name % {tag}')) \
    .filter('distance' < 0.3) \
    .subquery("tag_subq")

The query above is obviously not working, and the select and filter string are just placeholder to visualize what I intend to do.

score 7 · Accepted Answer · edited Apr 03 '20 at 07:45

7

you can use the Operators.op() method; this generates whatever operator you need:

sticker_tag.name.op('<->')(tag)
sticker_tag.name.op('%%')(tag)

The percent operator is doubled up to escape it because %foo or %(foo) syntax is used by python dbapi to insert parameters into the query.

edited Apr 03 '20 at 07:45

cmc

4,294
2
35
34

answered Oct 24 '18 at 11:10

r-m-n

14,192
4
69
68

score 6 · Answer 2 · answered Aug 07 '21 at 04:45

For people who are using Postgres, it's possible to use similarity to do so instead.

NOTE: Do remember to install the pg_trgm extension in your Postgres first: CREATE EXTENSION pg_trgm;

Here's an example in using SQLAlchemy:

# ... other imports
from sqlalchemy import and_, func, or_

def search_store_product(search_string: str) -> Optional[list[Product]]:
    try:
        return session.query(Product).filter(
            or_(
                func.similarity(Product.name, search_string) > 0.6,
                func.similarity(Product.brand, search_string) > 0.4,
            ),
            and_(Product.updated_on >= datetime.utcnow() - timedelta(days=5)),
        ).order_by(Product.created_on).limit(20).all()

    except ProgrammingError as exception:
        logger.exception(exception)
        raise

    finally:
        session.close()

Postgres doesn't use indexes for functions. It uses indexes only for operators and `similarity(name, str)` will be very slow. it is better using `Product.name.op('%>')(word)` — Gosha null, May 11 '22 at 10:26
@Goshanull your example works good, I'm not sure what kind of operator is "%>". — Mark Mishyn, Mar 21 '23 at 15:15
@MarkMishyn There is a list of pg_trgm operators https://www.postgresql.org/docs/current/pgtrgm.html — Gosha null, Mar 22 '23 at 16:22

Cooper Moss · Answer 3 · 2022-02-16T18:54:23.260

If anyone is interested, I did some tests comparing the % method and the similarity(...) > x method and there is a significant speed up using %. Over 10x in some cases.

SELECT * FROM X WHERE name % 'foo';

is way faster than

SELECT name FROM x WHERE similarity(name, 'foo') > 0.7;

So I recommend only using the similarity(..) function in the SELECT statement if it's relevant to your query. Like this:

SELECT name, similarity(name, 'foo') FROM X WHERE name % 'foo';

But you need to set pg_trgm.similarity_threshold before using % because the default is 0.3 which in my opinion is way too fuzzy and slow for most applications. So r-m-n's answer is preferable, just remember to set similarity_threshold every session!

In SQL Alchemy that'll be something like this:

db.session.execute('SET pg_trgm.similarity_threshold = 0.7;')
items = Model.query.filter(Model.name.op("%")(name)).all()

Trigram distance operations in sqlalchemy

3 Answers3