I'm using Django and Python 3.7. How do I search for whole words in a Django query? A word is a string surrounded by whitespace (or the beginning or end of a line). I have this:

from functools import reduce
import operator
from django.db.models import Q
def get_articles_with_words_in_titles(self, long_words):
    qset = Article.objects.filter(reduce(operator.or_, (Q(title__icontains=x) for x in long_words)))
    return set(qset)

But if "long_words" contains words like ["about", "still"], the query also matches Articles whose titles contain "whereabouts" or "stillborn", because icontains matches substrings. How can I modify my query to respect word boundaries?

Dave
  • Possible duplicate of [Whole-word match only in Django query](https://stackoverflow.com/questions/14997536/whole-word-match-only-in-django-query) – Louis Aug 27 '19 at 19:57

2 Answers

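If your database is PostgreSQL, I suggest trying PostgreSQL's full-text search. It compares whole words (lexemes) rather than substrings, so "about" will not match "whereabouts".

Django has built-in support for it in `django.contrib.postgres.search`: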

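from django.contrib.postgres.search import SearchQuery, SearchVector

# config='simple' skips stemming and stop-word removal; under an English
# configuration "about" is a stop word and would be dropped from the query
search_vector = SearchVector('title', config='simple')
# | matches titles containing either word (like the question's operator.or_);
# & would require both words
search_query = SearchQuery('about', config='simple') | SearchQuery('still', config='simple')

Article.objects.annotate(
    search=search_vector
).filter(
    search=search_query
)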
zxdvd

Try the `iregex` (case-insensitive) or `regex` lookup:

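import re
# Hard-coded equivalent: Article.objects.filter(title__iregex=r"\y(still|about)\y")
# re.escape() keeps words containing regex metacharacters (e.g. "." or "+") literal
words = "|".join(re.escape(w) for w in long_words)
Article.objects.filter(title__iregex=fr"\y({words})\y")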

This should work for PostgreSQL, whose regex flavour writes a word boundary as `\y` rather than `\b`.
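To sanity-check the pattern without a database, here is a minimal sketch that uses Python's `re` module as a stand-in for PostgreSQL. The helper names `word_pattern` and `first_match` are hypothetical. Python writes the word boundary as `\b`, while PostgreSQL writes it as `\y` (in PostgreSQL's flavour `\b` means backspace, which would explain why a query built with `\b` returns no rows), so the boundary is passed in as a parameter. Python's rules for what counts as a word character may differ from the database's in edge cases.

```python
import re
from typing import Iterable, Optional


def word_pattern(words: Iterable[str], boundary: str) -> str:
    r"""Join the words into an alternation wrapped in word-boundary escapes.

    Pass r"\y" for PostgreSQL's regex flavour or r"\b" for Python's re.
    re.escape() makes each word literal, so "c++" or "e.g." cannot act as regex syntax.
    """
    alternatives = "|".join(re.escape(word) for word in words)
    return f"{boundary}({alternatives}){boundary}"


def first_match(words: Iterable[str], title: str) -> Optional[str]:
    """Return the first whole-word hit in title, ignoring case, or None.

    A local stand-in for title__iregex: this runs Python's re, not PostgreSQL,
    so edge cases (e.g. Unicode word characters) may behave differently there.
    """
    match = re.search(word_pattern(words, r"\b"), title, flags=re.IGNORECASE)
    return match.group(1) if match else None


if __name__ == "__main__":
    long_words = ["about", "still"]
    # The string you would pass to title__iregex on PostgreSQL
    print(word_pattern(long_words, r"\y"))  # \y(about|still)\y
    # "about" inside "Whereabouts" and "still" inside "stillborn" are not whole words
    print(first_match(long_words, "Whereabouts of a stillborn idea"))  # None
    print(first_match(long_words, "What About Bob?"))  # About
```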

Django documentation:

https://docs.djangoproject.com/en/2.2/ref/models/querysets/#iregex

Python's regular expression documentation for word boundaries:

https://docs.python.org/3.7/library/re.html#index-26

PostgreSQL's documentation on word boundaries: https://www.postgresql.org/docs/9.1/functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE

rabbit.aaron
  • If you're using MySQL, according to this post (https://stackoverflow.com/questions/20001111/django-filter-iregex-to-match-complete-word-only), you might need to look up MySQL's syntax for word boundaries. – rabbit.aaron Aug 25 '19 at 14:32
  • I'm using Postgres, but even so your statement isn't working. The query builds things like '"article"."title"::text ~* \\b(canada|woman|properly)\\b', but that isn't returning any results, while a hard-coded query like "title like '% canada %'" by itself returns results. – Dave Aug 25 '19 at 17:58
  • Hi Dave, I've updated my answer; it seems you need to use Postgres's regex flavour to define the word boundaries. – rabbit.aaron Aug 25 '19 at 23:02
  • Thanks. Are you sure it's "\y"? Python is telling me "Illegal/unsupported escape sequence." – Dave Aug 25 '19 at 23:20
  • @Dave yes, I tried it in my app. Django does not translate Python regex syntax into Postgres's. – rabbit.aaron Aug 25 '19 at 23:21
  • @Dave are you getting a warning in your IDE or is it an exception? If it is an exception, you might need to remove the r prefix from the string and use `f"\\y({words})\\y"` instead. – rabbit.aaron Aug 25 '19 at 23:24
  • It was my IDE (PyCharm), but I ignored the IDE errors, started the app, and things do appear to be working. I'll run a few more tests and then come back and accept. Thx! – Dave Aug 25 '19 at 23:50
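The two spellings discussed in these comments, the raw `fr"\y({words})\y"` and the ordinary `f"\\y({words})\\y"`, build the same string, so either works as long as the database receives a single backslash before each `y`. A quick check, assuming an illustrative `words` value taken from Dave's example query:

```python
# Illustrative value; in the answer, words comes from "|".join(...) over long_words
words = "canada|woman|properly"

raw_form = fr"\y({words})\y"       # raw f-string: each backslash is kept as-is
escaped_form = f"\\y({words})\\y"  # ordinary f-string: "\\" produces one backslash

# Both hold a single backslash before each "y", which PostgreSQL reads as a word boundary
assert raw_form == escaped_form
print(raw_form)  # \y(canada|woman|properly)\y
```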