12

I am trying to write a Django query that will only match whole words. Based on the answer here, I've tried something like:

result = Model.objects.filter(text__iregex='\bsomeWord\b')

But this isn't returning the expected result. I also tried

result = Model.objects.filter(text__iregex=r'\bsomeWord\b')

to no avail. My end goal is to be able to pass in a string variable as well, something like:

result = Model.objects.filter(text__iregex=r'\b'+stringVariable+r'\b')

or

result = Model.objects.filter(text__iregex=r'\b %s \b'%stringVariable)

But right now I can't even get it to work with a raw string. I'm using PostgreSQL.

Louis
  • 146,715
  • 28
  • 274
  • 320
GChorn
  • 1,267
  • 1
  • 19
  • 36

3 Answers3

26

Use “\y” instead of “\b” when you're using PostgreSQL, this is because Django passes your regular expression straight down to PostgreSQL – so your RegEx's need to be compatible with it. You should be able to execute them from psql without any problems.

result = Model.objects.filter(text__iregex=r"\y{0}\y".format(stringVariable))

See https://www.postgresql.org/docs/9.1/functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE

sazzad
  • 5,740
  • 6
  • 25
  • 42
Matt
  • 8,758
  • 4
  • 35
  • 64
  • 1
    Django doesn't do any translating of PCREs to PostgreSQL regular expressions (see: http://bit.ly/ZtpbgH). So you just have to use PostgreSQL's native regular expressions, see here: http://bit.ly/ZtpojU – Matt Feb 21 '13 at 08:58
  • This works. Thanks for adding the explanation about `\y` as well. As for the `'{0}'.format(stringVariable)` part, is that just another way of writing `'%s'%stringVariable`? Both seem to work for me. – GChorn Feb 21 '13 at 09:08
  • 4
    Yeah, effectively they both do the same. I just use the `.format()` way because Python 3 requires it (http://www.python.org/dev/peps/pep-3101/). – Matt Feb 21 '13 at 09:10
0

You might be able to get something by dropping the regex and using a few django lookups

result = Model.objects.filter(Q(text__contains=' someword ') |
                              Q(text__contains=' someword.') |
                              Q(text__istartswith = 'someword.' |
                              Q(text__istartswith = 'someword.' |
                              Q(text__iendswith = 'someword')

see here for docs.

I realize that's not so elegant (but makes for easy maintenance if you're not a fan of regex).

Aidan Ewen
  • 13,049
  • 8
  • 63
  • 88
  • 1
    Because this doesn't perform a whole-word match; for instance, if I do a `text__contains='tart'` I pick up results like `start`. – GChorn Feb 21 '13 at 08:31
  • my bad - I'm obviously not familiar with sql enough to know that 'whole word match' meant something specific. Anyway, I've updated my answer to improve it a bit. – Aidan Ewen Feb 21 '13 at 08:54
  • 1
    Easy maintenance? There are lots of cases you're not covering there (between `()`, commas, etc). – Oscar Mederos Feb 21 '13 at 08:56
  • Easy maintenance if you're not a fan of regex. You don't need to know django or python to read this - but I'd have to reference any regex that solves this problem. And yes, this doesn't catch everything - It would only work if you're confident that text contains only a subset of the language. – Aidan Ewen Feb 21 '13 at 09:03
0

I had the same problem trying to match word boundaries using the Perl-compatible escape sequence \b. My backend database is MySQL.

I solved the problem by the character class expression [[:space:]], e.g.

        q_sum = Q()
        search_list = self.form.cleaned_data['search_all'].split(' ');
        for search_item in search_list:
            search_regex = r"[[:space:]]%s[[:space:]]" % search_item
            q_sum |= Q(message__iregex=search_regex)
        queryset = BlogMessages.objects.filter(q_sum).distinct()
Stefan Musarra
  • 1,429
  • 14
  • 16