
I have the following code in Python:

import re
string = "what are you doing you i just said hello guys"
regexValue = re.compile(r'(\s\w\w\w\s)')
mo = regexValue.findall(string)

My goal is to find every three-letter word, but for some reason I only get "are" and not the "you" that follows it in my list. I suspect this is because the two matches would overlap on the space between them: once that space has been consumed by the first match, it cannot also serve as the leading space of "you". So how should I find only the three-letter words in a string like this?

  • @UlfGjerdingen if I understand OP correctly he wants to catch another **you** also – dnit13 Jun 09 '16 at 14:08
  • Yeah, but there is still one "you" missing. I added another "you" in the string, only so I could try if the overlapping space was the problem. Which it seemed to be, so the output should be 2 "you". – Linus Johansson Jun 09 '16 at 14:09

3 Answers


It's not regex, but you could do this:

words = [word for word in string.split() if len(word) == 3]
Morgan Thrapp

If you strictly want to use regex, use word boundaries: \b\w{3}\b. Otherwise, the answer suggested by Morgan Thrapp is good enough for this.

Demo
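
As a minimal sketch of the word-boundary approach in Python, using the string from the question (the variable name `three_letter` is only illustrative):

```python
import re

string = "what are you doing you i just said hello guys"

# \b is a zero-width word boundary, so no whitespace is consumed
# and adjacent three-letter words can both match.
three_letter = re.compile(r'\b\w{3}\b')

print(three_letter.findall(string))          # ['are', 'you', 'you']

# Words at the very start or end of the string are found as well.
print(three_letter.findall("you said hey"))  # ['you', 'hey']
```

Note that \w also matches digits and underscores, and that \b treats an apostrophe as a boundary, so real-world text may need a stricter pattern.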

dnit13
  • And this syntax is indeed supported by Python (which the link doesn't seem to directly demonstrate.) This is the best regex-based solution IMHO. – Nick Matteo Jun 09 '16 at 14:20

findall finds non-overlapping matches. An easy fix is to change the final \s to a lookahead, (?=\s), but you'll probably also want to extend the regex to cope with matches at the very start and end of the string as well.

regexValue = re.compile(r'(?:^|\s)(\w\w\w)(?=\s|$)')
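
To make the non-overlapping behaviour concrete, here is a small sketch comparing a pattern that consumes the trailing whitespace with a lookahead variant. The names `consuming` and `lookahead` are illustrative, and the lookahead pattern is one possible way to handle the start and end of the string, not the only one:

```python
import re

string = "what are you doing you i just said hello guys"

# The trailing \s is consumed by each match, so the word that
# immediately follows has no leading whitespace left to match.
consuming = re.compile(r'\s(\w\w\w)\s')
print(consuming.findall(string))   # ['are', 'you']

# (?=\s|$) only checks for whitespace or end of string without
# consuming it; (?:^|\s) also allows a match at the very start.
lookahead = re.compile(r'(?:^|\s)(\w\w\w)(?=\s|$)')
print(lookahead.findall(string))   # ['are', 'you', 'you']
```

Because each pattern has exactly one capturing group, findall returns just the word without the surrounding whitespace.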

If this is not a regex exercise, splitting the string on whitespace is much more straightforward.

tripleee