1

I'm trying to write a regular expression (in R) that matches all the three-letter words in this text:

tex= "As you are now so once were we"

My first attempt was to select three-letter words surrounded by spaces:

matches=str_match_all(tex," [a-z]{3} ")

It's supposed to match " you ", " are " and " now ". But, since some of these spaces are shared between the matched strings, I only get " you " and " now ".

Is there a way to fix this issue?

Thanks in advance

Pj_
Blofeld

3 Answers

3

It may be better to use a word boundary (\\b), which matches a position rather than a character:

library(stringr)
str_match_all(tex,"\\b[a-z]{3}\\b")[[1]]
#   [,1] 
#[1,] "you"
#[2,] "are"
#[3,] "now"

Or we can also use str_extract

str_extract_all(tex,"\\b[a-z]{3}\\b")[[1]]
#[1] "you" "are" "now"
akrun
  • I only added `library(stringr)`. Didn't remove anything - I promise :-) – RHertel Aug 28 '16 at 07:17
  • @RHertel Thanks, then it got removed automatically. – akrun Aug 28 '16 at 07:19
  • Looking at the revision history it seems like I did remove something, but I actually did not. It must be some kind of bug, presumably we were editing the post at the same time and the result was conflicting. – RHertel Aug 28 '16 at 07:20
  • @RHertel Yes, that seems like the logical conclusion. – akrun Aug 28 '16 at 07:21
0
 tex= "As you are now so once were we"

Using base R functions:

regmatches(tex , gregexpr('\\b[a-z]{3}\\b' , tex))[[1]]

 [1] "you" "are" "now"
dondapati
-1

Try this:

\b[a-zA-Z]{3}\b

This works because \b doesn't match the whitespace/punctuation itself, but rather the position of the word boundary, so the spaces are not included in the match.
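
To make the point above concrete, here is a small base R sketch (not part of the original answer) that compares where each pattern matches. The variable names are made up for illustration; `tex` is the string from the question. It assumes the default regex engine in `gregexpr`, as used in the base R answer on this page.

```r
tex <- "As you are now so once were we"

# Pattern that consumes the surrounding spaces. Matches cannot overlap,
# so the space after "you" is used up and " are " is skipped.
spaced <- gregexpr(" [a-z]{3} ", tex)[[1]]
spaced_start <- as.integer(spaced)              # expected: 3 11
spaced_len   <- attr(spaced, "match.length")    # expected: 5 5

# Pattern with word boundaries. \\b matches a position and consumes nothing,
# so adjacent words can all be matched.
bounded <- gregexpr("\\b[a-z]{3}\\b", tex)[[1]]
bounded_start <- as.integer(bounded)            # expected: 4 8 12
bounded_len   <- attr(bounded, "match.length")  # expected: 3 3 3
```

The match lengths show the difference: five characters per match when the spaces are part of the pattern, three when only the boundaries are asserted.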

You also want to include A-Z in the character range to include uppercase letters.
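
As a rough sketch of how this pattern might be used from R: inside an R string literal each backslash has to be doubled, so `\b` is written `\\b`. The second input string, `tex2`, is a hypothetical example added here only to show an uppercase match.

```r
tex <- "As you are now so once were we"

# Same pattern as above, with backslashes doubled for the R string literal
pattern <- "\\b[a-zA-Z]{3}\\b"

words <- regmatches(tex, gregexpr(pattern, tex))[[1]]
# expected: "you" "are" "now"

# Hypothetical input with a capitalised three-letter word
tex2 <- "The cat sat on a mat"
words2 <- regmatches(tex2, gregexpr(pattern, tex2))[[1]]
# expected: "The" "cat" "sat" "mat"
```

With the lowercase-only class `[a-z]`, "The" in the second string would not be matched.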

This was taken from the examples at http://regexr.com/; they have a "4 letter words" example.

Idloj