consecutive matches in regex (R)

Question

I'm trying to write a regular expression (in R) that matches all the three-letter words in this text:

tex= "As you are now so once were we"

My first attempt is to select words containing 3 letters surrounded by spaces:

matches=str_match_all(tex," [a-z]{3} ")

It's supposed to match " you ", " are " and " now ". But, since some of these spaces are shared between the matched strings, I only get " you " and " now ".

Is there a way to fix this issue?

Thanks in advance

akrun · Answer 1 · 2016-08-28T07:22:44.683

3

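It may be better to use a word boundary (\\b). It matches a position rather than a character, so neighbouring words no longer compete for the same space: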

library(stringr)
str_match_all(tex,"\\b[a-z]{3}\\b")[[1]]
#   [,1] 
#[1,] "you"
#[2,] "are"
#[3,] "now"

Or we can use str_extract_all, which returns the matches as a character vector:

str_extract_all(tex,"\\b[a-z]{3}\\b")[[1]]
#[1] "you" "are" "now"
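To see why the original space-delimited pattern skips "are", here is a minimal base R sketch. It reproduces the behaviour described in the question and then tries a lookaround variant. The lookaround pattern is not part of this answer; it is an assumed alternative for cases where the words must really be surrounded by spaces, and it relies on `perl = TRUE`.

```r
tex <- "As you are now so once were we"

# Original idea: the spaces are part of each match, so the space after
# "you" is consumed and cannot also serve as the space before "are".
spaced <- regmatches(tex, gregexpr(" [a-z]{3} ", tex, perl = TRUE))[[1]]

# Assumed alternative: lookbehind/lookahead only check for the spaces
# without consuming them, so consecutive words can all match.
lookaround <- regmatches(tex, gregexpr("(?<= )[a-z]{3}(?= )", tex, perl = TRUE))[[1]]

print(spaced)      # " you " " now "
print(lookaround)  # "you" "are" "now"
```

Unlike \\b, the lookaround version would not match a three-letter word at the very start or end of the string, or one next to punctuation, so which one fits depends on the input.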

edited Aug 28 '16 at 07:22

answered Aug 28 '16 at 07:11

akrun


1

I only added `library(stringr)`. Didn't remove anything - I promise :-) – RHertel Aug 28 '16 at 07:17
@RHertel Thanks, then it got removed automatically. – akrun Aug 28 '16 at 07:19
Looking at the revision history it seems like I did remove something, but I actually did not. It must be some kind of bug, presumably we were editing the post at the same time and the result was conflicting. – RHertel Aug 28 '16 at 07:20
@RHertel Yes, that seems like the logical conclusion. – akrun Aug 28 '16 at 07:21

score 0 · Answer 2 · answered Dec 05 '17 at 15:05

0

tex= "As you are now so once were we"

Using base R:

regmatches(tex, gregexpr('\\b[a-z]{3}\\b', tex))[[1]]
# [1] "you" "are" "now"

answered Dec 05 '17 at 15:05

dondapati


Idloj · Answer 3 · 2016-08-28T08:15:28.820

-1

Try this:

\b[a-zA-Z]{3}\b

This works because \b doesn't match the whitespace/punctuation itself, but rather the position of the word boundary, so the spaces are not included in the match. In an R string literal the backslash has to be doubled, so each \b is written as "\\b".

You also want to include A-Z in the character range to include uppercase letters.
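A minimal base R sketch of the two points above. The input `tex2` is a hypothetical variation of the question's sentence with a capitalised three-letter word added, and `perl = TRUE` is an assumption made here to get PCRE behaviour; neither comes from the original post.

```r
# Hypothetical input: the question's sentence with "The cat and" prepended
tex2 <- "The cat and you are now so once were we"

# Lowercase-only character class: "The" is skipped because of its capital T
lower_only <- regmatches(tex2, gregexpr("\\b[a-z]{3}\\b", tex2, perl = TRUE))[[1]]

# Adding A-Z to the class picks up "The" as well
both_cases <- regmatches(tex2, gregexpr("\\b[a-zA-Z]{3}\\b", tex2, perl = TRUE))[[1]]

print(lower_only)  # "cat" "and" "you" "are" "now"
print(both_cases)  # "The" "cat" "and" "you" "are" "now"

# \b is zero-width, so every match is exactly three characters, no spaces
print(all(nchar(both_cases) == 3))  # TRUE
```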

This was taken from the examples at http://regexr.com/, which include a "4 letter words" example.

edited Aug 28 '16 at 08:15

answered Aug 28 '16 at 08:07

Idloj

