str_extract: Extracting exactly nth word from a string

Question

I know this question has been asked in several places, but I didn't see a precise answer to it.

I am trying to extract exactly the 2nd word after a given substring ("trying to") in R with the help of regex. I do not want to use unlist(strsplit(...)).

sen= "I am trying to substring here something, but I am not able to"

str_extract(sen, "trying to\\W*\\s+((?:\\S+\\s*){2})")

Ideally I want to get "here" as the output, but I am getting "trying to substring here".

Why is `here` what you need to extract? It is not the 3rd word in the sentence. Do you want to extract a streak of non-whitespace chars after `trying to` + 1 or more whitespaces? – Wiktor Stribiżew, Aug 02 '17 at 14:40
@WiktorStribiżew, sorry for the confusion, I have edited my question. – Manu Sharma, Aug 02 '17 at 14:42
Try `str_match(sen, "trying to\\W+\\S+\\W+(\\S+)")[,2]` or `str_match(sen, "trying to\\s+\\S+\\s+(\\S+)")[,2]` – Wiktor Stribiżew, Aug 02 '17 at 14:42
@d.b, thanks for your response, but first I want to locate a specific string in my sentence, and from there I want to extract exactly the second word. Your solution does not seem to do that. – Manu Sharma, Aug 02 '17 at 14:43

score 5 · Answer 1 · answered Aug 02 '17 at 14:55


Since you also tagged stringr, I will post the `word` solution:

library(stringr)

word(sub('.*trying to ', '', sen), 2)
#[1] "here"

Sotos

This is a great solution. But when I have multiple spaces between words, this doesn't work. Because it returns whitespace as a word. – Rolando Tamayo Dec 06 '22 at 04:18
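
One way around the multiple-space issue raised in the comment above is to collapse runs of whitespace before calling `word`. A minimal sketch, assuming stringr 1.3.0 or later (for `str_squish`); the sample string `sen_gaps` is made up for illustration:

```r
library(stringr)

# Hypothetical input with irregular spacing after the anchor phrase
sen_gaps <- "I am trying to   substring    here something"

# Drop everything up to and including "trying to" plus any following whitespace
rest <- sub(".*trying to\\s+", "", sen_gaps)

# str_squish() collapses repeated whitespace to single spaces, so word()
# (which splits on a single space by default) counts words correctly
second <- word(str_squish(rest), 2)
second
```

Using `\\s+` in the `sub` call also removes any extra spaces directly after "trying to", which the original pattern with a single literal space would leave behind.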

score 2 · Answer 2 · answered Aug 02 '17 at 14:45


We can use sub

sub("^.*\\btrying to\\s+\\w+\\s+(\\w+).*", "\\1", sen)
#[1] "here"
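
One caveat with `sub`: when the pattern does not match, it returns the input unchanged rather than `NA`. A minimal sketch of one way to guard against that, using base R only; the second element of `x` is a made-up string with no "trying to" in it:

```r
sen <- "I am trying to substring here something, but I am not able to"
pattern <- "^.*\\btrying to\\s+\\w+\\s+(\\w+).*"

# Hypothetical second input that lacks the anchor phrase
x <- c(sen, "nothing to anchor on in this one")

# Only substitute where the pattern matches; otherwise return NA
res <- ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
res
```

The `grepl` check costs a second pass over the input, but it keeps unmatched strings from being mistaken for extracted words.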

akrun

score 2 · Accepted Answer · answered Aug 02 '17 at 14:48

You may actually capture the word you need with str_match:

str_match(sen, "trying to\\W+\\S+\\W+(\\S+)")[,2]

Or

str_match(sen, "trying to\\s+\\S+\\s+(\\S+)")[,2]

Here, `\S+` matches one or more non-whitespace chars, `\W+` matches one or more non-word chars, and `\s+` matches one or more whitespace chars.

Note that if your "words" are separated by more than whitespace (punctuation, for example), use `\W+`. Otherwise, if there is only whitespace, use `\s+`.
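
To illustrate the difference between the two separators, here is a small sketch. The input `punct` is a made-up variant of the sentence with punctuation after the anchor phrase; it is not from the question:

```r
library(stringr)

# Hypothetical input where punctuation follows the anchor phrase
punct <- "I am trying to, substring; here something"

# \\W+ tolerates the comma and the space after "trying to"
with_W <- str_match(punct, "trying to\\W+\\S+\\W+(\\S+)")[, 2]

# \\s+ requires whitespace right after "trying to", so nothing matches
with_s <- str_match(punct, "trying to\\s+\\S+\\s+(\\S+)")[, 2]

with_W
with_s
```

With this input the `\W+` version should give "here", while the `\s+` version finds no match and yields `NA`.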

The [,2] will access the first captured value (the part of text matched with the part of the pattern inside the first unescaped pair of parentheses).
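
If the position of the word needs to vary, the same idea can be wrapped in a function. This is only a sketch: `nth_word_after` is a hypothetical helper (not part of stringr), and it assumes the anchor contains no regex metacharacters and that words are separated by whitespace only:

```r
library(stringr)

sen <- "I am trying to substring here something, but I am not able to"

# nth_word_after() is a hypothetical helper, not a stringr function.
# Assumes `anchor` has no regex metacharacters and n >= 1.
nth_word_after <- function(x, anchor, n) {
  # Skip n - 1 whitespace-delimited words, then capture the next one
  pattern <- paste0(anchor, "(?:\\s+\\S+){", n - 1, "}\\s+(\\S+)")
  # str_match() returns a character matrix: column 1 is the full match,
  # column 2 is the first capture group
  str_match(x, pattern)[, 2]
}

nth_word_after(sen, "trying to", 2)
```

Because `\S+` matches any non-whitespace run, punctuation stays attached to the captured word, so asking for the third word here would return "something," with its comma.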

score 1 · Answer 4 · answered Aug 02 '17 at 14:43


You could use strsplit. First separate sen into two parts at "trying to " and then extract second word of the second part.

sapply(strsplit(sen, "trying to "), function(x) unlist(strsplit(x[2], " "))[2])
#[1] "here"
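
For readers who find the one-liner dense, the two steps can be written out separately. A sketch of the same approach for a single string; `fixed = TRUE` is an addition here so that the split strings are treated literally rather than as regex:

```r
sen <- "I am trying to substring here something, but I am not able to"

# Step 1: split at the anchor phrase; element 2 is the text after it
parts <- strsplit(sen, "trying to ", fixed = TRUE)[[1]]
after <- parts[2]

# Step 2: split that remainder on single spaces and take the second word
words <- strsplit(after, " ", fixed = TRUE)[[1]]
second <- words[2]
second
```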

d.b

score 1 · Answer 5 · answered Aug 16 '18 at 22:17


str_split is another popular choice. Index the nth word with [1,n]: [1,2] returns the second word, [1,3] the third, and so forth. Note that this counts words from the start of the string, not from "trying to".

library(stringr)

#Data
sen= "I am trying to substring here something, but I am not able to"

#Code
str_split(sen, boundary("word"), simplify = TRUE)[1,2]
#> [1] "am"

Created on 2018-08-16 by the reprex package (v0.2.0).

Nettle

This is a good solution. But for the string "23. Qxe4+ Bf5 ({+2.57} 23" it deletes the "+" in the second word. – Rolando Tamayo Dec 06 '22 at 04:15
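
If punctuation such as the "+" has to be kept, one option is to split on whitespace instead of word boundaries. A minimal sketch using the string from the comment above; the variable name `chess` is just for illustration:

```r
library(stringr)

chess <- "23. Qxe4+ Bf5 ({+2.57} 23"

# Split on runs of whitespace so punctuation stays attached to each token
second <- str_split(chess, "\\s+", simplify = TRUE)[1, 2]
second
```

The trade-off is that trailing commas and periods are kept as well, so this suits inputs where punctuation is part of the token.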
