Grep words before and after match?

Question

Is there a simple solution to also extract one word before and one word after along with the matched word? For example, assuming the following text

... put returns between paragraphs ...
... the function returns void ...

the search for returns should return

put returns between
function returns void

I am not a bash expert, but could put together the following

grep -o -P "(?:[a-zA-Z'-]+[^a-zA-Z'-]+){1}returns(?:[^a-zA-Z'-]+[a-zA-Z'-]+){1}" TEXT.FILE

but I am not sure whether it catches every case.

Is there a case when the text "returns" does not have a word before or after? — fedorqui, Nov 30 '16 at 09:50
@WiktorStribiżew note there is no need to use `-P` for this: `-E` suffices. — fedorqui, Nov 30 '16 at 09:50
@WiktorStribiżew yep, `-P` is for `P`erl regexps, which are more expensive than the `E`xtended ones. Of course, for a tiny file this shouldn't matter at all but it is good practice to "just" use the needed extension. Also, `-E` is specified by POSIX, while `-P` _is highly experimental and grep -P may warn of unimplemented features_ (from `man grep`). — fedorqui, Nov 30 '16 at 09:53
@fedorqui: I do not think there should be any difference when using such basic patterns. Is there any link to `grep` performance tests with identical, simple PCRE and extended patterns? I could not find any :( — Wiktor Stribiżew, Nov 30 '16 at 09:59
@WiktorStribiżew Well, as always it is a matter of using resources as they are needed: we can say `cat file | grep '1'` or `grep '1' file`. They are the same and the performance won't differ much, but it is good to know that `grep` alone can read the file. In this case, opening the Perl Regexp engine sounds like too much when the Extended can already handle it. Also, note the POSIX part I mentioned later in my comment, which is also relevant. — fedorqui, Nov 30 '16 at 10:08
@Wiktor I too had my doubts that it should be any slower to use `-P` so I just ran a pretty unscientific test `time grep -E '\w+' <(seq 1000000) > /dev/null` and it took ~0.55s, whereas the same using `-P` took ~0.85s. — Tom Fenech, Nov 30 '16 at 10:25
@TomFenech: Thank you, so that means using the right regex flavor *is* of importance with `grep`. — Wiktor Stribiżew, Nov 30 '16 at 10:31
@Wiktor I guess it'd be worth looking into it in a little more depth before drawing that conclusion but it certainly looks like it! — Tom Fenech, Nov 30 '16 at 10:33

score 4 · Accepted Answer · answered Nov 30 '16 at 09:47

Just tell grep to match only <word> + returns + <word>:

$ grep -Eo '\w+ returns \w+' file
put returns between
function returns void

fedorqui

`'\w+ returns \w+'` won't return matches if a word before or after is missing (start/end of string). – Wiktor Stribiżew Nov 30 '16 at 09:47
... Or if punctuation interferes. – mouviciel Nov 30 '16 at 09:49
@WiktorStribiżew yep, I just focused on the basic example for a beginning. Probably `(\w+ )?` on both sides would make it more robust. – fedorqui Nov 30 '16 at 09:49
@fedorqui That would then match if another word contained the string `returns` – 123 Nov 30 '16 at 09:55
@123 Sorry, didn't realise you were talking about `(\w+ )?` – nu11p01n73R Nov 30 '16 at 10:00
... Or if there is more than one blank between words. – Jdamian Nov 30 '16 at 11:04
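
The comments above name several gaps in the simple pattern: a missing neighbor word, punctuation, more than one blank, and returns embedded in a longer word. One possible way to cover them while staying with -E is sketched below. It assumes GNU grep (for the \w, \W and \b extensions), and the helper name words_around is made up for illustration; it does not come from the thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the search word plus at
# most one word on each side. Assumes GNU grep, whose -E mode accepts the
# \w, \W and \b extensions.
words_around() {
    # $1 = word to look for (assumed to contain no regex metacharacters)
    # remaining args = files; reads stdin when none are given
    word=$1
    shift
    # (\w+\W+)?  optional word on the left, followed by any non-word run
    # \b...\b    the search word must stand alone, so 123returns is skipped
    # (\W+\w+)?  optional word on the right, preceded by any non-word run
    grep -Eo "(\w+\W+)?\b${word}\b(\W+\w+)?" "$@"
}

printf '%s\n' \
    '... put returns between paragraphs ...' \
    'returns   void, after several blanks' \
    'it returns' \
    '123returns' |
    words_around returns
```

Because the separator is \W+ rather than a single space, any punctuation or extra blanks between the words are printed as part of the match.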

score 2 · Answer 2 · answered Nov 30 '16 at 09:54

This should work

grep -oP '\w*\s*\breturns\b\s*\w*' file

Input

... put returns between paragraphs ...
... the function returns void ...
returns void 123r
123 4 void returns
123returns

Output

put returns between
function returns void
returns void
void returns

This also matches when there is no word before or after returns.
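
A comment on the question points out that -E suffices where -P is not needed, so here is a sketch of the same pattern under -E. It assumes GNU grep, which accepts \w, \s and \b in extended regexes; the function name returns_with_neighbors is hypothetical. With the sample input above it should print the same four lines.

```shell
#!/bin/sh
# Sketch only: this answer's pattern run with -E instead of -P.
# Assumes GNU grep, which accepts \w, \s and \b in extended regexes.
returns_with_neighbors() {
    # args = files; reads stdin when none are given
    grep -Eo '\w*\s*\breturns\b\s*\w*' "$@"
}

# Recreate the answer's sample input in a temporary file and search it.
sample=$(mktemp)
cat > "$sample" <<'EOF'
... put returns between paragraphs ...
... the function returns void ...
returns void 123r
123 4 void returns
123returns
EOF
returns_with_neighbors "$sample"
rm -f "$sample"
```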

score 0 · Answer 3 · Ruslan Osmanov · 2016-11-30T10:18:37.783

To match returns with an optional word on the left and on the right:

grep -o -P '(?:\p{L}+\s+)?returns(?:\s+\p{L}+)?' file

where \p{L} is a Unicode category for a "letter".

To match returns only when a word is present on both the left and the right:

grep -o -P '(?:\p{L}+\s+)returns(?:\s+\p{L}+)' file
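
The difference between the two variants shows up on a line where a neighbor is missing. A small sketch, assuming GNU grep built with PCRE support (needed for -P and \p{L}); the function names match_optional and match_required are hypothetical:

```shell
#!/bin/sh
# Hypothetical wrappers around the two patterns from this answer.
# Assumes GNU grep with PCRE support (-P).
match_optional() {
    grep -o -P '(?:\p{L}+\s+)?returns(?:\s+\p{L}+)?' "$@"
}
match_required() {
    grep -o -P '(?:\p{L}+\s+)returns(?:\s+\p{L}+)' "$@"
}

# "returns void" has no word on the left, so only the optional
# variant is expected to report it.
printf '%s\n' 'put returns between' 'returns void' | match_optional
echo '---'
printf '%s\n' 'put returns between' 'returns void' | match_required
```

Neither pattern uses \b, so the optional variant would also report a bare returns found inside a token such as 123returns.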

edited Nov 30 '16 at 10:18

answered Nov 30 '16 at 10:13
