4

Is there a simple way to extract one word before and one word after the matched word, along with the match itself? For example, given the following text

... put returns between paragraphs ...
... the function returns void ...

a search for "returns" should output

put returns between
function returns void

I am not a bash expert, but I put together the following

grep -o -P "(?:[a-zA-Z'-]+[^a-zA-Z'-]+){1}returns(?:[^a-zA-Z'-]+[a-zA-Z'-]+){1}" TEXT.FILE

but I am not sure whether it catches all cases.

fedorqui
user3639557
  • You can use `grep -oP '(\w+\W+)?returns(\W+\w+)?'` – Wiktor Stribiżew Nov 30 '16 at 09:49
  • Is there a case when the text "returns" does not have a word before or after? – fedorqui Nov 30 '16 at 09:50
  • @WiktorStribiżew note there is no need to use `-P` for this: `-E` suffices. – fedorqui Nov 30 '16 at 09:50
  • @fedorqui: does it have any impact on performance? – Wiktor Stribiżew Nov 30 '16 at 09:52
  • @WiktorStribiżew yep, `-P` is for `P`erl regexps, which are more expensive than the `E`xtended ones. Of course, for a tiny file this shouldn't matter at all but it is good practice to "just" use the needed extension. Also, `-E` is specified by POSIX, while `-P` _is highly experimental and grep -P may warn of unimplemented features_ (from `man grep`). – fedorqui Nov 30 '16 at 09:53
  • @fedorqui: I do not think there should be any difference when using such basic patterns. Is there any link to `grep` performance tests with identical, simple PCRE and extended patterns? I could not find any :( – Wiktor Stribiżew Nov 30 '16 at 09:59
  • @WiktorStribiżew Well, as always it is a matter of using resources as they are needed: we can say `cat file | grep '1'` or `grep '1' file`. They are the same and the performance won't differ much, but it is good to know that `grep` alone can read the file. In this case, opening the Perl Regexp engine sounds like too much when the Extended can already handle it. Also, note the POSIX part I mentioned later in my comment, which is also relevant. – fedorqui Nov 30 '16 at 10:08
  • 2
    @Wiktor I too had my doubts that it should be any slower to use `-P` so I just ran a pretty unscientific test `time grep -E '\w+' <(seq 1000000) > /dev/null` and it took ~0.55s, whereas the same using `-P` took ~0.85s. – Tom Fenech Nov 30 '16 at 10:25
  • @TomFenech: Thank you, so that means using the right regex flavor *is* of importance with `grep`. – Wiktor Stribiżew Nov 30 '16 at 10:31
  • @Wiktor I guess it'd be worth looking into it in a little more depth before drawing that conclusion but it certainly looks like it! – Tom Fenech Nov 30 '16 at 10:33
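
Combining the first two comments (the optional-neighbour pattern, run with `-E` instead of `-P`) could look like the sketch below. It assumes GNU grep, where `\w` and `\W` are available in extended mode as an extension; the sample lines and the temporary file are made up for illustration.

```shell
# Hypothetical sample input, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' 'put returns between' 'function returns, void' 'returns' > "$tmp"

# Optional neighbours: \W+ accepts any run of non-word characters,
# so punctuation between the words does not break the match.
result=$(grep -Eo '(\w+\W+)?returns(\W+\w+)?' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

Because both groups are optional, a line that contains only returns still produces a match.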

3 Answers

4

Just tell grep to match only <word> + returns + <word>:

$ grep -Eo '\w+ returns \w+' file
put returns between
function returns void
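
One thing to keep in mind is that this pattern requires exactly one space on each side of returns. The sketch below shows the effect on a line with double spaces; the third sample line and the `' +'` variant are assumptions added for illustration, not part of the answer, and `\w` with `-E` assumes GNU grep.

```shell
# The question's two lines plus a hypothetical line with double spaces.
tmp=$(mktemp)
printf '%s\n' '... put returns between paragraphs ...' \
              '... the function returns void ...' \
              'value  returns  nothing' > "$tmp"

# The command from the answer: exactly one space on each side.
strict=$(grep -Eo '\w+ returns \w+' "$tmp")

# Variant (an assumption): one or more spaces on each side.
loose=$(grep -Eo '\w+ +returns +\w+' "$tmp")

printf 'strict:\n%s\n' "$strict"
printf 'loose:\n%s\n' "$loose"

rm -f "$tmp"
```

The strict form skips the double-spaced line, while the variant with `' +'` picks it up.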
fedorqui
2

This should work:

grep -oP '\w*\s*\breturns\b\s*\w*' file

Input

... put returns between paragraphs ...
... the function returns void ...
returns void 123r
123 4 void returns
123returns

Output

put returns between
function returns void
returns void
void returns

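This also matches when there is no word before or after returns, and the \b anchors keep it from matching inside a longer word such as 123returns.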
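
The input and output above can be reproduced with a small script. This sketch runs the same pattern with `-E` rather than `-P`, which assumes GNU grep (there `\b`, `\s` and `\w` are also accepted in extended mode); the temporary file is only for illustration.

```shell
# Recreate the sample input from the answer in a temporary file.
tmp=$(mktemp)
printf '%s\n' '... put returns between paragraphs ...' \
              '... the function returns void ...' \
              'returns void 123r' \
              '123 4 void returns' \
              '123returns' > "$tmp"

# \b keeps "returns" from matching inside "123returns";
# \w* and \s* allow the neighbouring words to be absent.
matches=$(grep -Eo '\w*\s*\breturns\b\s*\w*' "$tmp")
printf '%s\n' "$matches"

rm -f "$tmp"
```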

123
0

To match returns with optional word at the left/right:

grep -o -P '(?:\p{L}+\s+)?returns(?:\s+\p{L}+)?' file

where \p{L} is a Unicode category for a "letter".

To match returns with required words at the left and right:

grep -o -P '(?:\p{L}+\s+)returns(?:\s+\p{L}+)' file
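
If `-P` is not available, roughly the same idea can be sketched with `-E` and POSIX character classes. This is a substitute for `\p{L}` and `\s`, not the commands above: `[[:alpha:]]` depends on the current locale, and `-o` is an extension found in GNU and BSD grep rather than part of POSIX. The sample lines are made up for illustration.

```shell
# Hypothetical sample input: a word on both sides, only after, only before.
tmp=$(mktemp)
printf '%s\n' 'put returns between' 'returns void' 'void returns' > "$tmp"

# Optional word on the left and/or right.
optional=$(grep -Eo '([[:alpha:]]+[[:space:]]+)?returns([[:space:]]+[[:alpha:]]+)?' "$tmp")

# Required word on both sides.
required=$(grep -Eo '[[:alpha:]]+[[:space:]]+returns[[:space:]]+[[:alpha:]]+' "$tmp")

printf 'optional:\n%s\n' "$optional"
printf 'required:\n%s\n' "$required"

rm -f "$tmp"
```

The optional form reports all three lines, while the required form reports only the line that has a word on both sides.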
Ruslan Osmanov