
I am trying to match a number at the end of a line ($), print the matching paragraphs, and ignore the third paragraph. Here is the data:

this is first paragraph
number 200
with some text

this is second paragraph
with some text
number 200

this is third paragraph
with some text
number 2001

This command matches only the first paragraph: awk -v RS="" -v ORS="\n\n" "/number 200\n/" file

This command matches only the second paragraph: awk -v RS="" -v ORS="\n\n" "/number 200$/" file

Seems the problem is that awk understands character "$" as end of record instead of line. Is there an elegant way to overcome this? Unfortunately I do not have grep that can work with paragraphs.

UPDATE:

Expected output:

this is first paragraph
number 200
with some text

this is second paragraph
with some text
number 200
  • Please add your desired output for that sample input to your question (no comment). – Cyrus Dec 09 '22 at 02:00
  • Yes, `$` is end of record (and `^` is beginning of record). I don't know if you count it as elegant, but you can match `/number 200(\n|$)/` – dave_thompson_085 Dec 09 '22 at 02:13
  • @Cyrus I've added the sample. – user3602441 Dec 09 '22 at 14:51
  • @dave_thompson_085 seems my brain did not work properly during the night, because I hadn't realized that it is actually a normal regex. Now I have read that AWK does support extended regex. Thank you for pointing me in the right direction! – user3602441 Dec 09 '22 at 14:54

2 Answers


You should include the desired output in your question.

However, if I understand correctly, you want records 1 and 2 but not 3:

awk -v RS='' -v ORS='\n\n' '{s=$0; gsub("\n", " ", s); if (s ~ /number 200( |$)/) print}' file
Diego Torres Milano
  • Yes, 1 & 2 but not 3. Can you please explain how the blank space matches "end of record"? Is a newline character also considered white space? – user3602441 Dec 09 '22 at 15:08
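
On the question in the comment: the regexp in that command is never tested against the record itself, only against the copy `s`, in which `gsub` has already replaced every newline with a space. So the space in `( |$)` covers a "number 200" line in the middle of a paragraph, and `$` covers one on the last line. A minimal sketch that prints the flattened copies, assuming a POSIX shell with `mktemp` and any POSIX awk (the temporary file and the variable name `flattened` are made up for illustration):

```shell
# Recreate the sample input from the question in a temporary file.
file=$(mktemp)
printf '%s\n' \
  'this is first paragraph' 'number 200' 'with some text' '' \
  'this is second paragraph' 'with some text' 'number 200' '' \
  'this is third paragraph' 'with some text' 'number 2001' > "$file"

# Print the flattened copy of each record, i.e. the string the
# regexp /number 200( |$)/ is actually matched against.
flattened=$(awk -v RS= '{ s = $0; gsub("\n", " ", s); print s }' "$file")
printf '%s\n' "$flattened"

rm -f "$file"
```

In the first line of output "number 200" is followed by a space, in the second it is at the end of the string, and in the third it is followed by "1", so only the first two records match.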

Using any awk:

$ awk -v RS= -v ORS='\n\n' '/(^|\n)number 200(\n|$)/' file
this is first paragraph
number 200
with some text

this is second paragraph
with some text
number 200

Regarding 'Seems the problem is that awk understands character "$" as end of record instead of line' - that's not a problem, that's the definition of $. In a regexp, $ means end of string; it only appears to mean end of line when the string you're matching against happens to be a single line, e.g. as read by grep, sed, and awk by default. When you're matching against a string containing multiple lines (e.g. using -z in GNU grep or GNU sed, RS="" in awk, or RS='^$' in GNU awk), you should expect $ to match just once, at the end of that string (and ^ just once, at the start of it). There's nothing special about newlines versus any other character in the string, and no regexp metachar to match them.
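
To see the difference on the sample input, here is a minimal sketch that prints the record numbers matched by each regexp. It assumes a POSIX shell with `mktemp` and any POSIX awk; the temporary file and the variable names `end_of_record` and `end_of_line` are made up for illustration.

```shell
# Recreate the sample input from the question in a temporary file.
file=$(mktemp)
printf '%s\n' \
  'this is first paragraph' 'number 200' 'with some text' '' \
  'this is second paragraph' 'with some text' 'number 200' '' \
  'this is third paragraph' 'with some text' 'number 2001' > "$file"

# "$" anchors at the end of the whole record, so a "number 200" line
# in the middle of a paragraph is not matched.
end_of_record=$(awk -v RS= '
  /number 200$/ { printf "%s%s", sep, NR; sep = " " }
  END { print "" }' "$file")

# Accepting either a newline or the end of the record after the number
# also matches a "number 200" line that is followed by more lines.
end_of_line=$(awk -v RS= '
  /number 200(\n|$)/ { printf "%s%s", sep, NR; sep = " " }
  END { print "" }' "$file")

echo "end of record: $end_of_record"
echo "end of line:   $end_of_line"

rm -f "$file"
```

The first regexp reports only record 2, while the second reports records 1 and 2; record 3 is skipped by both because "number 200" is followed there by "1".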

Regarding 'Unfortunately I do not have grep that can work with paragraphs' - no-one does as, unlike awk, grep doesn't have a paragraph mode.

Ed Morton