a
b
s
start
text
more text
end
even more text
end

I want to print the content between start and the first end that follows it (start is always unique). I also want to print the line numbers between which the text was found; in this example, lines 4 and 7.

I was trying with grep and cat, but I couldn't do much.

I tried:

var=$(cat $path)
echo "$var" | grep -o -P '(?<=start).*(?=end)'

But it didn't print anything. Without the grep, it prints the whole file.

The output in this example should be:

The content is between lines 4 and 7.

start
text
more text
end
GeoCap

2 Answers

1

You can pass a shell variable to awk and print the text with a range pattern. Put your shell variable in awk's start variable and this should work. (Change $0 ~ start to $0 ~ "^"start"$" if you want an exact match for the start value on a line.)

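awk -v start="$your_shell_start_var" '
$0 ~ start,$0 ~ /^end$/{                  # range: from a line matching start to a line that is exactly "end"
  print
  if($0 ~ start){ startLine=FNR }         # remember where the range began
  if($0~/^end$/){
     print "The content is between lines " startLine " and " FNR
     exit                                 # stop reading after the first block
  }
}' Input_file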

Sample output on OP's samples:

start
text
more text
end
The content is between lines 4 and 7

Simple explanation: print the lines in the range from start to end. Inside the range, check whether the current line is the end string; if it is, exit, so the rest of Input_file is never read. This is enough because OP needs only the first set of lines.
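
If the summary line should come before the content, as in the expected output in the question, the block has to be buffered until end is seen. The following is a minimal sketch and not part of the original answer: the function name print_block_summary_first is made up for illustration, and it uses an exact string comparison ($0 == start) instead of a regex match, assuming both markers occupy whole lines with no surrounding whitespace.

```shell
# Hypothetical helper: print the line-number summary first, then the block.
# Assumes the start marker and "end" are whole lines without extra whitespace.
print_block_summary_first() {
  # $1 = start marker (exact line), $2 = input file
  awk -v start="$1" '
    !inBlock && $0 == start { inBlock = 1; startLine = FNR }  # first line of the block
    inBlock {
      buf = buf $0 "\n"                                       # hold lines until "end" is seen
      if ($0 == "end") {
        printf "The content is between lines %d and %d.\n\n%s", startLine, FNR, buf
        exit                                                  # stop at the first "end"
      }
    }
  ' "$2"
}
```

With the sample from the question, print_block_summary_first start Input_file should print the summary for lines 4 and 7, a blank line, and then the four lines from start to end.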

RavinderSingh13
  • I think it reads to the end of the file, doesn't stop at "end". It starts at the correct position. – GeoCap Dec 12 '20 at 15:25
  • @GeoCap, if your line is exactly `end` (no leading or trailing spaces and no other words on the line), then it will print the first set of matched lines. Let's say the start variable is `bla`; then it should print the lines from `bla` to the first occurrence of `end` and stop reading the file. Please let me know if you have any queries or if it's not working. – RavinderSingh13 Dec 12 '20 at 15:27
  • For me the example with the variable prints the file from the variable's line through to the end of the file (it doesn't stop at "end"). The example without the variable does stop at "end" with the same file. – GeoCap Dec 12 '20 at 15:36
  • @GeoCap, for your given samples it worked fine for me. Not sure if you have whitespace etc. in the line containing the string end; try changing `$0 ~ /^end$/` to `$0 ~ /end/` in the 2 places in my OR code and let me know. – RavinderSingh13 Dec 12 '20 at 15:38
  • Oh yes I had empty lines, sorry, now it works, thanks! – GeoCap Dec 12 '20 at 15:42
  • Yes I appreciate your help as half of my problem is solved, I would still want to print the numbers of lines between which the content was extracted. – GeoCap Dec 12 '20 at 15:52
  • @GeoCap, You're welcome. You could check my OR solution; it will print line numbers too now. Let me know if this helps you. – RavinderSingh13 Dec 12 '20 at 15:55
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/225870/discussion-between-geocap-and-ravindersingh13). – GeoCap Dec 12 '20 at 16:05
  • I was only wondering how I could print the numbers of lines before the content. It's my fault because I gave the wrong output example in the question. – GeoCap Dec 12 '20 at 16:13
  • @GeoCap, IMHO, I would request that you not add any more questions now, or it will be confusing for users who refer to this thread in the future. You could take this code as a reference and play around with it; if you have any issues, please open a new thread/question. Cheers and happy learning. – RavinderSingh13 Dec 12 '20 at 16:15
0

Sample data:

$ cat -n strings.dat
 1  a
 2  b
 3  s
 4  start
 5  text
 6  more text
 7  end of more text
 8  end
 9  even more text
10  end

One awk solution using a range (similar to RavinderSingh13's post) that prints out OP's textual message at the end:

startstring="start"                            # define start of search block

awk -v ss="${startstring}" '                   # pass start of search block in as awk variable "ss"

# search for a range of lines between "ss" and "end":

$0==ss,/^end$/ { if ($0==ss && x==0 ) x=FNR    # if this is the first line of the range make note of the line number
                 print                         # print the current line of the range
                 if ($0=="end")                # if this is the last line of the range then print our textual message re: start/finish line numbers
                    printf "\nThe content is between lines %d and %d.\n",x,FNR
               }
' strings.dat

NOTE: the $0==ss and /^end$/ tests assume no leading/trailing white space in the data file; otherwise these tests will fail and there will be no range match.
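
If the data may contain stray blanks or Windows line endings, one option is to compare against a trimmed copy of each line. This is only a sketch under that assumption: print_block_trimmed is a hypothetical name, and it swaps the range pattern for a simple flag so the trimmed copy can be tested in both places.

```shell
# Hypothetical variant: compare a trimmed copy of each line, so leading/trailing
# blanks or a trailing carriage return do not break the match.
print_block_trimmed() {
  # $1 = start marker, $2 = input file
  awk -v ss="$1" '
    {
      line = $0
      gsub(/^[ \t\r]+|[ \t\r]+$/, "", line)   # strip surrounding spaces, tabs and CRs
    }
    !found && line == ss { found = 1; first = FNR }
    found {
      print                                   # print the original, untrimmed line
      if (line == "end") {
        printf "\nThe content is between lines %d and %d.\n", first, FNR
        exit
      }
    }
  ' "$2"
}
```

It would be called as print_block_trimmed "${startstring}" strings.dat. The printed lines keep their original whitespace; only the comparison uses the trimmed copy.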

With startstring="start" this generates:

start
text
more text
end of more text
end

The content is between lines 4 and 8.

With startstring="more text" this generates:

more text
end of more text
end

The content is between lines 6 and 8.

With startstring="even more text" this generates:

even more text
end

The content is between lines 9 and 10.

With startstring="water" this generates:

--no output--

NOTE: If OP uses startstring="end" the results are not as expected; while it would be possible to add more code to address this scenario I'm going to skip this scenario for the time being.
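
For what it's worth, here is one possible sketch of that extra code. It assumes the desired result for startstring="end" is the block from the first end line up to the next end line, which the question does not actually specify. The function name print_block_start_may_be_end is made up, and a flag replaces the range pattern so that the start line itself can never close the block.

```shell
# Hypothetical variant: the start line is never treated as the terminator,
# so a start string of "end" still yields a block that runs to the NEXT "end".
print_block_start_may_be_end() {
  # $1 = start marker (exact line), $2 = input file
  awk -v ss="$1" '
    found && $0 == "end" {            # only lines after the start line can close the block
      print
      printf "\nThe content is between lines %d and %d.\n", first, FNR
      exit
    }
    found { print }                   # lines inside the block
    !found && $0 == ss { found = 1; first = FNR; print }   # the start line itself
  ' "$2"
}
```

Under that assumption, print_block_start_may_be_end end strings.dat should report lines 8 and 10 for the sample data above, and the other start strings should behave as in the range-based version.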

markp-fuso
  • If you are going to use an exit, you may as well print the summary just before issuing the exit, stopping the need for an END block. – Raman Sailopal Dec 12 '20 at 15:00
  • sure, that was actually a good idea for the shortened version; thanks – markp-fuso Dec 12 '20 at 15:14
  • It didn't work for me. I should add that there could be whitespaces in the file and file is .cpp if that matters – GeoCap Dec 12 '20 at 15:47
  • you'll have to elaborate on `didn't work for me`; file name/extension doesn't matter, what matters is the content of the file for which we need an accurate description of said file contents (ie, update the question with more details) – markp-fuso Dec 12 '20 at 15:50