How to Return Everything After 2nd Occurrence of String

Question

I was wondering what the best way to get everything after the second occurrence of a string would be. I have files like this:

---
title: Test Document
creation_date: 01-29-2016
---

Text, blah blah blah
More text, blah blah blah

So I have a file that contains 'frontmatter' between two --- lines. I would like to return everything after the second ---, preferably using a bash command of some sort. When thinking about this, sed and awk came to mind, but I don't really know which one would be better for the job.

An important part of this is that there can be an arbitrary number of key-value pairs in the frontmatter, so just cutting the first four lines is not a valid approach here.

It is better to explain with the exact input you have and the exact output you want. – Inian, Jan 29 '17 at 17:08
If you really plan on doing a lot of automated querying, a different file format is in order. This looks like it is designed for quick scanning by a human eye, not programmatic parsing. – chepner, Jan 29 '17 at 17:12
When trying to figure out if you should use sed or awk for any problem: sed is for simple substitutions on individual lines (**that is all**), awk is for everything else. The problem you describe is not a simple substitution on an individual line and therefore it's not a job for sed, it's a job for awk. If you try to use sed for anything else you will quickly find yourself in a hell of indecipherable runes, portability issues, inefficiency and just about every other undesirable attribute of software. – Ed Morton, Jan 29 '17 at 18:08

score 3 · Answer 1 · answered Jan 29 '17 at 17:12

Using awk you can do this:

awk 'p>1; /---/{++p}' file

Text, blah blah blah
More text, blah blah blah
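
The one-liner above matches --- anywhere in a line. A slightly stricter variant can be sketched as follows, assuming each delimiter is a line consisting of exactly ---. The function name strip_front_matter and the sample file contents are made up for illustration and are not part of the original answer.

```shell
# Hypothetical helper: print everything after the second line that is exactly '---'.
strip_front_matter() {
    # Rule 1 prints the record once two delimiters have been seen.
    # Rule 2 counts delimiters afterwards, so the closing '---' itself is not
    # printed, while any later '---' in the body passes through unchanged.
    awk 'n >= 2; /^---$/ { n++ }' "$1"
}

# Demo on a throwaway file (contents are illustrative only).
sample=$(mktemp)
printf '%s\n' '---' 'title: Test Document' 'creation_date: 01-29-2016' '---' \
    '' 'Text, blah blah blah' '---' 'More text, blah blah blah' > "$sample"
strip_front_matter "$sample"
rm -f "$sample"
```

Because the print rule comes before the counting rule, a --- used later in the body (for example as a Markdown horizontal rule) is kept rather than swallowed.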

anubhava

Casimir et Hippolyte · Answer 2 · 2017-01-30T10:15:55.690

2

With sed you can delete a range of lines between two patterns:

sed '/---/,/---/d' file

Other lines are displayed automatically.

If you also want to remove any lines above the first ---, you can use this one:

sed '1{:a;N;/---.*---/d;ba}' file

details:

1  # if the current line is the first one
{
    :a  # define a label "a"
    N   # append the next line to the pattern space
    /---.*---/d  # delete the pattern space when the pattern succeeds
    ba  # go to label "a"
}

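Note that the d command deletes the pattern space and ends the current cycle immediately, so the loop is not re-entered; sed then continues with the remaining lines, which are printed unchanged because none of them is line 1. The one-line form with ; after the label and before } relies on GNU sed.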
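
If the opening --- is always the very first line of the file, a shorter range-based command may be enough. This is a sketch under that assumption, not the method from the answer above. The function name strip_front_matter_sed and the sample contents are hypothetical.

```shell
# Hypothetical helper; ASSUMES the opening '---' is line 1 of the file.
strip_front_matter_sed() {
    # The range starts at line 1. sed only starts looking for the closing
    # /^---$/ on the line after the range began, so line 1 through the second
    # delimiter is deleted and everything else is printed.
    sed '1,/^---$/d' "$1"
}

# Demo on a throwaway file (contents are illustrative only).
sample=$(mktemp)
printf '%s\n' '---' 'title: Test Document' 'creation_date: 01-29-2016' '---' \
    '' 'Text, blah blah blah' 'More text, blah blah blah' > "$sample"
strip_front_matter_sed "$sample"
rm -f "$sample"
```

Be aware that on a file without frontmatter this deletes everything up to the first --- line, or the whole file if there is none, so it only suits inputs known to start with a frontmatter block.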

edited Jan 30 '17 at 10:15

answered Jan 29 '17 at 17:01

Casimir et Hippolyte

This will print the lines above the first `---`; unclear if that is an issue for the OP... – dawg Jan 29 '17 at 19:18
@dawg: I have added another version to do that. – Casimir et Hippolyte Jan 30 '17 at 10:08
sed -rn '1{ :X /---/{ H; g; /---\n---/d }; n; bX }; p' file – mug896 Jan 30 '17 at 10:56

dawg · Answer 3 · 2017-01-29T19:38:53.870

1

Here is a pure Bash solution:

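count=0   # initialize, otherwise the comparison below fails on lines before the first ---
while IFS= read -r line || [[ -n $line ]]; do
    if (( count < 2 )) && [[ $line =~ ^--- ]]; then
        (( ++count ))   # count only the two frontmatter delimiters
    elif (( count >= 2 )); then
        printf '%s\n' "$line"
    fi
done <file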

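You can also use awk in a sed-like manner to print everything outside the delimited block. A range pattern such as /^---/,/^---/ does not work for this in awk, because when one record matches both the start and the end pattern the range covers only that record. A flag that toggles on each delimiter does the job instead:

awk '/^---/ {inside = !inside; next} !inside' file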

edited Jan 29 '17 at 19:38

answered Jan 29 '17 at 19:17

dawg

