
I was wondering what the best way to get everything after the second occurrence of a string would be. I have files like this:

---
title: Test Document
creation_date: 01-29-2016
---

Text, blah blah blah
More text, blah blah blah

So I have a file that contains 'frontmatter' between two --- lines. I would like to return everything after the second ---, preferably using a bash command of some sort. sed and awk came to mind, but I don't really know which one would be better for the job.

An important part of this is that there can be an arbitrary number of key-value pairs in the frontmatter, so just cutting the first four lines is not a valid approach here.

tedm1106
  • It is better to explain with the exact input you have and the exact output you want. – Inian Jan 29 '17 at 17:08
  • If you really plan on doing a lot of automated querying, a different file format is in order. This looks like it is designed for quick scanning by a human eye, not programmatic parsing. – chepner Jan 29 '17 at 17:12
  • When trying to figure out if you should use sed or awk for any problem: sed is for simple substitutions on individual lines (**that is all**), awk is for everything else. The problem you describe is not a simple substitution on an individual line and therefore it's not a job for sed, it's a job for awk. If you try to use sed for anything else you will quickly find yourself in a hell of indecipherable runes, portability issues, inefficiency and just about every other undesirable attribute of software. – Ed Morton Jan 29 '17 at 18:08

3 Answers

Using awk you can do this:

awk 'p>1; /---/{++p}' file

Text, blah blah blah
More text, blah blah blah
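
A slightly stricter variant can be sketched as follows, assuming each delimiter is a line containing exactly ---. The function name strip_frontmatter and the temporary sample file are illustrative only and not part of the original answer. Unlike the one-liner above, it stops looking for delimiters once two have been seen.

```shell
# Hypothetical helper: print everything after the second line that is exactly '---'.
strip_frontmatter() {
    # n counts delimiter lines. Once two have been seen, every later line is
    # printed as-is, including any '---' that appears in the body.
    awk 'n >= 2 { print; next } /^---$/ { n++ }' "$1"
}

# Build the sample file from the question in a temporary location.
sample=$(mktemp)
printf '%s\n' '---' 'title: Test Document' 'creation_date: 01-29-2016' '---' '' \
    'Text, blah blah blah' 'More text, blah blah blah' > "$sample"

result=$(strip_frontmatter "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

The blank line that follows the closing --- in the sample is part of the output, since only the frontmatter block itself is skipped.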
anubhava

With sed you can delete a range of lines between two patterns:

sed '/---/,/---/d' file

Other lines are displayed automatically.
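
As a small illustration of that behavior, the sketch below runs an anchored form of the same range on a made-up file that has one line before the frontmatter. The file contents and variable names are assumptions for the demo. The leading line survives because it lies outside the range.

```shell
# Made-up sample: one line before the frontmatter, one line after it.
sample=$(mktemp)
printf '%s\n' 'leading line' '---' 'title: Test Document' '---' \
    'Text, blah blah blah' > "$sample"

# Delete from the first '---' line through the next '---' line, inclusive.
# sed looks for the end of a range starting on the line after the one that opened it.
kept=$(sed '/^---$/,/^---$/d' "$sample")

printf '%s\n' "$kept"
rm -f "$sample"
```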

More about sed features.

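If you also want to remove any lines that come before the first ---, you can use this one (written for GNU sed; other sed implementations may need the commands on separate lines):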

sed '1{:a;N;/---.*---/d;ba}' file

details:

1  # if the current line is the first one
{
    :a  # define a label "a"
    N   # append the next line to the pattern space
    /---.*---/d  # delete the pattern space when the pattern succeeds
    ba  # go to label "a"
}

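Note that the d command deletes the pattern space and immediately starts the next cycle, skipping the rest of the script. Since the address 1 no longer matches, sed then prints the remaining lines unchanged.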

Casimir et Hippolyte

Here is a pure Bash solution:

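count=0   # number of --- delimiter lines seen so far
while IFS= read -r line || [[ -n $line ]]; do
    if (( count < 2 )) && [[ $line =~ ^--- ]]; then
        (( ++count ))
    elif (( count >= 2 )); then
        printf '%s\n' "$line"   # printf is safe for lines such as -n or -e
    fi
done <file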

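You can also use awk in a sed-like manner to skip everything inside the delimited block. A range pattern such as /^---/,/^---/ does not work for this: in awk, a range whose start and end patterns match the same line ends on that line, so only the --- lines themselves would be skipped. A flag toggled on each delimiter does the job:

awk '/^---/ {f = !f; next} !f' file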
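
To see how an awk range pattern and a flag toggle behave on the sample file from the question, here is a small comparison sketch. The temporary file and the variable names range_out and toggle_out are illustrative assumptions.

```shell
# Build the sample file from the question in a temporary location.
sample=$(mktemp)
printf '%s\n' '---' 'title: Test Document' 'creation_date: 01-29-2016' '---' '' \
    'Text, blah blah blah' 'More text, blah blah blah' > "$sample"

# Range pattern: each '---' line matches both the start and the end pattern,
# so every range is a single line and the key-value lines are still printed.
range_out=$(awk '/^---/,/^---/ {next} 1' "$sample")

# Flag toggle: f flips on every delimiter line; lines print only while f is 0.
toggle_out=$(awk '/^---/ {f = !f; next} !f' "$sample")

printf 'range:\n%s\n\ntoggle:\n%s\n' "$range_out" "$toggle_out"
rm -f "$sample"
```

Only the toggle output is free of the frontmatter keys. sed behaves differently because it starts looking for the end of a range on the line after the one that opened it.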
dawg