Pipelining to get certain data

Question

I have a command that gives the following output:

#sec one
a : same
b : red
c : one
d :
e :
f :

#sec two
a : same
b : blue
c : two
d :
e :

#sec three
a : different
b : green
c : three
d :
e :

#sec four
a : different
b : yellow
c : four

#sec five
a : different
b : pink
c : five

There are a lot of such sections. I need only the sections that have a : same and the value of b and c fields for those sections.

Sample output:

#sec one
a : same
b : red
c : one


#sec two
a : same
b : blue
c : two

This is what I've done so far, using tr -s to squeeze repeated spaces so the fields are evenly spaced:

mycommand | tr -s " " | cut -d ':' -f 2

Does anyone know another way of doing this, or a way to use conditionals with cut?

I would suggest doing this sort of structured parsing in a language other than Bash. It's probably doable, but it's going to be a pain. Try Python, if you've never used it before it'll be a fun exercise. — dimo414, Jun 26 '17 at 23:14
Yes, it's absolutely trivial with awk. Do you want the b and c sections printed because they start with the letters b and c or because they are non-empty on the right of the `:`? — Ed Morton, Jun 27 '17 at 05:03

Paulo Mattos · Answer 1 · 2017-06-27T00:13:31.357

Maybe awk can help you here ;) Try this:

mycommand | tr -d " " | awk -F: '/a:/ {a=$2;} /(b:|c:)/ {if (a == "same") print $2;}'

output:

red
one
blue
two

If you need the field names as well, just replace $2 with $0 in the last print:

mycommand | tr -d " " | awk -F: '/a:/ {a=$2;} /(b:|c:)/ {if (a == "same") print $0;}'

output:

b:red
c:one
b:blue
c:two

By the way, tested on macOS 10.12.4 running awk version 20070501.
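The patterns above are unanchored, so /a:/ would also fire on any line that merely contains "a:" somewhere. A minimal sketch of a stricter variant follows, assuming the input looks exactly like the sample in the question. It compares the field name exactly and keeps the original spacing by splitting on a regex instead of deleting spaces with tr. The function name same_fields is hypothetical.

```shell
# same_fields: hypothetical helper; reads the files named as arguments, or stdin.
same_fields() {
  awk -F' *: *' '
    # A line starting with # opens a new section, so forget the old a value.
    /^#/      { a = "" }
    # Remember the value of a for the current section.
    $1 == "a" { a = $2 }
    # Pattern with no action: print b and c lines while a is "same".
    a == "same" && ($1 == "b" || $1 == "c")
  ' "$@"
}
```

Usage would be `mycommand | same_fields`, which prints the b and c lines with their names for every section whose a value is same.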

score 0 · Answer 2 · answered Jun 26 '17 at 23:59

awk to the rescue!

$ awk -v RS= -F'\n' '/a : same/{print $1;
                                for(i=2;i<=NF;i++) if($i~/^(a|b|c)/) print $i;
                                print ""}' file

#sec one
a : same
b : red
c : one

#sec two
a : same
b : blue
c : two

score 0 · Answer 3 · answered Jun 27 '17 at 05:22

I find that when your input contains name->value pairs, it's best to first create an array that represents that mapping; you can then get at field values by just using their names, e.g.:

$ cat tst.awk
BEGIN { RS=""; ORS="\n\n"; FS=OFS="\n" }
{
    delete n2v
    for (i=2;i<=NF;i++) {
        name = value = $i
        sub(/[[:space:]]*:.*/,"",name)
        sub(/^[^:]+:[[:space:]]*/,"",value)
        n2v[name] = value
    }
}
n2v["a"] == "same" { print $1, p("a"), p("b"), p("c") }
function p(n) { return (n " : " n2v[n]) }

$ awk -f tst.awk file
#sec one
a : same
b : red
c : one

#sec two
a : same
b : blue
c : two

That way you can trivially and robustly modify your script to print whatever fields you want for whatever reasons you want in whatever order you want by just tweaking the last 2 lines of the script.
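As an illustration of that kind of tweak, here is a sketch that takes the field to test, the value to match, and the list of fields to print as parameters. The wrapper name select_sections and the awk variables key, want, and fields are made up for this example, and [ \t] stands in for [[:space:]] on the assumption that some older awks lack POSIX character classes.

```shell
# select_sections KEY VALUE FIELDS [FILE...]: hypothetical wrapper.
# FIELDS is a comma-separated list of names, printed in the order given.
select_sections() {
  key=$1 want=$2 fields=$3
  shift 3
  awk -v key="$key" -v want="$want" -v fields="$fields" '
    BEGIN { RS = ""; FS = "\n"; nf = split(fields, out, ",") }
    {
      split("", n2v)                      # clear the name->value map
      for (i = 2; i <= NF; i++) {
        name = value = $i
        sub(/[ \t]*:.*/, "", name)        # keep the text before the colon
        sub(/^[^:]+:[ \t]*/, "", value)   # keep the text after the colon
        n2v[name] = value
      }
      if (n2v[key] == want) {
        print $1                          # the section header line
        for (j = 1; j <= nf; j++) print out[j] " : " n2v[out[j]]
        print ""                          # blank line between sections
      }
    }
  ' "$@"
}
```

For example, `select_sections a same c,b file` would print each matching section's header followed by its c line and then its b line.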

agc · Answer 4 · 2017-06-27T13:20:05.017

Two one-liners:

GNU grep method:

grep --group-separator= -B1 -A2 '^a : same$' input_file

Output:

#sec one
a : same
b : red
c : one

#sec two
a : same
b : blue
c : two

A little buffer juggling with sed:
```
sed -n '/^a : same$/{x;p;x;p;n;p;n;p;z;p};h' input_file
```
Output:
```
#sec one
a : same
b : red
c : one

#sec two
a : same
b : blue
c : two
```
How it works:
- /^a : same$/ finds the section to print, but it's never the first line of the input (there's always a preceding comment line), so the first command that actually runs is h, which overwrites whatever is in the "hold" buffer with the current line.
- On every later cycle, the hold buffer therefore contains the previous line, and the pattern buffer contains the current line.
- When /^a : same$/ matches, the code in curly braces runs. x exchanges the pattern and hold buffers, p prints what was in the hold buffer (i.e. the comment line), x exchanges them back, and p prints the pattern buffer (i.e. the matched line). Then n;p runs twice, each time fetching the next line and printing it. Finally z zaps (empties) the pattern buffer and p prints it, which produces a blank line. Note that z is a GNU sed extension.
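The same script can be spread over several lines with one comment per command. This is only a sketch: the function name print_same_sections is hypothetical, and s/.*// is swapped in for z as an assumption that it behaves the same on this input while also working in sed implementations without the GNU extension.

```shell
# print_same_sections: hypothetical helper; reads the named files, or stdin.
print_same_sections() {
  sed -n '
    /^a : same$/ {
      # Swap buffers: the pattern buffer now holds the previous line.
      x
      # Print it (the "#sec" comment line).
      p
      # Swap back and print the matched "a : same" line.
      x
      p
      # Fetch and print the next line (the b field).
      n
      p
      # Fetch and print the line after that (the c field).
      n
      p
      # Empty the pattern buffer (stand-in for GNU z) and print a blank line.
      s/.*//
      p
    }
    # Always remember the current line for the next cycle.
    h
  ' "$@"
}
```

Running `print_same_sections input_file` should give the same sections as the one-liner, each followed by a blank line.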
