How can I return the entire contents of a split line based on a search?

Question

I posted previously about a small script that I'm working on. I eventually figured out that problem. Now I'm running into a different one. Hopefully you can help.

Some setup: I have a short list stored as a markdown file.

|One Hundred Years of Solitude|Gabriel García Márquez|-|-|-|-|1967|
|Moby-Dick|Herman Melville|-|-|-|-|1851|
|Frankenstein|Mary Shelley|-|-|-|-|1818|
|On the Road|Jack Kerouac|-|-|-|-|1957|
|The Turn of the Screw|Henry James|-|-|-|-|-|

I've figured out how to feed the file through cat, sed, xargs, and awk.

cat list.md | sed -e 's/^\|//' -e 's/\|$//' -e 's/^ *//' \
-e '/^\:/d' -e '/\'Title'/d' -e '/^\r/d' -e '/^$/d' | xargs -0 echo | \
awk -F '|' '{print "----"} {print "Title:", $1} {print "Author:", $2} \
{print "Date Begun:", $4} {print "Date Finished:", $5}'

That command returns this:

----
Title: One Hundred Years of Solitude
Author: Gabriel García Márquez
Date Begun: -
Date Finished: -
----
Title: Moby-Dick
Author: Herman Melville
Date Begun: -
Date Finished: -
----
Title: Frankenstein
Author: Mary Shelley
Date Begun: -
Date Finished: -
----
Title: On the Road
Author: Jack Kerouac
Date Begun: -
Date Finished: -
----
Title: The Turn of the Screw
Author: Henry James
Date Begun: -
Date Finished: -

What I'd like to do is incorporate this into a script that I can run with an argument, like 'books Melville'. The script would run the above commands, pipe the output into grep, search for the argument (preferably either a word or a string), and then return the entire line. For example, if I type 'books Melville', the script would return

----
Title: Moby-Dick
Author: Herman Melville
Date Begun: -
Date Finished: -

Currently, if I type 'books Melville', all that it returns is 'Author: Herman Melville'.

Sorry for the long post.

Edit with another apology: I forgot to mention that I'm on OSX.
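
One way to get whole records back is to search the raw table rows before formatting them, so that grep still sees one book per line. A minimal sketch, assuming the table lives in a file whose path is held in LIST (a temporary sample file here); LIST and the books function name are illustrative, and the column positions follow the sample rows above:

```shell
#!/bin/sh
# Sample data in a temporary file; in practice LIST would point at list.md.
LIST=$(mktemp)
trap 'rm -f "$LIST"' EXIT
cat > "$LIST" <<'EOF'
|Moby-Dick|Herman Melville|-|-|-|-|1851|
|Frankenstein|Mary Shelley|-|-|-|-|1818|
|On the Road|Jack Kerouac|-|-|-|-|1957|
EOF

books() {
    # grep -F treats the argument as a fixed string; -- guards against
    # search terms that begin with a dash.
    grep -F -- "$1" "$LIST" |
    awk -F'|' '{
        # The leading "|" makes $1 empty, so the title is $2.
        printf "----\nTitle: %s\nAuthor: %s\nDate Begun: %s\nDate Finished: %s\n", $2, $3, $5, $6
    }'
}

books Melville
```

Because the filter runs while each book is still a single line, every match is formatted as a complete five-line record.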

That long command seems **hugely** overcomplicated for that transformation, for starters. That said, why not filter **before** the transformation? That seems simpler to me and requires just grep. — Etan Reisner, Aug 14 '14 at 20:10
Also, the `-e '/\'Title'/d'` doesn't do what you probably meant. It isn't escaping that second `'` because you can't do that in the shell. It breaks down as three strings: single-quoted `/\ `, unquoted `Title`, and single-quoted `/d`. — Etan Reisner, Aug 14 '14 at 20:12
@EtanReisner I fully acknowledge that that command could be a lot simpler, but I wanted to start teaching myself regexes. As far as the `-e '/\'Title'/d'` goes, I just omitted the first two lines of the markdown table, which contain the strings corresponding to that regex. — a--clam, Aug 14 '14 at 20:47
I'm not suggesting it didn't do what you intended. I'm suggesting that the way the string was put together on the command line likely wasn't what you thought it was. It wasn't an escaped single quote inside a single-quoted string, because single-quoted strings don't honor escapes like that. (Also, the count would be off, since you would have three unescaped single quotes that way.) — Etan Reisner, Aug 14 '14 at 20:50
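
The quoting point in these comments can be checked directly by asking the shell to print the argument it actually builds. A small sketch; the simplified pattern in the second command is a suggestion, not something from the original thread:

```shell
#!/bin/sh
# The shell reads '/\'Title'/d' as three adjacent pieces:
#   '/\'   single-quoted, so the backslash is literal
#   Title  unquoted
#   '/d'   single-quoted
# and joins them into one word before sed ever sees it.
printf '%s\n' '/\'Title'/d'     # prints: /\Title/d

# If the goal is simply to delete lines containing "Title",
# no inner quotes are needed at all.
printf '|Title|Author|\n|Moby-Dick|Herman Melville|\n' | sed -e '/Title/d'
```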

score 2 · Answer 1 · answered Aug 14 '14 at 20:14

Here are two small awk scripts. The second requires GNU awk because it uses a multi-character RS; you can make it portable by separating records with a blank line instead of ---- and using awk's paragraph mode. The first script replaces the whole pipeline you have and creates a database file.

$ cat md.file
|One Hundred Years of Solitude|Gabriel García Márquez|-|-|-|-|1967|
|Moby-Dick|Herman Melville|-|-|-|-|1851|
|Frankenstein|Mary Shelley|-|-|-|-|1818|
|On the Road|Jack Kerouac|-|-|-|-|1957|
|The Turn of the Screw|Henry James|-|-|-|-|-|

$ awk -F"[|]" '{
    printf "----\nTitle: %s\nAuthor: %s\nDate Begun: %s\nDate Finished: %s\n", $2, $3, $5, $6
  }' md.file > database.file

Now the database.file looks like this:

----
Title: One Hundred Years of Solitude
Author: Gabriel García Márquez
Date Begun: -
Date Finished: -
----
Title: Moby-Dick
Author: Herman Melville
Date Begun: -
Date Finished: -
----
Title: Frankenstein
Author: Mary Shelley
Date Begun: -
Date Finished: -
----
Title: On the Road
Author: Jack Kerouac
Date Begun: -
Date Finished: -
----
Title: The Turn of the Screw
Author: Henry James
Date Begun: -
Date Finished: -

Once the file is ready, you can use the following awk script, either in a bash script or from the command line, whichever you prefer.

If you want to run it from a bash script, store the search term in a shell variable.

$ look=Melville
$ echo "$look"
Melville
$ awk -v RS="----" -vlook="$look" '$0~look' database.file

Title: Moby-Dick
Author: Herman Melville
Date Begun: -
Date Finished: -

If you want to bypass the shell variable, you can use a regex literal directly.

awk -v RS="----" '/Melville/' database.file

awk prints the record by default when the condition is true, so the commands above are exactly equivalent to

awk -v RS="----" '/Melville/ { print $0 }' database.file

or

awk -v RS="----" -vlook="$look" '$0~look { print $0 }' database.file
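
The portable variant mentioned at the top of this answer could look roughly like the following. It is a sketch, assuming records are separated by blank lines instead of ---- so that POSIX paragraph mode (an empty RS) can be used; the temporary files and the lookup function name are illustrative:

```shell
#!/bin/sh
# Sample data in temporary files; in practice these would be md.file
# and database.file.
LIST=$(mktemp) DB=$(mktemp)
trap 'rm -f "$LIST" "$DB"' EXIT
cat > "$LIST" <<'EOF'
|Moby-Dick|Herman Melville|-|-|-|-|1851|
|Frankenstein|Mary Shelley|-|-|-|-|1818|
|On the Road|Jack Kerouac|-|-|-|-|1957|
EOF

# Build the database with a blank line after each record.
awk -F'|' '{
    printf "Title: %s\nAuthor: %s\nDate Begun: %s\nDate Finished: %s\n\n", $2, $3, $5, $6
}' "$LIST" > "$DB"

lookup() {
    # An empty RS switches awk to paragraph mode: each blank-line-separated
    # block is one record, so a match prints the whole book entry.
    awk -v RS= -v look="$1" '$0 ~ look { print "----"; print }' "$DB"
}

lookup Melville
```

Since the separator is no longer part of the data, the ---- line is printed explicitly in front of each matching record.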

At my first job, we implemented (among other things) a relational database using standard UNIX™ tools, awk amongst them, and ended up with a result much like what you have here. — Andrew Beals, Aug 14 '14 at 20:16

score 1 · Answer 2 · answered Aug 14 '14 at 20:54

With bash:

seek=he
labels=(- Title Author - "Date Begun" "Date Finished")
while IFS='|' read -ra fields; do
    [[ "${fields[*]}" == *"$seek"* ]] || continue
    printf "%s\n" "----"
    for i in 1 2 4 5; do
        printf "%s: %s\n" "${labels[i]}" "${fields[i]}"
    done
done < list.md

----
Title: Frankenstein
Author: Mary Shelley
Date Begun: -
Date Finished: -
----
Title: On the Road
Author: Jack Kerouac
Date Begun: -
Date Finished: -
----
Title: The Turn of the Screw
Author: Henry James
Date Begun: -
Date Finished: -

konsolebox · Accepted Answer · 2014-08-14T21:28:37.077

0

Using Awk:

#!/usr/bin/awk -f
BEGIN {
    if (!(ARGC >= 2)) exit
    search = ARGV[1]
    ARGV[1] = "/complete/path/to/list.md"
    FS = "|"
    OFS = "\n"
}
$0 ~ search {
    # The leading "|" makes $1 empty, so the two date columns are $5 and $6.
    print "----", "Title: " $2, "Author: " $3, "Date Begun: " $5, "Date Finished: " $6
}

Replace "/complete/path/to/list.md" with the real path. Save the script as books in a directory on your $PATH, such as /usr/local/bin, set its permissions to 0755, then test it with books Melv.

If you're not running as root, it is easier to save it to a temporary file such as script.awk first, make your edits, then run:

sudo install -m 0755 script.awk /usr/local/bin/books

Multiple keywords

This version accepts multiple keywords; a row is printed only if it matches all of them:

#!/usr/bin/awk -f
BEGIN {
    if (!(ARGC >= 2)) exit
    for (i = 1; i < ARGC; ++i) {
        keywords[k++] = ARGV[i]
    }
    ARGV[1] = "/complete/path/to/list.md"
    ARGC = 2
    FS = "|"
    OFS = "\n"
}
$0 ~ keywords[0] {
    for (i = 1; i < k; ++i) {
        if (!($0 ~ keywords[i])) {
            next
        }
    }
    # The leading "|" makes $1 empty, so the two date columns are $5 and $6.
    print "----", "Title: " $2, "Author: " $3, "Date Begun: " $5, "Date Finished: " $6
}

edited Aug 14 '14 at 21:28

answered Aug 14 '14 at 20:16

konsolebox

Should I save the file as books.sh? – a--clam Aug 14 '14 at 20:44
@a--clam It's an awk script but you don't have to add an extension to it. If you really want to run it as `books` and not `books.awk`, place it on `/usr/local/bin` as `books`. Please check instructions. – konsolebox Aug 14 '14 at 20:45
Thanks for the edited response. Sorry I didn't check that before replying. I edited my original post to say this as well, but seeing as I'm on OSX, I don't seem to have gawk. What can I do about that? – a--clam Aug 14 '14 at 20:55
@a--clam Other awks just don't have `IGNORECASE`. Is it ok if keywords can't be case insensitive? You can just change the header to `/usr/bin/awk`. – konsolebox Aug 14 '14 at 20:57
It's fine if words aren't case-sensitive. That fixed the problem. Thanks again! – a--clam Aug 14 '14 at 21:00
@a--clam Welcome. I hope you try the new version as well. – konsolebox Aug 14 '14 at 21:30
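
On awks without IGNORECASE, as discussed in these comments, one portable way to approximate a case-insensitive search is to lowercase both the row and the keyword with tolower(). A minimal sketch, assuming the same table layout; the LIST path and the books function wrapper are illustrative, and tolower() may not fold non-ASCII letters in every locale:

```shell
#!/bin/sh
# Sample data in a temporary file; in practice LIST would point at list.md.
LIST=$(mktemp)
trap 'rm -f "$LIST"' EXIT
cat > "$LIST" <<'EOF'
|Moby-Dick|Herman Melville|-|-|-|-|1851|
|Frankenstein|Mary Shelley|-|-|-|-|1818|
|On the Road|Jack Kerouac|-|-|-|-|1957|
EOF

books() {
    awk -F'|' -v search="$1" '
        # Compare lowercase copies so "melv" also matches "Melville".
        tolower($0) ~ tolower(search) {
            printf "----\nTitle: %s\nAuthor: %s\nDate Begun: %s\nDate Finished: %s\n", $2, $3, $5, $6
        }' "$LIST"
}

books melv
```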
