Parsing a diff file using grep/awk

Question

I'm trying to parse a standard diff of some SQL files to return only the delete sections. I have been using grep with after-context (`-A`), which almost works (only because I know that the delete sections will all be very short). For example:

diff $$_$1.sql $$_$2.sql | egrep -A3 "[0123456789][0123456789]d[0123456789][0123456789]"

I think that with AWK I could tell it to start at (the above regex) and stop at the first line that starts with a digit or ends with `--`.

I have played around a bit but can't find the right syntax. Can this be done with AWK, or is there another tool I should use?

Preferably an example of the `diff` output (or at least tell us what KIND of diff it is -- edit script, context diff, unified diff, etc.) — voretaq7, Aug 11 '11 at 16:31
In addition to @voretaq7's questions, it'd also be worth knowing if you need the result to be a valid patch file afterwards. — womble, Aug 11 '11 at 18:08

quanta · Accepted Answer · 2011-08-12T13:55:05.383

I think that with AWK I could tell it to start at (the above regex) and stop at the first line that starts with a digit or ends with `--`.

Try this, and give us an example if it's not what you want:

sed -n '/[0-9][0-9]d[0-9][0-9]/,/^[0-9]\|--$/p'
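For illustration, here is the command run on a small, made-up snippet of normal (ed-style) diff output; the table names are hypothetical. It assumes GNU sed, because `\|` alternation inside a basic regex is a GNU extension.

```shell
# Hypothetical normal-diff output: one delete hunk followed by a change hunk.
sample=$(printf '%s\n' '12,13d11' '< DROP TABLE old_a;' '< DROP TABLE old_b;' \
  '20c18' '< SELECT 1;' '---' '> SELECT 2;')

# Print from the first line matching the delete pattern through the next line
# that starts with a digit or ends with "--" (GNU sed: \| means "or").
printf '%s\n' "$sample" | sed -n '/[0-9][0-9]d[0-9][0-9]/,/^[0-9]\|--$/p'
```

This prints `12,13d11`, the two `<` lines, and also `20c18`: sed includes the line that closes the range.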

EDIT

Although you've already accepted my answer, I still want to share a refinement that solves your problem more completely. sed lets you exclude matching lines with the `b` (branch) command:

sed -n '/[0-9][0-9]d[0-9][0-9]/,/^[0-9]\|--$/ { /^[0-9]/b; p }'

But with this regex, sed also drops the starting line (the one matching the first regex), because it starts with a digit too. So lookahead came to mind:

sed -n '/[0-9][0-9]d[0-9][0-9]/,/^[0-9]\|--$/ { /^[0-9](?:(?![0-9]d[0-9][0-9]).*)$/b; p }'

but it doesn't work, because sed, awk, and grep use POSIX regular expressions, which don't support negative lookahead. You could try Python, Perl, Ruby, etc.

This is virtually perfect; I just missed a requirement. I need it to be exclusive of the ending line but inclusive of the starting line. — Robert, Aug 11 '11 at 18:17

score 0 · Answer 2 · answered Aug 11 '11 at 18:11

I'd be inclined to do this with a unified diff and a simple grep:

diff -u a.sql b.sql | grep -v '^\+' | rediff

`rediff` (from patchutils) will try to fix up the hunk offsets and counts after you've mangled the diff. It won't work in all circumstances, but it's your best hope of keeping a valid diff.
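One thing to watch: the second header line of a unified diff (`+++ b.sql ...`) also starts with `+`, so removing every `+` line removes it too. Below is a minimal sketch that keeps the two header lines, assuming the diff covers a single pair of files so the headers are lines 1 and 2. The helper name `drop_additions` and the sample files are hypothetical.

```shell
# Hypothetical helper: drop added ("+") lines from a unified diff on stdin,
# but keep lines 1-2, assumed to be the "---"/"+++" headers of a single file pair.
# The @@ line counts are left as they were, so they still need fixing afterwards.
drop_additions() {
  awk 'NR <= 2 || !/^[+]/'
}

# Hypothetical sample files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'a' 'b' 'c' > "$tmp/a.sql"
printf '%s\n' 'a' 'B' 'c' > "$tmp/b.sql"

# diff exits with status 1 when the files differ, which is expected here.
diff -u "$tmp/a.sql" "$tmp/b.sql" > "$tmp/changes.diff" || true
drop_additions < "$tmp/changes.diff"
```

In this output the `-b` line survives and `+B` is gone, but the hunk header still reads `@@ -1,3 +1,3 @@`; that stale count is the part `rediff` is meant to repair.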

score 0 · Answer 3 · answered Aug 11 '11 at 18:48


diff ... | awk '/start-mark/ {flag = 1} /end-mark/ {flag = 0} flag'

Your regex could probably be simplified by using `[0-9]` instead of listing each digit.

The `flag = 0` could be changed to `exit` if you only want to print the first matching range of lines.
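Applied to the question's patterns, one adjustment seems needed: a delete header such as `12d11` also starts with a digit, so a generic `/^[0-9]/ {flag = 0}` rule would switch the flag off on the very line that switched it on. The sketch below prints the header itself and then moves on with `next`, which keeps the start line and drops the end line (the behaviour asked for in the comment on the accepted answer). The function name `diff_deletes` is hypothetical, and the start pattern is an assumption about what a normal-diff delete header looks like (`5d4`, `12,13d11`); it is looser than the question's two-digit regex.

```shell
# Hypothetical helper: print each delete hunk of a normal diff read on stdin.
# A hunk starts at a header like "5d4" or "12,13d11" (assumed format) and ends,
# exclusively, at the next line that starts with a digit or ends with "--".
diff_deletes() {
  awk '
    # Delete header: print it, turn printing on, and skip the end-mark test,
    # since the header itself starts with a digit.
    /^[0-9]+(,[0-9]+)?d[0-9]+$/ { flag = 1; print; next }
    # Any other header, or a line ending in "--", turns printing off.
    /^[0-9]/ || /--$/ { flag = 0 }
    # Print body lines while inside a delete hunk.
    flag
  '
}

# Usage (file names are placeholders):
# diff old.sql new.sql | diff_deletes
```

Because the start rule is tested first, a delete header that directly follows another delete hunk both ends the previous hunk and begins a new one, so back-to-back deletions are not lost.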

Dennis Williamson