
I have two CSV files in two different directories, and I am running a diff on them like this:

diff -b -r -w <dir-one>/AFB.csv <dir-two>/AFB.csv

I am getting the output as expected:

14c14
< image_collapse,,collapse,,,,,batchcriteria^M
---
> image_collapse1,,collapse1,,,,,batchcriteria^M
16a17
> image_refresh,,refresh,,,,,batchcriteria^M

My requirement is that lines which have changed should go to changed.log, and lines which have been appended should go to append.log.

The output clearly shows that the "c" in 14c14 means a line has changed, and the "a" in 16a17 means a line has been appended. But how do I log them in different log files?

6055
  • This is not as trivial as it looks: You first need to find a reliable way to separate the hunks, then you need to parse each hunk to decide in which file to put it. If I were you, I'd use a real programming language like Python to do it, since it will be a nightmare doing it in bash only. – Michael Schlottke-Lakemper Feb 26 '14 at 05:59
  • @MichaelSchlottke: I can't use any programming language; I have to do it in bash somehow. :( – 6055 Feb 26 '14 at 06:05

2 Answers


Edit: Same as original answer below but avoiding options not supported by diff on HP-UX. Use something like:

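# match only hunk headers such as 14c14, not data lines that happen to contain "c"
diff -b -r -w /tmp/one.txt /tmp/two.txt \
| sed -n -e '/^[0-9,]*c[0-9,]*$/ {s/^[^c]*c\(.*\)/\1 p/;p;}' \
| sed -n -f - /tmp/two.txt > /tmp/changed.txt

# same for appended hunks such as 16a17
diff -b -r -w /tmp/one.txt /tmp/two.txt \
| sed -n -e '/^[0-9,]*a[0-9,]*$/ {s/^[^a]*a\(.*\)/\1 p/;p;}' \
| sed -n -f - /tmp/two.txt > /tmp/new.txt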

This converts the line numbers that diff prints for added (a) and changed (c) hunks into sed print (p) commands. For example, 14c14 becomes "14 p" and 16a17 becomes "17 p". The resulting sed scripts are applied to the second file to print just the desired lines. (I hope HP-UX sed supports -f - for reading the script from standard input.)
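To make the intermediate step visible, here is a minimal sketch on two tiny demo files. The file contents, the directory name, and the `hunk_script` helper are made up for illustration. It passes the generated script with `-e` instead of `-f -`, which is an assumption about what a sed without stdin script support would accept, and it has not been tried on HP-UX.

```shell
# Hypothetical demo files; names and contents are illustrative only.
demo=${TMPDIR:-/tmp}/hunkdemo.$$
mkdir -p "$demo"
printf 'a\nb\nc\n'    > "$demo/one.txt"
printf 'a\nB\nc\nd\n' > "$demo/two.txt"

# hunk_script TYPE OLD NEW
# Turn the hunk headers of one type (c or a) into sed print commands.
# Only header lines (digits and commas around the type letter) are matched.
hunk_script() {
    diff "$2" "$3" \
    | sed -n -e "/^[0-9,]*$1[0-9,]*\$/{s/^[^$1]*$1//;s/\$/p/;p;}"
}

change_script=$(hunk_script c "$demo/one.txt" "$demo/two.txt")
append_script=$(hunk_script a "$demo/one.txt" "$demo/two.txt")

# Apply each generated script to the second file.
changed=$(sed -n -e "$change_script" "$demo/two.txt")
appended=$(sed -n -e "$append_script" "$demo/two.txt")

printf 'changed  (%s): %s\n' "$change_script" "$changed"
printf 'appended (%s): %s\n' "$append_script" "$appended"

rm -rf "$demo"
```

Here the diff reports 2c2 and 3a4, so the generated scripts are `2p` and `4p`, and they select `B` and `d` from the second file.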


There seems to be a solution which does not require interpreting line numbers from the output of diff. diff supports --side-by-side formatting (-y), which includes a gutter marking old, new, and changed lines with <, >, and | respectively. You can reduce this side-by-side format to just the markers by using --width=1 (or -W1). If you drop the markers for lines that exist only in the old file (fgrep -v '<'), the remaining markers line up one-to-one with the lines of the second file. Prefix each line of the second file with its marker (paste), filter by marker (grep), and throw the marker away (cut). You can do this for both new and changed lines.

Here is a self-contained "script" as an example:

# create two example files (one character per line)
echo abcdefghijklmnopqrstuvwxyz | grep -o . > /tmp/one.txt
echo abcDeFghiJKlmnopPqrsStuvVVwxyzZZZ | grep -o . > /tmp/two.txt

# diff side-by-side to get markers and apply to new file

diff -b -r -w -y -W1  /tmp/one.txt /tmp/two.txt \
| fgrep -v '<' | paste - /tmp/two.txt \
| grep -e '^|' | cut -c3- > /tmp/changed.txt

diff -b -r -w -y -W1  /tmp/one.txt /tmp/two.txt \
| fgrep -v '<' | paste - /tmp/two.txt \
| grep -e '^>' | cut -c3- > /tmp/new.txt

# dump result
cat /tmp/changed.txt
echo ---
cat /tmp/new.txt

Its output is

D
F
J
K
---
P
S
V
V
Z
Z
Z

I hope this helps you solve your problem.

halfbit
  • Thanks a lot for the help, but I have to run the script on Unix (HP-UX), where -y and -W are not recognized. Kindly suggest something that works there. – 6055 Feb 27 '14 at 09:34
  • Is using [GNU diffutils for HP-UX](http://hpux.connect.org.uk/hppd/hpux/Gnu/) an option for you? – halfbit Feb 27 '14 at 21:30
  • No, I can't use GNU diffutils. – 6055 Mar 04 '14 at 09:12

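This can be done with grep, as follows. Be aware that it splits by side rather than by hunk type: lines marked > come from the second file (both appended lines and the new side of changed lines), and lines marked < come from the first file.

diff -b -r -w <dir-one>/AFB.csv <dir-two>/AFB.csv | grep '^>' >> append.log
diff -b -r -w <dir-one>/AFB.csv <dir-two>/AFB.csv | grep '^<' >> changed.log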
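Since diff marks lines with `>` in both changed and appended hunks, routing by the hunk header may be closer to what the question asks for. The following is a minimal sketch of that idea using awk instead of grep, assuming a POSIX awk is available on the target system. The `split_diff` function, the demo directory, and the file contents are hypothetical, and the `-r` option is left out because two plain files are compared.

```shell
# split_diff OLD NEW CHANGED_LOG APPEND_LOG
# Remember the type of the current hunk from its header (e.g. 14c14, 16a17)
# and append each ">" line, without its "> " prefix, to the matching log.
split_diff() {
    diff -b -w "$1" "$2" | awk -v changed="$3" -v appended="$4" '
        /^[0-9,]+c[0-9,]+$/ { out = changed;  next }   # changed hunk
        /^[0-9,]+a[0-9,]+$/ { out = appended; next }   # appended hunk
        /^[0-9,]+d[0-9,]+$/ { out = "";       next }   # deleted hunk: skip
        /^>/ && out != ""   { print substr($0, 3) >> out }
    '
}

# Hypothetical demo files; names and contents are illustrative only.
demo=${TMPDIR:-/tmp}/splitdemo.$$
mkdir -p "$demo"
printf 'a\nb\nc\n'    > "$demo/one.txt"
printf 'a\nB\nc\nd\n' > "$demo/two.txt"

split_diff "$demo/one.txt" "$demo/two.txt" "$demo/changed.log" "$demo/append.log"

changed_lines=$(cat "$demo/changed.log")
appended_lines=$(cat "$demo/append.log")
printf 'changed.log: %s\nappend.log: %s\n' "$changed_lines" "$appended_lines"

rm -rf "$demo"
```

For the demo files, `B` ends up in changed.log and `d` in append.log. Only the new side of a changed hunk is logged; printing the `<` lines as well would be a one-line addition to the awk program.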
sugunan