Get a difference file by specific patterns in two text files

Question

I have 2 text files and I need to export the "changes" to a new file. That is, each row of the second file is compared with the rows of the first file, and if the row isn't found there, it is appended to the new (third) file.

Contents of the first file are:

ABC 123 q1w2sd
DEF 321 sdajkn
GHI 123 jsdnaj
JKL 456 jsd223

The second file contains:

ABC 123 XXXXXX
JKL 456 jsd223
DEF XXX sdajkn
GHI 123 jsdnaj

Notice that the lines starting with ABC and DEF have changed. JKL has just changed its place.

The output file should contain:

ABC 123 XXXXXX
DEF XXX sdajkn

How can I do this using 'awk' or 'sed'?

Edit: Lines that are new in the second file should also be counted as changes.

Dimitre Radoulov · Accepted Answer · score 4 · 2013-06-25T16:30:59.750

awk 'NR == FNR { f1[$0]; next } !($0 in f1)' file1 file2

With grep: grep -Fvxf file1 file2
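A minimal sketch, assuming a POSIX shell with mktemp and the illustrative file names file1/file2, that recreates the question's sample data and runs both one-liners. While NR == FNR (true only for the first file) awk stores each whole line as a key of f1 and skips to the next record; for file2 it prints every line that is not such a key. For grep, -F treats the patterns as fixed strings, -x requires them to match the whole line, -v inverts the match, and -f file1 reads one pattern per line from file1.

```shell
# Work in a throwaway directory (illustrative setup, removed on exit).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Sample data from the question.
printf '%s\n' 'ABC 123 q1w2sd' 'DEF 321 sdajkn' 'GHI 123 jsdnaj' 'JKL 456 jsd223' > file1
printf '%s\n' 'ABC 123 XXXXXX' 'JKL 456 jsd223' 'DEF XXX sdajkn' 'GHI 123 jsdnaj' > file2

# awk: remember every whole line of file1, then print file2 lines never seen.
awk 'NR == FNR { f1[$0]; next } !($0 in f1)' file1 file2 > changes.awk

# grep: fixed-string (-F), whole-line (-x), inverted (-v) match
# against the patterns read from file1 (-f file1).
grep -Fvxf file1 file2 > changes.grep

cat changes.awk
```

Both files end up with the ABC and DEF lines, in file2's order. One known caveat of the NR == FNR idiom: if file1 is empty, NR still equals FNR while file2 is read, so awk prints nothing. The grep form does not have this problem, because an empty pattern file matches nothing and -v then prints every line of file2.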

edited Jun 25 '13 at 16:30

answered Jun 25 '13 at 15:54

Dimitre Radoulov

Doesn't this only compare the first column? – Dropout Jun 25 '13 at 15:55
No, it compares the entire row. – Dimitre Radoulov Jun 25 '13 at 15:55
@Dropout In `awk` `$0` is entire line. `$1` is first column, `$2` is second column and so on... – jaypal singh Jun 25 '13 at 15:58
Wow, this is really clear. Why would I check each column one by one when any change anywhere in a row exports it to the third file? Great answer! :) – Dropout Jun 25 '13 at 15:59
@JS웃 thanks, that was exactly the thing I didn't understand until now +1 – Dropout Jun 25 '13 at 16:00
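A tiny sketch of the field variables jaypal singh describes, using one line of the sample data (the variable names line and fields are only illustrative). With awk's default splitting on blanks, $1 to $3 are the three columns, while $0 is the whole record, which is why f1[$0] in the accepted answer keys on complete lines.

```shell
# One record from the second file.
line='DEF XXX sdajkn'

# Print the whole record ($0) and each individual field.
fields=$(printf '%s\n' "$line" |
  awk '{ print "$0 = " $0; print "$1 = " $1; print "$2 = " $2; print "$3 = " $3 }')
printf '%s\n' "$fields"
```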

score 3 · Answer 2 · answered Jun 25 '13 at 15:27

Assuming the 1st file is named fileA and the 2nd file is named fileB, you can use awk like this:

awk 'NR==FNR {a[$1];b[$0];next} ($1 in a) && !($0 in b)' file{A,B}

Or, without brace expansion:

awk 'NR==FNR {a[$1];b[$0];next} ($1 in a) && !($0 in b)' fileA fileB
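A sketch showing how this version differs from the accepted one once new lines appear. The extra line MNO 789 qwerty is made up for illustration. Here a[] holds the first columns of fileA and b[] its whole lines, so a fileB line is printed only when its first column already exists in fileA but the full line does not; a completely new line is skipped, which is the case the question's later edit asks to include.

```shell
# Illustrative setup in a throwaway directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' 'ABC 123 q1w2sd' 'DEF 321 sdajkn' 'GHI 123 jsdnaj' 'JKL 456 jsd223' > fileA
# The question's second file plus one hypothetical new line.
printf '%s\n' 'ABC 123 XXXXXX' 'JKL 456 jsd223' 'DEF XXX sdajkn' 'GHI 123 jsdnaj' 'MNO 789 qwerty' > fileB

# Changed lines whose first column is already known; the new MNO line is skipped.
known_keys=$(awk 'NR==FNR {a[$1];b[$0];next} ($1 in a) && !($0 in b)' fileA fileB)

# Whole-line comparison (accepted answer); the new MNO line is reported too.
all_changes=$(awk 'NR == FNR { f1[$0]; next } !($0 in f1)' fileA fileB)

printf 'first-column version:\n%s\n\nwhole-line version:\n%s\n' "$known_keys" "$all_changes"
```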

answered Jun 25 '13 at 15:27

anubhava

Are you storing each line's first column into "a" and "b"? I'm trying to understand that. Does b[$0] mean that the 0th (first) column is being stored in "b"? Thanks – Dropout Jun 25 '13 at 15:44
I forgot to mention one thing. Also the new lines, which were added to the second file should be counted as changes. Basically everything new or updated in comparison with the first file. – Dropout Jun 25 '13 at 15:48
Yep, I think I figured it out. I needed this: awk 'NR==FNR {a[$0];b[$1];c[$2];next} (($0 in a) && (!($1 in b) || !($2 in c))) || !($0 in a)' fileA fileB – Dropout Jun 25 '13 at 15:52
Many thanks for your help, you made me go the right way. The answer was very close to that. +1 – Dropout Jun 25 '13 at 15:53
@Dropout: You're welcome. Sorry I was away from my computer and couldn't reply to your questions immediately. Great to see your problem got resolved. – anubhava Jun 25 '13 at 16:17
Somehow I interpreted from `Notice that lines which start with ABC and DEF have changed` that you only wanted to print differences where the 1st column is the same in both files. Sorry for the misinterpretation. – anubhava Jun 25 '13 at 16:19

captcha · Answer 3 · score 2 · 2013-06-25T21:35:08.127

Code for GNU sed:

$ sed 's#\(.*\)#/\1/d#' file1 | sed -f - file2
ABC 123 XXXXXX
DEF XXX sdajkn

This also handles lines that are new in file2.
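A sketch that makes the two stages visible. The first sed turns every line of file1 into a /line/d delete command, and the second sed applies those commands to file2, so whatever survives is the set of changes. To stay portable, this version writes the generated script to an illustrative file, delete.sed, instead of piping it into GNU sed's -f -.

```shell
# Illustrative setup in a throwaway directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' 'ABC 123 q1w2sd' 'DEF 321 sdajkn' 'GHI 123 jsdnaj' 'JKL 456 jsd223' > file1
printf '%s\n' 'ABC 123 XXXXXX' 'JKL 456 jsd223' 'DEF XXX sdajkn' 'GHI 123 jsdnaj' > file2

# Stage 1: wrap each line of file1 as /line/d ("delete lines matching this").
sed 's#\(.*\)#/\1/d#' file1 > delete.sed
cat delete.sed

# Stage 2: run the generated commands on file2; the surviving lines are the changes.
changes=$(sed -f delete.sed file2)
printf '%s\n' "$changes"
```

Each line of file1 is used as an unanchored regular expression, so a file1 line that is only a substring of a file2 line still deletes it, and characters such as ., *, [ or / inside file1 are interpreted as regex syntax rather than literal text.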

edited Jun 25 '13 at 21:35

answered Jun 25 '13 at 21:29

captcha

score 0 · Answer 4 · answered Jun 30 '13 at 11:37

Using comm to find lines in the 2nd file that are not in the 1st:

$ comm -13 <(sort first) <(sort second)
ABC 123 XXXXXX
DEF XXX sdajkn
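comm expects both inputs to be sorted and prints three columns: lines only in the first input, lines only in the second, and lines common to both. -13 suppresses columns 1 and 3, leaving only the lines unique to the second file. The <( ) process substitution above requires bash, ksh or zsh. A POSIX sh sketch can sort into temporary files instead; the *.sorted names are illustrative, and LC_ALL=C is set so that sort and comm use the same collation order.

```shell
# Illustrative setup in a throwaway directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' 'ABC 123 q1w2sd' 'DEF 321 sdajkn' 'GHI 123 jsdnaj' 'JKL 456 jsd223' > first
printf '%s\n' 'ABC 123 XXXXXX' 'JKL 456 jsd223' 'DEF XXX sdajkn' 'GHI 123 jsdnaj' > second

# Use one byte-wise collation order for both sort and comm.
LC_ALL=C
export LC_ALL

sort first > first.sorted
sort second > second.sorted

# -1 hides lines only in first, -3 hides lines in both: what is left is only in second.
only_second=$(comm -13 first.sorted second.sorted)
printf '%s\n' "$only_second"
```

The result comes out in sorted order, not in the original order of the second file. The awk and grep answers preserve that order.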

answered Jun 30 '13 at 11:37

glenn jackman
