Unix - Want records from file 2 that are not in file 1 by matching on the first 91 characters

Question

I want to compare file2 to file1 by matching on the first 91 characters of each record, and output the full record from file2 to file3. I'm new to Unix commands and just can't seem to figure this out.

Thanks in advance, Jeff

You should show us some code that you have tried, so we can see you attempted to solve the problem yourself. The question in this form violates the rules, point 4. http://stackoverflow.com/help/on-topic — Michas, Oct 28 '16 at 21:06
Sorry for the rule violation. The code I inherited was: comm file1 file2>file3 — jsouthworth, Oct 28 '16 at 22:09
1. Edit question. 2. Show code. 3. Add input data. 4. Show expected output. 5. Show received output. — Michas, Oct 28 '16 at 22:26
Please add sample input and your desired output for that sample input to your question. — Cyrus, Oct 29 '16 at 05:56

score 0 · Answer 1 · answered Oct 28 '16 at 20:14

You can compare two files using cmp:

$ cmp file1 file2
file1 file2 differ: byte 92, line 1

If you want to only compare the first 91 bytes you can use the -n switch:

$ cmp -n 91 file1 file2

If you want to do something in that case (e.g., copy the file to another file), you can use bash's if:

if cmp -n 91 file1 file2; then
    cp file2 file3
fi

score 0 · Accepted Answer · answered Oct 29 '16 at 09:40

I generated dummy files as follows:

file1

A012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
B012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
C012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
D012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
E012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
F012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789

file2

Z012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 1
B012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 2
T012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 3
D012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 4
E012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 5
F012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 6
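Dummy files like these can be recreated with a short shell loop. This is only a sketch of one way to do it, not necessarily how the files above were produced; it assumes a POSIX shell and overwrites file1 and file2 in the current directory.

```shell
# Build the 90-digit tail: the format is reused once per argument,
# and %.0s prints nothing from each argument.
digits=$(printf '0123456789%.0s' 1 2 3 4 5 6 7 8 9)

# file1: one leading letter plus 90 digits = a 91-character record
for k in A B C D E F; do
    printf '%s%s\n' "$k" "$digits"
done > file1

# file2: same layout, with a trailing " Line N" beyond column 91
i=0
for k in Z B T D E F; do
    i=$((i + 1))
    printf '%s%s Line %d\n' "$k" "$digits" "$i"
done > file2
```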

Then I think you want this:

awk '
   # Processing for file1, basically create associative array entry indexed by leftmost 91 characters
   FNR==NR { f1[substr($0,1,91)]++; next }

   # Processing for second file
   f1[substr($0,1,91)] > 0

   ' file1 file2

Sample Output

B012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 2
D012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 4
E012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 5
F012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 6

Actually, I now think you might want precisely the other lines. If so, change this:

f1[substr($0,1,91)] > 0

to this:

! f1[substr($0,1,91)]
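For completeness, here is a minimal sketch that wraps the inverted test in a shell function and writes the result to file3, as the question asks. The function name records_not_in and the optional key-length argument are my own assumptions for illustration. It also uses awk's "in" operator rather than the counter, which is a small variation on the same idea.

```shell
# records_not_in is a hypothetical helper name, not part of the answer above.
# Usage: records_not_in FILE1 FILE2 [KEYLEN]
# Prints every record of FILE2 whose first KEYLEN characters (default 91)
# do not match the first KEYLEN characters of any record in FILE1.
records_not_in() {
    awk -v n="${3:-91}" '
        # First file: remember each key
        FNR == NR { seen[substr($0, 1, n)] = 1; next }

        # Second file: print records whose key was never seen in the first file
        !(substr($0, 1, n) in seen)
    ' "$1" "$2"
}

# Example with the file names from the question:
# records_not_in file1 file2 > file3
```

One caveat: the FNR == NR test assumes file1 is not empty. With an empty file1, awk treats file2 as the first file and prints nothing.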

Absolutely awesome! This did the trick. Thanks for the help! — jsouthworth, Oct 29 '16 at 14:06
