I'm trying to compare two files on a Solaris box and see only the lines that are not similar. I know I can use the command below to find lines that are not exact matches, but that isn't good enough for what I'm trying to do.
comm -3 <(sort FILE1.txt | uniq) <(sort FILE2.txt | uniq) > diff.txt
For the purposes of this question, I would define two lines as similar if they have the same characters ~80% of the time, completely ignoring the locations that differ (since the sections that differ may also differ in length). The locations that differ can be assumed to occur at roughly the same point in each line. In other words, once we find a location that differs, we have to figure out where to start comparing again.
I know this is a hard problem to solve and will appreciate any help/ideas.
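One way to make the similarity definition above concrete is to compare lines token by token rather than character by character, and let a sequence matcher decide where to resume comparing after a differing section. The sketch below is only an illustration under several assumptions: it uses Python instead of bash or Perl, it splits tokens on whitespace and commas, and the names tokens, token_similarity and is_similar are made up for this example. It relies on difflib.SequenceMatcher from the standard library, whose ratio() is twice the number of matched tokens divided by the total number of tokens in both lines.

```python
import re
from difflib import SequenceMatcher


def tokens(line):
    """Split a line on whitespace and commas, dropping empty pieces."""
    return [t for t in re.split(r"[\s,]+", line.strip()) if t]


def token_similarity(a, b):
    """Score two lines from 0.0 to 1.0 by the share of tokens that line up.

    SequenceMatcher skips over the differing sections on its own, so the
    variable parts may have different lengths in the two lines.
    """
    matcher = SequenceMatcher(None, tokens(a), tokens(b), autojunk=False)
    return matcher.ratio()


def is_similar(a, b, threshold=0.8):
    """Assumed rule: lines are similar when the score reaches the threshold."""
    return token_similarity(a, b) >= threshold


if __name__ == "__main__":
    a = "Exception Invalid account type requested: 134029830198,level.fatal"
    b = "Exception Invalid account type requested: 1307230,level.fatal"
    print(is_similar(a, b))  # True: 6 of 7 tokens match on each side
```

With this particular measure, the two "Abend" lines from the example input score about 0.73 (8 of 11 tokens match), so a strict 0.8 cutoff would treat them as different. The threshold, or the way lines are tokenized, would need tuning against real data.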
EDIT:
Example input 1:
Abend for SP EAOJH with account s03284fjw and client pewaj39023eipofja,level.error
Exception Invalid account type requested: 134029830198,level.fatal
Only in file 1
Example input 2:
Exception Invalid account type requested: 1307230,level.fatal
Abend for SP EREOIWS with account 32192038409aoewj and client eowaji30948209,level.error
Example output:
Only in file 1
I also realize it would be ideal not to read the files into memory all at once, since they can be nearly 100 GB. Perhaps Perl would be better than bash for this reason.
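For the memory concern, one possible approach is to stream both files line by line and keep only one representative line per distinct "shape" seen in file 2. The sketch below is in Python rather than Perl or bash, and it makes assumptions that may not hold for the real data: that the number of distinct message shapes is small enough to fit in memory, that a token-level score from difflib.SequenceMatcher is an acceptable similarity measure, and that a threshold of 0.7 suits the example input. All function names are hypothetical. Because similarity is not transitive, comparing against representatives only approximates comparing against every line, and pure Python may well be too slow for files of this size.

```python
import re
from difflib import SequenceMatcher


def tokens(line):
    """Split a line on whitespace and commas, dropping empty pieces."""
    return [t for t in re.split(r"[\s,]+", line.strip()) if t]


def similar(a_tokens, b_tokens, threshold):
    """True when the token-level match ratio reaches the threshold."""
    matcher = SequenceMatcher(None, a_tokens, b_tokens, autojunk=False)
    return matcher.ratio() >= threshold


def representatives(lines, threshold):
    """Consume a stream of lines, keeping one token list per distinct shape."""
    reps = []
    for line in lines:
        toks = tokens(line)
        if toks and not any(similar(toks, r, threshold) for r in reps):
            reps.append(toks)
    return reps


def lines_without_match(lines, reps, threshold):
    """Yield lines from a stream that resemble none of the representatives."""
    for line in lines:
        toks = tokens(line)
        if toks and not any(similar(toks, r, threshold) for r in reps):
            yield line.rstrip("\n")


def compare_files(path1, path2, threshold=0.7):
    """Yield lines of path1 that have no similar line among path2's shapes.

    Both files are iterated lazily, so only the representatives are held
    in memory, never a whole file.
    """
    with open(path2, errors="replace") as f2:
        reps = representatives(f2, threshold)
    with open(path1, errors="replace") as f1:
        yield from lines_without_match(f1, reps, threshold)
```

It could be driven with something like `for line in compare_files("FILE1.txt", "FILE2.txt"): print(line)`. On the example input above, a threshold of 0.7 leaves only "Only in file 1", while 0.8 would also report the "Abend" line from file 1.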