I have 4 text files, each containing a single column of data (~2000 lines per file). I am trying to compare all of the files and determine the overlap between them. So I want to know what is in file1 but not in the other 3 files, what is in file2 but not in the other 3, what is in file1 and file2 only, etc. The ultimate goal is to make a Venn diagram with 4 overlapping circles showing the various overlaps between the files.

I have been racking my brain trying to figure out how to do this. I have been playing with the comm and diff commands, but I am having trouble applying them to all four files at once. Would anyone have any suggestions on how to do this?

Thanks for any help or suggestions.

user1352084

1 Answer

Assuming 4 files named a, b, c and d:

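Lines existing in file a but not in any of the others (I assume ^ is a character not used in any of the files; grep -xF matches whole lines literally, so substrings and regex characters do not cause false hits):

sort -u a | while IFS= read -r l; do printf '%s^%s\n' "$l" "$(grep -cxF -e "$l" b c d | paste -sd' ' -)"; done | grep '\^b:0 c:0 d:0$' | cut -d'^' -f1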

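Lines existing in all of them (each count must be at least 1, including counts of 10 or more):

sort -u a | while IFS= read -r l; do printf '%s^%s\n' "$l" "$(grep -cxF -e "$l" b c d | paste -sd' ' -)"; done | grep '\^b:[1-9][0-9]* c:[1-9][0-9]* d:[1-9][0-9]*$' | cut -d'^' -f1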

...