There are a couple of differences between comm and join:
- comm compares whole lines; join compares fields within lines.
- comm prints whole lines; join can print selected parts of lines.
When you have a single column of data in each file, there is relatively little difference. When you have multiple columns, there can be a lot of difference.
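To make the multi-column case concrete, here is a minimal sketch. The file names and contents are made up for illustration; both files are sorted on the first field, which both commands expect.

```shell
# Hypothetical two-column files (not from the question), sorted on field 1.
export LC_ALL=C            # keep the sort order consistent for comm and join
dir=$(mktemp -d)
printf '%s\n' 'apple red'   'banana yellow' > "$dir/a.txt"
printf '%s\n' 'apple green' 'banana yellow' > "$dir/b.txt"

# comm compares whole lines: only the identical line is common to both files.
comm_out=$(comm -12 "$dir/a.txt" "$dir/b.txt")
echo "$comm_out"

# join compares the first field by default, so both keys match.
join_out=$(join "$dir/a.txt" "$dir/b.txt")
echo "$join_out"

# join can print selected fields: the key from file 1, field 2 from file 2.
sel_out=$(join -o 1.1,2.2 "$dir/a.txt" "$dir/b.txt")
echo "$sel_out"

rm -r "$dir"
```

Here comm reports only the one line that is identical in both files, while join pairs up every line whose first field matches, whatever the remaining fields contain.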
Also note that under the right circumstances, join can output multiple copies of the data from one file while joining with different lines from the other file. This looks to me like your problem; you probably have some duplicate values in one of the files. Suppose you have:
src        txt
123        123
           123
           123
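If you do comm -12 src txt, you will get one line of output; if you do join src txt, you will get three lines of output. This is expected: comm matches lines one for one, whereas join emits an output line for every pairing of lines that share the same join field.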
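You can reproduce this with a quick sketch. I am assuming here that src holds a single 123 line and txt holds three; the line counts come out the same if the duplicates are in the other file. The sort -u step at the end is my suggestion for the case where the duplicates are unwanted, not something join requires.

```shell
export LC_ALL=C
dir=$(mktemp -d)
printf '123\n'           > "$dir/src"   # assumed: one line
printf '123\n123\n123\n' > "$dir/txt"   # assumed: three duplicate lines

# comm pairs lines one for one, so only one line is common to both files.
comm_out=$(comm -12 "$dir/src" "$dir/txt")

# join pairs the single src line with each of the three txt lines.
join_out=$(join "$dir/src" "$dir/txt")

# Removing duplicates first brings join back to a single line.
sort -u "$dir/txt" > "$dir/txt.uniq"
uniq_out=$(join "$dir/src" "$dir/txt.uniq")

printf 'comm:\n%s\njoin:\n%s\njoin after sort -u:\n%s\n' \
    "$comm_out" "$join_out" "$uniq_out"

rm -r "$dir"
```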
The join command can also handle 'outer joins' where data is missing from the second file for a line in the first file (a LEFT OUTER JOIN in terms of SQL) or vice versa (a RIGHT OUTER JOIN), or both at once (a FULL OUTER JOIN).
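As a rough sketch of those outer joins, the -a option tells join to also print unpairable lines from file 1, file 2, or both. The file names and contents below are invented for illustration.

```shell
export LC_ALL=C
dir=$(mktemp -d)
printf '%s\n' '1 alice' '2 bob'   > "$dir/users"   # hypothetical data
printf '%s\n' '2 admin' '3 guest' > "$dir/roles"   # hypothetical data

# LEFT OUTER JOIN: also print lines from file 1 that have no match.
left_out=$(join -a 1 "$dir/users" "$dir/roles")

# RIGHT OUTER JOIN: also print lines from file 2 that have no match.
right_out=$(join -a 2 "$dir/users" "$dir/roles")

# FULL OUTER JOIN: print unmatched lines from both files.
full_out=$(join -a 1 -a 2 "$dir/users" "$dir/roles")

printf 'left:\n%s\nright:\n%s\nfull:\n%s\n' "$left_out" "$right_out" "$full_out"

rm -r "$dir"
```

Unmatched lines are printed as they are, with no placeholder for the missing columns; the -e and -o options can be combined to fill those in if you need a fixed number of fields.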
All in all, join is a more complex command, but it is attempting to do a more complex job. Both are useful, but in different places.