How to get files in directory A but not B and vice versa using bash comm?

Question

I'm trying to use comm to list the files that are in folder A but not in B, and vice versa:

comm -3 <(find /Users/rob/A -type f -exec basename {} ';' | sort) <(find "/Users/rob/B" -type f -exec basename {} ';' | sort)

I'm using basename {} ';' to exclude the directory path, but this is the output I get:

    IMG_5591.JPG
IMG_5591.jpeg
    IMG_5592.JPG
IMG_5592.jpeg
    IMG_5593.JPG
IMG_5593.jpeg
    IMG_5594.JPG
IMG_5594.jpeg

There's a tab in the name of the first directory, therefore all entries are considered different. What am I doing wrong?

See my answer for an explanation of `comm` generating the leading tabs; keep in mind that removing the tabs will not change the overall output because the file names really are different ... one set of file names ends in `.JPG` while the other set ends in `.jpeg`; net result: `comm` is working properly ... OP just needs to add some code to remove the leading tabs — markp-fuso, Nov 06 '21 at 19:53

markp-fuso · Accepted Answer · 2021-11-06T19:57:26.577

The leading tabs are not being generated by the find|basename code; the leading tabs are being generated by comm ...

comm generates 1 to 3 columns of output depending on the flags; with all three columns shown, the 2nd column of output has a leading tab while the 3rd column has 2 leading tabs. Suppressing a column also drops its tab from the columns that follow it.

In this case OP's code says to ignore column #3 (-3, the files in common between the 2 sources), so comm generates 2 columns of output w/ the 2nd column having a leading tab.

One easy fix:

comm --output-delimiter="" <(find...|sort...) <(find...|sort...)

If for some reason your comm does not support the --output-delimiter flag:

comm <(find...|sort...) <(find...|sort...) | tr -d '\t'

This assumes the file names do not include embedded tabs; otherwise, replace the tr with your favorite code to strip leading white space, e.g.:

comm <(find...|sort...) <(find...|sort...) | sed 's/^[[:space:]]*//'

Demo ...

$ cat file1
a.txt
b.txt

$ cat file2
b.txt
c.txt

$ comm file1 file2
a.txt
                b.txt
        c.txt

# 2x tabs (\t) before 'b.txt' (3rd column), 1x tab (\t) before 'c.txt' (2nd column):

$ comm file1 file2 | od -c
0000000   a   .   t   x   t  \n  \t  \t   b   .   t   x   t  \n  \t   c
0000020   .   t   x   t  \n

# OP's scenario:

$ comm -3 file1 file2
a.txt
        c.txt

# 1x tab (\t) before 'c.txt' (2nd column):

$ comm -3 file1 file2 | od -c
0000000   a   .   t   x   t  \n  \t   c   .   t   x   t  \n

Removing the leading tabs:

$ comm --output-delimiter="" -3 file1 file2
a.txt
c.txt

$ comm -3 file1 file2 | tr -d '\t'
a.txt
c.txt

$ comm -3 file1 file2 | sed 's/^[[:space:]]*//'
a.txt
c.txt
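
Since stripping the tabs also throws away the information about which side a name came from, another option is to ask comm for one column at a time. The following is only a sketch: the helper names list_names, only_in_first and only_in_second are made up for illustration, and the directories are throwaway demo data rather than OP's real folders. `comm -23` keeps only column 1 and `comm -13` keeps only column 2, so neither call prints a leading tab.

```shell
#!/bin/bash

# Hypothetical helper: sorted base names of all regular files under "$1".
list_names() {
    find "$1" -type f -exec basename {} ';' | sort
}

# Hypothetical helper: names found only under the first directory (column 1).
only_in_first() {
    comm -23 <(list_names "$1") <(list_names "$2")
}

# Hypothetical helper: names found only under the second directory (column 2).
only_in_second() {
    comm -13 <(list_names "$1") <(list_names "$2")
}

# Throwaway demo data mirroring file1/file2 from the demo above.
demo=$(mktemp -d)
mkdir "$demo/A" "$demo/B"
touch "$demo/A/a.txt" "$demo/A/b.txt" "$demo/B/b.txt" "$demo/B/c.txt"

echo "Only in A:"; only_in_first "$demo/A" "$demo/B"
echo "Only in B:"; only_in_second "$demo/A" "$demo/B"
```

Running comm twice costs a second pass over both listings, but each result can then be labelled or post-processed separately.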

Thank you. I got confused with the columns, I had thought it was part of the find response. I was hoping to get an empty response - I didn't notice the extension was different ;( — Roberto, Nov 06 '21 at 19:54
By the way, any tips on how to exclude the extensions? I tried the stuff from https://stackoverflow.com/questions/41965415/using-find-to-return-filenames-without-extension/41966340 without success (I get no output). Also, I can't use printf. — Roberto, Nov 06 '21 at 21:07
@Roberto you get no output from what? no output from the `find` calls? or no output from the `comm` call? based solely on the sample data provided I would expect the `comm` call (sans extensions) to generate no output ... right? — markp-fuso, Nov 06 '21 at 21:18
modifying the `find` call to strip off extensions is venturing away from the original purpose of this Q&A; if you're still having problems getting your new `find` code to work I'd suggest creating a new question and focus on the `find` + `remove extensions` issue — markp-fuso, Nov 07 '21 at 14:25
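
For readers with the same follow-up question, one possible approach that avoids `-printf` is to keep the `basename` call from the question and strip the last extension with sed before sorting. This is a sketch under assumptions, not a tested answer to OP's setup: list_stems is a hypothetical helper name, the demo directories are invented, and a name such as `.bashrc` would be reduced to an empty string by this sed expression.

```shell
#!/bin/bash

# Hypothetical helper: sorted, de-duplicated base names under "$1"
# with the last ".extension" removed (IMG_5591.JPG -> IMG_5591).
list_stems() {
    find "$1" -type f -exec basename {} ';' | sed 's/\.[^.]*$//' | sort -u
}

# Throwaway demo data: same pictures with different extensions, plus one extra file in B.
demo=$(mktemp -d)
mkdir "$demo/A" "$demo/B"
touch "$demo/A/IMG_5591.jpeg" "$demo/A/IMG_5592.jpeg"
touch "$demo/B/IMG_5591.JPG" "$demo/B/IMG_5592.JPG" "$demo/B/IMG_5593.JPG"

# Only IMG_5593 differs once the extensions are gone; it appears in column 2 (leading tab).
comm -3 <(list_stems "$demo/A") <(list_stems "$demo/B")
```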

score 0 · Answer 2 · answered Nov 06 '21 at 19:46

If basename causes issues, you can use find's -printf instead (a GNU find extension; the BSD find shipped with macOS does not support it):

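#!/bin/bash

# Print the path of every regular file under "$1", relative to "$1", sorted for comm.
# %P keeps subdirectory components; use %f instead to print only the file name, as basename does.
find_basename(){
    find "$1" -type f -printf "%P\n" | sort
}

comm -3 <(find_basename /Users/rob/A) <(find_basename /Users/rob/B)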

— Philippe, Nov 06 '21 at 19:46
