
In bash, I can get the basename (name without path) of found files like this:

find . -exec basename {} \;

and I can get the file size like this:

find . -exec ls -l {} \; | awk '{print $5}'

but I need to get the basename and filesize separated by a space.

How do I combine those two commands correctly using one find operation? This code does not work:

find . -exec basename {} \; -exec ls -l {} | awk '{print $5}' \;

awk: can't open file ;
 source line number 1
find: -exec: no terminating ";" or "+"

I am trying to create a fast duplicate file finder. Using this list, I would do a sort and then use uniq to find all files that are duplicates using the criteria: a duplicate = same "basename" & same "size" (without an md5 check).

So far, just making this initial list is where I am hung up syntactically (and maybe programmatically). Please let me know if you have a better method. I am trying to make it work using only the most basic bash commands so it works on both Linux and Mac without installing anything.

Kevvvin

1 Answer


GNU systems

On GNU systems, use this command:

find . -printf '%k\t%f\n'

It prints the size and basename of each file, separated by a tab.

  • %k prints the disk space used by the file in 1 KiB blocks (use %s instead for the exact size in bytes)
  • \t literal tab character
  • %f prints filename with leading directory path removed
  • \n literal newline character
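
For the duplicate check described in the question, the listing can be piped through sort and uniq -d. This is only a minimal sketch, assuming GNU find; it uses %s (size in bytes) rather than %k so that files of different lengths are not lumped into the same block count, and the function name list_dupes_gnu is just an illustrative name, not an existing tool:

```shell
# Sketch: print each "size<TAB>basename" pair that occurs more than once.
# Assumes GNU find (for -printf). list_dupes_gnu is a hypothetical helper name.
list_dupes_gnu() {
  root="${1:-.}"
  # -type f skips directories; %s = size in bytes, %f = basename
  find "$root" -type f -printf '%s\t%f\n' | sort | uniq -d
}
```

Calling list_dupes_gnu /path/to/drive prints one line per duplicated size/basename pair, and nothing if every pair is unique.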

OSX

On OSX, the bundled BSD find does not support -printf, so use this command instead:

find . -exec bash -c 'printf "%s\t%s\n" "$(stat -f "%z" "$1")" "$(basename "$1")"' - {} \;
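
If one script has to run on both Linux and OSX, a possible alternative is to avoid both -printf and stat and rely on wc -c, which is available on both. This is a rough sketch under that assumption, not a tested drop-in for every OSX version; list_sizes_portable is a made-up helper name:

```shell
# Sketch: list "size<TAB>basename" using only find, sh, wc, tr and basename,
# so it needs neither GNU -printf nor BSD stat -f.
# list_sizes_portable is a hypothetical helper name.
list_sizes_portable() {
  root="${1:-.}"
  find "$root" -type f -exec sh -c '
    for f do
      # wc -c prints the byte count; tr removes the padding some wc versions add
      size=$(wc -c < "$f" | tr -d " ")
      printf "%s\t%s\n" "$size" "$(basename "$f")"
    done
  ' sh {} +
}
```

The name/size duplicates can then be listed with list_sizes_portable . | sort | uniq -d. Using -exec ... {} + starts one shell per batch of files rather than one per file, which should help on large drives.
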
BobTuckerman
anubhava
  • Thanks for your response. However, OSX does not use GNU utilities, so it won't work for this cross-platform script. The find command is restricted to what is available in the version of find that FreeBSD uses. More info here: http://stackoverflow.com/questions/752818/find-lacks-the-option-printf-now-what – Kevvvin Aug 03 '16 at 23:06
  • To answer your question about the OSX version: I have to deploy this as a utility to people who create drives of camera footage and who all work on various versions of OSX, so it must be fairly independent of that as well. The idea is that, without installing anything else, they could run a small bash program in the root of a drive that would quickly parse all files on the drive and tell them if there are any filename-size dupes. If there are dupes of this kind, they messed up creating the footage drives (every camera master should be unique). – Kevvvin Aug 04 '16 at 13:20