
Given a directory, how do I find all files within it (and any sub-directories) that are not hard-linked files? Or, more specifically, that do not have more than one hard-link reference?

Basically I want to scan a folder and return a list of unique files within that directory, including directories and symbolic links (not their targets). If possible, it'd be nice to also ignore hard-linked directories on file-systems that support them (such as HFS+).

Haravikk

4 Answers


find has an option that should be useful:

find . -type f -links 1 -print

Files that have been hard linked elsewhere have, by definition, a link count of 2 or greater, so this will show all regular files that have no other links to them.
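
If you also want symlinks and other non-regular entries included (as the question asks), one sketch is to drop -type f and exclude only directories, since a link count of 1 is meaningful for everything except directories:

find . ! -type d -links 1 -print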

twalberg
  • `-type f` excludes symlinks, directories, and several other things. The OP specifically wants to include directories and symlinks. Directories have at least two hard links (one from the parent, plus the directory's own `.` link), plus one for each subdirectory's `..`. – Keith Thompson Apr 29 '13 at 19:48
  • The `-type f` can be left out, in that case. However, the first and second paragraph of the question seem to disagree on that requirement. Typically, directories and symlinks won't have hard links to them anyway (although there are a couple of file system types where that's less true than others). – twalberg Apr 29 '13 at 19:55
  • Directories always have hard links. They can have hard links other than the usual ones (from the parent directory, `.` from the directory itself, and `..` from child directories) on *some* systems; the OP specifically mentioned that. And you can have hard links to symlinks: `touch file; ln -s file symlink; ln symlink hardlink` will make `hardlink` a symbolic link to `file`, with the same inode number as `symlink`. – Keith Thompson Apr 29 '13 at 20:02
  • Right - I meant "typically" in the sense that it's not a particularly common practice to hard-link symlinks (and any scenario I can come up with off the top of my head where that's actually useful is pretty contrived). And, yes, directories use the hard link count for a slightly different purpose - you can't usually do `ln dir1 dir2`, though... – twalberg Apr 29 '13 at 20:14

Hard-linked files share the same inode. You can use stat to print the inode number and the filename, and awk to print each file only the first time its inode appears:

stat -c '%i %n' *.csv | awk '!seen[$1]++' | cut -d ' ' -f 2-
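
To apply the same idea recursively, one sketch (GNU find assumed, since -printf is an extension) lets find print the inode directly:

# Recurse, print "inode path", keep the first path per inode, drop the inode column.
find . -printf '%i %p\n' | awk '!seen[$1]++' | cut -d ' ' -f 2-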
glenn jackman
  • Thanks for the response! This is the solution I'm going with, with some differences: I get the inode number for a file in one directory and compare it with the corresponding file in a previous directory; if they match I skip it, otherwise it isn't linked. This works for hard-linked directories as well (on supported file-systems). – Haravikk Apr 30 '13 at 21:40

As I'm sure you know, every file has at least one hard link (its entry in the parent directory).

To answer the question in your first paragraph (finding files that don't have additional hard links), you'll need to distinguish between directories and everything else. Assuming you have GNU coreutils, you can use:

stat -c '%h' filename

to determine the number of hard links for a given file name. Otherwise you can parse the output of ls -ld filename -- which should work, but ls output isn't really meant to be machine-readable.
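
For example, a fallback sketch that pulls the link count out of ls -ld (it's the second field), with the usual caveats about parsing ls:

# The link count is field 2 of ls -ld output.
links=$(ls -ld filename | awk '{print $2}')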

For anything other than a directory, if the number of links is greater than 1, there's a hard link to it somewhere.

A directory, on the other hand, will always have the usual one link from its parent, plus one for its own . entry, plus one for the .. entry of each of its immediate subdirectories. So you'll have to determine how many links it would have in the absence of any additional hard links, and compare that to the number it actually has.
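
As a sketch of that comparison (GNU stat and find assumed; the path is a placeholder):

dir=/some/directory                 # hypothetical path
actual=$(stat -c '%h' "$dir")       # real link count
subdirs=$(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l)
expected=$((2 + subdirs))           # entry in parent + its own . + one .. per subdirectory
[ "$actual" -gt "$expected" ] && echo "$dir has extra hard links"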

You can avoid doing this if you happen to know that you're on a system that forbids hard links to directories. (I'm not sure whether that restriction is typically imposed by the OS or by each filesystem.)

But that doesn't solve the problem in your second paragraph, creating a list of unique files within a directory. Knowing that the plain file foo has a link count greater than 1 doesn't tell you whether it's unique in the current directory; the other hard links could be in different directories (they merely have to be in the same filesystem).

To do that, you can do something like:

stat -c '%i %n' *

which prints the inode number and name for each file in the current directory. You can then filter out duplicate inode numbers to get unique entries. This is basically what glenn jackman's answer says. Of course * doesn't actually match everything in the current directory; it skips files whose names start with ., and it can cause problems if some files have special characters (like space) in their names. That may not matter to you, but if it does (assuming GNU find):

find . -maxdepth 1 -print0 | xargs -0 stat -c '%i %n'

(That will still cause problems if any file names contain newline characters, which is actually legal.)

Keith Thompson
  • Thanks for the detailed response! If it helps, what I'm actually trying to do is recurse a Time Machine backup directory on OS X, and I want to filter out everything that is just a link to a previous backup. For files this should mean that anything with only 1 link is sufficient, but for directories I'm probably going to have to use something trickier, then. – Haravikk Apr 30 '13 at 11:13
  • find knows about inode numbers and can print them directly, without the need to pipe into xargs: find . -maxdepth 1 -printf "%i %p\n" – presto8 Nov 30 '13 at 16:33

So all you want is every file/link/dir/block/pipe/... with a distinct inode? Then it's easy: list them with their inode numbers, do a numeric sort, and finally print only the first entry for each inode number (find -ls prints the inode in the first column). And remember find has a lot of options to restrict the output if you want to filter:

find /PATH_to_SEARCH -ls | sort -n | awk '!seen[$1]++'
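
For instance, to restrict the scan to regular files and symlinks while still deduplicating by inode (a sketch; /PATH_to_SEARCH is still a placeholder):

# Only regular files and symlinks, one line per inode.
find /PATH_to_SEARCH \( -type f -o -type l \) -ls | sort -n | awk '!seen[$1]++'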

JBat