
Given a directory, how do I find all files within it (and any sub-directories) that are not hard-linked files? Or, more specifically, that do not have more than one hard-link reference?

Basically I want to scan a folder and return a list of unique files within that directory, including directories and symbolic links (not their targets). If possible, it'd be nice to also ignore hard-linked directories on file-systems that support them (such as HFS+).

Haravikk

4 Answers


find has an option that should be useful:

find . -type f -links 1 -print

Files that have been hard linked elsewhere have, by definition, a link count of 2 or greater, so this will show all regular files that have no other links to them.
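
If you also want symlinks and other non-regular entries included (as the question asks), one sketch is to drop -type f and exclude only directories, since a link count of 1 is meaningful for everything except directories:

find . ! -type d -links 1 -print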

twalberg
  • `-type f` excludes symlinks, directories, and several other things. The OP specifically wants to include directories and symlinks. Directories have at least two hard links (one from the parent, plus the directory's own `.` link), plus one for each subdirectory's `..`. – Keith Thompson Apr 29 '13 at 19:48
  • The `-type f` can be left out, in that case. However, the first and second paragraph of the question seem to disagree on that requirement. Typically, directories and symlinks won't have hard links to them anyway (although there are a couple of file system types where that's less true than others). – twalberg Apr 29 '13 at 19:55
  • Directories always have hard links. They can have hard links other than the usual ones (from the parent directory, `.` from the directory itself, and `..` from child directories) on *some* systems; the OP specifically mentioned that. And you can have hard links to symlinks: `touch file; ln -s file symlink; ln symlink hardlink` will make `hardlink` a symbolic link to `file`, with the same inode number as `symlink`. – Keith Thompson Apr 29 '13 at 20:02
  • Right - I meant "typically" in the sense that it's not a particularly common practice to hard-link symlinks (and any scenario I can come up with off the top of my head where that's actually useful is pretty contrived). And, yes, directories use the hard link count for a slightly different purpose - you can't usually do `ln dir1 dir2`, though... – twalberg Apr 29 '13 at 20:14

Hard-linked files share the same inode. You can use stat to print the inode number and the filename, and awk to print each file only the first time its inode appears:

stat -c '%i %n' *.csv | awk '!seen[$1]++' | cut -d ' ' -f 2-
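
To apply the same idea recursively, one sketch (GNU find assumed, since -printf is an extension) lets find print the inode directly:

# Recurse, print "inode path", keep the first path per inode, drop the inode column.
find . -printf '%i %p\n' | awk '!seen[$1]++' | cut -d ' ' -f 2-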
glenn jackman
  • Thanks for the response! This is the solution I'm going with, with some differences: I get the inode number for a file in one directory and compare it with the corresponding file in a previous directory; if they match I skip it, otherwise it isn't linked. This works for hard-linked directories as well (on supported file-systems). – Haravikk Apr 30 '13 at 21:40

As I'm sure you know, every file has at least one hard link (its entry in the parent directory).

To answer the question in your first paragraph (finding files that don't have additional hard links), you'll need to distinguish between directories and everything else. Assuming you have GNU coreutils, you can use:

stat -c '%h' filename

to determine the number of hard links for a given file name. Otherwise you can parse the output of ls -ld filename -- which should work, but ls output isn't really meant to be machine-readable.
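
For example, a fallback sketch that pulls the link count out of ls -ld (it's the second field), with the usual caveats about parsing ls:

# The link count is field 2 of ls -ld output.
links=$(ls -ld filename | awk '{print $2}')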

For anything other than a directory, if the number of links is greater than 1, there's a hard link to it somewhere.

A directory, on the other hand, will always have the usual one link from its parent, plus one for its own . entry, plus one for the .. entry of each of its immediate subdirectories. So you'll have to determine how many links it would have in the absence of any additional hard links, and compare that to the number it actually has.
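
As a sketch of that comparison (GNU stat and find assumed; the path is a placeholder):

dir=/some/directory                 # hypothetical path
actual=$(stat -c '%h' "$dir")       # real link count
subdirs=$(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l)
expected=$((2 + subdirs))           # entry in parent + its own . + one .. per subdirectory
[ "$actual" -gt "$expected" ] && echo "$dir has extra hard links"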

You can avoid doing this if you happen to know that you're on a system that forbids hard links to directories. (I'm not sure whether that restriction is typically imposed by the OS or by each filesystem.)

But that doesn't solve the problem in your second paragraph, creating a list of unique files within a directory. Knowing that the plain file foo has a link count greater than 1 doesn't tell you whether it's unique in the current directory; the other hard links could be in different directories (they merely have to be in the same filesystem).

To do that, you can do something like:

stat -c '%i %n' *

which prints the inode number and name for each file in the current directory. You can then filter out duplicate inode numbers to get unique entries. This is basically what glenn jackman's answer says. Of course * doesn't actually match everything in the current directory; it skips files whose names start with ., and it can cause problems if some files have special characters (like space) in their names. That may not matter to you, but if it does (assuming GNU find):

find . -maxdepth 1 -print0 | xargs -0 stat -c '%i %n'

(That will still cause problems if any file names contain newline characters, which is actually legal.)

Keith Thompson
  • Thanks for the detailed response! If it helps, what I'm actually trying to do is recurse a Time Machine backup directory on OS X, and I want to filter out everything that is just a link to a previous backup. For files this should mean that anything with only 1 link is sufficient, but for directories I'm probably going to have to use something trickier, then. – Haravikk Apr 30 '13 at 11:13
  • find knows about inode numbers and can print them directly, without the need to pipe into xargs: find . -maxdepth 1 -printf "%i %p\n" – presto8 Nov 30 '13 at 16:33

So all you want is every file/link/dir/block/pipe/... with a distinct inode? Then it's easy: list them with their inode numbers, do a numeric sort, and finally print only the first entry for each inode number (find -ls prints the inode in the first column). And remember find has a lot of options to restrict the output if you want to filter:

find /PATH_to_SEARCH -ls | sort -n | awk '!seen[$1]++'
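
For instance, to restrict the scan to regular files and symlinks while still deduplicating by inode (a sketch; /PATH_to_SEARCH is still a placeholder):

# Only regular files and symlinks, one line per inode.
find /PATH_to_SEARCH \( -type f -o -type l \) -ls | sort -n | awk '!seen[$1]++'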

JBat