Daemon to create hardlinks on Linux file server which finds identical files

Question

I have a Linux server where I only store new files or rename directories and never edit files. It contains backups from other Linux servers.

Due to certain circumstances there are quite a few duplicate files, often with different names.

Is there a free Linux tool that periodically scans the filesystem, keeps a database of filenames, sizes and perhaps sha1sums, identifies duplicates and replaces them with hardlinks?

I use rsync and I know that rsync can create hardlinks, but I still want to catch the remaining duplicates. I also learned that it is better not to rotate logs with numbered names like syslog.123.gz, but to let logrotate name them by date; otherwise rsync copies them again and again. I use rsync because it makes incremental backups of all my Linux machines easy and very efficient, and it lets me recreate a system easily. — Christian, Apr 23 '13 at 20:40

score 2 · Accepted Answer · edited Apr 13 '17 at 12:37

Some tools, taken from https://unix.stackexchange.com/questions/3037/is-there-an-easy-way-to-replace-duplicate-files-with-hardlinks:

trimtrees.pl
fdupes -L
findup -m (from fslint)
rdfind -makehardlinks true

You could run one of them in a cron job.
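
To make the idea concrete, here is a minimal Python sketch of what such tools roughly do: group files by size, hash only the candidates that share a size, and replace duplicates with hardlinks. It is an illustration under stated assumptions, not how any of the tools above is actually implemented. The function names (`file_sha1`, `find_duplicates`, `hardlink_duplicates`) and the `.hardlink-tmp` suffix are made up for this example. It trusts SHA-1 without a byte-by-byte comparison, keeps no database between runs, and assumes the files are never modified in place, as described in the question.

```python
import hashlib
import os
import stat
from collections import defaultdict


def file_sha1(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root that have identical content."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            # Only non-empty regular files; symlinks and devices are skipped.
            if stat.S_ISREG(info.st_mode) and info.st_size > 0:
                # Hardlinks cannot cross filesystems, so group per device.
                by_size[(info.st_dev, info.st_size)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_sha1(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def hardlink_duplicates(root, dry_run=True):
    """Hardlink duplicates to the first file of each group.

    Returns the number of files replaced (or, in a dry run, the number
    that would be replaced).
    """
    replaced = 0
    for keep, *others in find_duplicates(root):
        keep_inode = os.stat(keep).st_ino
        for path in others:
            if os.stat(path).st_ino == keep_inode:
                continue  # already a hardlink to the kept file
            if not dry_run:
                # Link under a temporary name, then rename over the
                # duplicate so the path never disappears.
                tmp = path + ".hardlink-tmp"  # hypothetical suffix
                os.link(keep, tmp)
                os.replace(tmp, path)
            replaced += 1
    return replaced
```

A cautious first run would be `hardlink_duplicates("/srv/backups", dry_run=True)`, where the path is only a placeholder. Note that hardlinked files share one inode, so the owner, permissions and timestamps of the replaced file are lost in favour of those of the kept file.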

answered Sep 19 '13 at 08:57 by simohe (last edited by Community)

score 1 · Answer 2 · answered Apr 23 '13 at 19:42

You can use a deduplicating filesystem. There are two main choices on Linux: btrfs and ZFS.

The drawback of btrfs is that it is still not marked as stable and has no fsck.

ZFS is not in the Linux kernel due to licensing issues, but a kernel module is available for most Linux distributions. ZFS also offers a kind of online fsck through its scrub feature. The supported distros are listed on zfsonlinux.org.

Both have compression, deduplication and snapshotting features without the need for any additional userspace daemons, which makes them ideal for backup solutions.

I'm using ext4 at the moment. Currently only 2 TB. I'd rather use something in userspace if possible, but thanks anyway! — Christian, Apr 23 '13 at 20:41
