0

I have a Linux server where I only store new files or rename directories and never edit files. It contains backups from other Linux servers.

Due to certain circumstances there are quite a few duplicate files, often with different names.

Is there any free Linux tool which periodically scans the filesystem and has a database with filenames, sizes and maybe sha1sums and then identifies duplicates and replaces them with hardlinks?

Christian
  • What do you use to make your backups? – Spack Apr 23 '13 at 19:33
  • I use rsync, and I know that rsync can create hardlinks, but I still want to catch the remaining duplicates. I also learnt that it's a good idea not to rotate logs as syslog.123.gz but to let logrotate name them with the date; otherwise rsync copies them again and again... I use rsync because it lets me easily do incremental, very efficient backups of all my Linux machines and recreate a system easily – Christian Apr 23 '13 at 20:40

2 Answers

2

Some tools, taken from https://unix.stackexchange.com/questions/3037/is-there-an-easy-way-to-replace-duplicate-files-with-hardlinks:

  • trimtrees.pl
  • fdupes -L
  • findup -m (from fslint)
  • rdfind -makehardlinks true

You could run one of them in a cron job.
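If you would rather see what such a tool does before trusting it with your backups, here is a minimal sketch in Python of the usual approach: group files by size, hash only the candidates that share a size, then replace duplicates with hard links. This is not how any of the tools above are implemented; the function names (`file_digest`, `find_duplicates`, `hardlink_duplicates`), the `dry_run` flag, the `.dedup-tmp` suffix and the choice of SHA-256 instead of sha1 are all my own assumptions for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose size and digest both match."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Only consider regular files; skip symlinks, sockets, etc.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for size, paths in by_size.items():
        # Files with a unique size cannot be duplicates; empty files are
        # skipped because linking them saves no space.
        if size == 0 or len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def hardlink_duplicates(root, dry_run=True):
    """Hard-link each duplicate to the first file of its group.

    Returns the number of files that were (or, in a dry run, would be)
    replaced.
    """
    replaced = 0
    for group in find_duplicates(root):
        keep = group[0]
        keep_stat = os.stat(keep)
        for dup in group[1:]:
            st = os.stat(dup)
            if st.st_dev != keep_stat.st_dev:
                continue  # hard links cannot cross filesystems
            if st.st_ino == keep_stat.st_ino:
                continue  # already linked to the kept file
            if not dry_run:
                # Link under a temporary name, then rename over the
                # duplicate, so the path never disappears in between.
                tmp = dup + ".dedup-tmp"
                os.link(keep, tmp)
                os.replace(tmp, dup)
            replaced += 1
    return replaced
```

Calling `hardlink_duplicates("/path/to/backups")` only counts candidates; pass `dry_run=False` to actually link them. Keep in mind that hard-linked files share one inode, so owner, permissions and mtime of the kept file apply to all names, and an edit through one name changes all of them. That is fine for a store where files are never edited, but it matters if the backups must preserve per-file metadata.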

simohe
1

You can use a deduplicating filesystem. There are two main choices on Linux: btrfs and ZFS.

The drawback of btrfs is that it is still not marked as stable and has no fsck.

ZFS is not in the Linux kernel due to licensing issues, but there is a kernel module that supports most Linux distributions. ZFS also offers a kind of online fsck through its scrub feature. You can check the supported distros on zfsonlinux.org.

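Both offer compression, deduplication and snapshots at the filesystem level, which makes them well suited to backup storage. Note that ZFS deduplicates inline without any additional userspace daemon, whereas on btrfs deduplication is out-of-band and has to be triggered by a userspace tool.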

Izzy
  • I'm using ext4 at the moment. Currently only 2 TB. I'd rather use something in userspace if possible, but thanks anyway! – Christian Apr 23 '13 at 20:41