14

I've got a bunch of duplicate messages in my IMAP server's Maildir. What's the best way to remove them?

Some relevant points:

  • Shared Message-ID is usually a good enough definition of duplicate. A tiny script that removes all but one of the duplicate messages would work.
  • Sometimes it's necessary to find duplicates based on shared message bodies. What's a reasonable definition of shared here? Bitwise equivalent? What about weird differences in line wrapping, escaping, character encoding?
  • Sometimes there's some meaningful difference between 'duplicate' messages. What's the best way to review the differences in sets of 'duplicate' messages? Diffs?
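
To make the three points above concrete, here is a minimal Python sketch using only the standard library. It is an illustration under assumptions, not a vetted tool: the function names (`duplicate_message_ids`, `body_digest`, `diff_messages`) are made up for this example, and "same body" is defined here as "same decoded payload bytes after collapsing whitespace", which is only one possible answer to the line-wrapping question and does nothing about character-set differences.

```python
import difflib
import hashlib
import mailbox
from collections import defaultdict


def duplicate_message_ids(box):
    """Return {Message-ID: [keys]} for IDs that occur more than once."""
    groups = defaultdict(list)
    for key, msg in box.iteritems():
        msg_id = str(msg.get("Message-ID", "")).strip()
        if msg_id:  # messages without a Message-ID are left alone
            groups[msg_id].append(key)
    return {mid: keys for mid, keys in groups.items() if len(keys) > 1}


def body_digest(msg):
    """Hash the decoded leaf parts with runs of whitespace collapsed.

    Decoding undoes base64/quoted-printable transfer encoding; collapsing
    whitespace hides re-wrapping. Charset differences are NOT handled.
    """
    digest = hashlib.sha256()
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True) or b""
        digest.update(b" ".join(payload.split()))
    return digest.hexdigest()


def diff_messages(box, key_a, key_b):
    """Unified diff of two raw messages, for reviewing 'duplicates' by eye."""
    a = box.get_bytes(key_a).decode("utf-8", "replace").splitlines()
    b = box.get_bytes(key_b).decode("utf-8", "replace").splitlines()
    return list(difflib.unified_diff(a, b, key_a, key_b, lineterm=""))
```

A cautious way to use it would be to open the folder with `mailbox.Maildir(path, create=False)`, print `diff_messages` for each group that `duplicate_message_ids` returns, and only then call `box.remove(key)` on the copies you decide to drop. Work on a backup copy of the Maildir first.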
voretaq7
Joseph Holsten

5 Answers

10

I've made some significant improvements to Kevin's script mentioned above, and he was kind enough to accept my pull requests. Eventually we split this off into a dedicated project which you can find here:

https://github.com/kdeldycke/maildir-deduplicate

Adam Spiers
3

For generic files on Linux, I use the fdupes utility to remove duplicate files. I found it also works for Maildir messages.

sarabande
    [fdupes](https://github.com/adrianlopezroche/fdupes) seems to work for exact duplicates only, while the OP is (implicitly) asking about more complex patterns of duplication. A message delivered twice because of `.forward` or whatever will have slightly different headers, so while the message itself is a duplicate, the two files containing the two copies may not be. – tripleee Feb 06 '17 at 06:55
1

If you use Dovecot for IMAP access, you can use the following command:

doveadm deduplicate -u user@yourdomain.com mailbox INBOX

to remove duplicates from your INBOX folder.

As Dustwolf commented, if you want to match emails in all folders, type the following instead:

doveadm deduplicate -u user@yourdomain.com ALL

This should take care of everything; all duplicate emails should be deleted right away.

If you have very specific needs, such as filtering by size or handling only new messages, I suggest that you look into the Dovecot Search Query documentation as well as the doveadm deduplicate documentation and adapt the command to your needs.

1

Best I've found today is Kevin Deldycke's maildir-deduplicate.

  • It ignores the X-MIMETrack header by default and compares headers using the SHA224 digest.
  • It automatically deletes duplicates without asking for confirmation; however there is a dry-run mode which allows previewing which duplicates will be deleted.
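
For anyone curious what "compare headers by digest while ignoring some of them" could look like, here is a rough Python sketch. It is not the tool's actual code: `header_digest` and `IGNORED_HEADERS` are names invented for this example, and hashing every header except the ignored ones is an assumption of this sketch. In practice you would probably need to ignore more headers (for instance `Received`), since those often differ between two deliveries of the same message.

```python
import email
import hashlib

# Assumption for this sketch: only the header mentioned above is skipped.
IGNORED_HEADERS = frozenset({"x-mimetrack"})


def header_digest(raw_message, ignored=IGNORED_HEADERS):
    """SHA224 over a canonical form of the headers, minus ignored ones.

    Canonical form here means: lower-cased names, whitespace collapsed in
    values, lines sorted so header order does not matter.
    """
    msg = email.message_from_bytes(raw_message)
    lines = sorted(
        f"{name.lower()}: {' '.join(str(value).split())}"
        for name, value in msg.items()
        if name.lower() not in ignored
    )
    canonical = "\n".join(lines).encode("utf-8", "replace")
    return hashlib.sha224(canonical).hexdigest()
```

Two files whose digests match would then be treated as candidates for the same message; listing those candidates before deleting anything corresponds to the dry-run idea mentioned above.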

I bet someone could make something fancy from Rick Sanders' delIMAPdups.pl, part of his IMAP Tools.

Adam Spiers
Joseph Holsten
  • `maildir-deduplicate` [moved to a new location](http://kevin.deldycke.com/2013/06/maildir-deduplicate-moved/) so I updated the link. However your information is now out of date. – Adam Spiers Feb 03 '14 at 12:32
  • I've updated this so that there is no longer misleading outdated info. – Adam Spiers Jun 13 '19 at 11:47
0

GNOME's Evolution [a graphical mail user agent] has a built-in feature to remove duplicate mail. As explained on this help page, it boils down to:

  1. Select the suspect messages (or just all messages).
  2. Go to menu Messages, then choose Remove Duplicate Messages.

Voilà.

P.S. Evolution can access your messages locally (Maildir, MH, mbox) or over IMAP.

Franklin Piat