I host my own Linux mail server for my family. Yesterday my father lost all the emails in his Inbox folder. I'm still not sure whether it was a terrible user error or a compromised password, but that's not the point here. Thanks to Murphy's law, I also had no backup (don't shoot, I created one right after), and I feel terrible for him. So my only option left is to try to recover the deleted emails from the partition.

I immediately took an image of the whole ext4 data partition on the server with "dd", and now I have an archive of several hundred GB to deal with, which feels like a giant haystack. What is the best way to extract the emails from this image? I know the mails are in there somewhere because when I grep for my dad's email address, I get lots of matches like "To: dad@mydomain.com", and with the -C option I see the other usual SMTP headers (From, Subject, Date, Message-Id, ...).

I first tried "foremost" with a custom format, but since a mail doesn't have a fixed size the results were not conclusive.

I also tried https://pypi.org/project/mail-parser/ but it seems it would need patching to do what I want (it expects a text file containing a single mail, not a big raw file with lots of mails in it).

Do you know of any other (free) tool or method to reconstruct the email files from this ext4 image with reasonable accuracy? As explained, the tricky part is that, unlike images or other formats, the mails are stored in plain text and don't directly contain their size, so I think the tool will have to be RFC 822 aware at some point to do the parsing/extraction.

piwai
  • Some mail formats do store the size. What was the mail program used? – Law29 Dec 02 '18 at 01:42
  • Since it was an ext4 file system, have you investigated `extundelete` or `ext4magic`? – Law29 Dec 02 '18 at 01:43
  • All the mails were stored in an IMAP folder using maildir format (one file per mail). Years ago I used qmail as an MTA, then switched to postfix + dovecot. From what I can see looking at my own emails, there is no simple "content-length" or equivalent header that could be used to get the size. – piwai Dec 02 '18 at 07:26
  • For extundelete, not yet. I will give it a try. – piwai Dec 02 '18 at 07:27
  • Unfortunately, neither extundelete nor ext4magic could find anything to restore :-( And yet the lost mails (at least part of them) are there somewhere because, as explained, grep finds matches... I will try to develop a small Python script based on the email parser lib to see if I can get better results. I will keep you posted – piwai Dec 04 '18 at 17:49
  • I described my own recovery attempt here: https://github.com/piwai/mail-recovery. This is a work in progress, I don't know yet if it will work. – piwai Dec 08 '18 at 17:46
  • Awesome work! You can make this an answer to your question! – Law29 Dec 09 '18 at 17:20

1 Answer


Well, it took me a few hours and a bit of Python scripting, but it finally worked! I was able to recover all of my dad's lost emails.

The whole procedure and the Python scripts I used are here: https://github.com/piwai/mail-recovery. In short, what I did was:

  • Take an image of the partition with dd
  • Use foremost to detect SMTP headers inside the dd image
  • Parse the foremost audit file to extract chunks of data containing emails
  • Filter the chunks to keep only the deleted ones
  • Filter again to remove the duplicates
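
To give a rough idea of the carving and filtering steps, here is a minimal Python sketch. It is not the code from the repository, and it does not use foremost or its audit file: it scans the image for a header marker itself, which is a simpler stand-in for the detection step. Several things in it are assumptions made for illustration: that each stored mail begins with a "Return-Path: " header, that a mail ends at the first NUL byte (the padding at the end of the last block), that no mail is larger than 1 MB, and that known_ids is a set of Message-Id values of the mails that still exist. All function names are hypothetical.

```python
import email
import mmap
from pathlib import Path

MARKER = b"Return-Path: "    # assumption: first header of every stored mail
MAX_MAIL_SIZE = 1024 * 1024  # assumption: upper bound for one carved chunk


def find_offsets(data, marker=MARKER):
    """Yield every byte offset in data where marker occurs."""
    pos = data.find(marker)
    while pos != -1:
        yield pos
        pos = data.find(marker, pos + 1)


def carve(data, offsets, max_size=MAX_MAIL_SIZE):
    """Yield one chunk per offset, cut at the next offset, the first NUL byte, or max_size."""
    offsets = list(offsets)
    for i, start in enumerate(offsets):
        end = min(start + max_size, len(data))
        if i + 1 < len(offsets):
            end = min(end, offsets[i + 1])
        chunk = data[start:end]
        nul = chunk.find(b"\x00")
        if nul != -1:
            chunk = chunk[:nul]  # assumption: NUL padding marks the end of the mail
        yield chunk


def recover(data, known_ids=frozenset()):
    """Yield (message_id, raw_bytes) for each carved mail that is neither known nor a duplicate."""
    seen = set()
    for chunk in carve(data, find_offsets(data)):
        msg = email.message_from_bytes(chunk)
        raw_id = msg["Message-ID"]  # header lookup is case-insensitive
        if raw_id is None:
            continue  # not parseable as a mail with a Message-Id, skip it
        msg_id = str(raw_id).strip()
        if msg_id in known_ids or msg_id in seen:
            continue  # still present in the mailbox, or already recovered
        seen.add(msg_id)
        yield msg_id, chunk


def recover_from_image(image_path, out_dir, known_ids=frozenset()):
    """Memory-map the image and write each recovered mail to out_dir as NNNNNN.eml."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(image_path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
            for _, raw in recover(data, known_ids):
                (out / f"{count:06d}.eml").write_bytes(raw)
                count += 1
    return count
```

Memory-mapping the image avoids loading several hundred GB into RAM. Fragmented files and mails containing NUL bytes would not be recovered correctly by this sketch, so the results still need a manual check.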
piwai