I am trying to remove an unwanted > character that appears before the "From " line in the headers of some old archived emails (so the line starts with ">From"), and I have been unable to do so by rewriting the From line with a Procmail recipe.

Error reproduced:

>From "xxxx@example.com" Sat Dec  4 11:01:29 2004
Status: RO
From: "xxxxxx" <xxxx@example.com>
Subject: Desktop Alert Utility
To: 'bbbb@example.com'; 'dddd@example.com'
Date: Sat, 04 Dec 2004 05:31:29 +0000
MIME-Version: 1.0
Content-Type: multipart/mixed;
    boundary="--boundary-LibPST-iamunique-1531497257_-_-"

The following does not work:

:0 fhw
| formail -I">From " -a"From "

Even the following does not work:

:0 fhw
| formail -I">From "

What am I doing wrong? Will be happy to share any relevant information.

Note: Because of the unnecessary > before From in the first line of the email header, the mail client shows the email with "No sender" and does not show other details in the summary view. It shows the whole message in the body.

I also tried

LC_ALL=C find . -type f -name ‘*.*’ -exec sed -i '' s/'>From'/'From'/ {} +

but it did not produce the result I needed.

I am running macOS Mojave.


New note: While my original question is answered below, the extended discussion of applying sed to achieve results has led to a new question at the link below:

Removing unwanted character from the first line of files in a “maildir”

  • The LibPST fragment in the MIME boundary looks like you converted this from an Outlook PST file? If so, I'm afraid maybe there is no sane way to get the real data back even if you solve this particular problem. – tripleee Jan 14 '21 at 16:37
  • Yes, your observation is spot on regarding converting from Outlook PST using libpst even prior to archiving. There are a small number of emails with this problem and I think the problem does not come from the PST or the conversion, it may have originated at the original mail server used to send/receive the emails. I do not have access to the original server now after so many years. Once the emails have been restored, I will confirm if the real data has been retrieved. Thank you for your help. – Ray ISUNSI Jan 14 '21 at 19:15

1 Answer

> is not syntactically a valid header character, so I doubt you can persuade formail to treat it as one.

Try writing a simple sed or Awk script to strip it instead.

If the >From is always the first line of each file, try

sed -i '' '1s/^>From/From/' *

and if the files are not all in the current directory, maybe wrap that with

find . -type d -exec sh -c 'cd "$1" && sed -i "" "1s/^>From/From/" *' sh {} \;

to run it on all the subdirectories of the current directory.

This assumes the file names will all fit on a single command line; if you get "Argument list too long", try

printf '%s\n' * | xargs sed -i '' '1s/^>From/From/'

or with find, try

find . -type f -exec sed -i '' '1s/^>From/From/' {} +

The printf variant is slightly brittle; if you can't get it to work because you have irregular file names (with newlines in them, etc.), the find solution should not be hard to adapt to run in the current directory only (add -maxdepth 1 to prevent it from traversing subdirectories).
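
As a hedged sketch of that adaptation: the helper below (the name fix_from_lines is made up for illustration) restricts find to a single directory and prints each file it changes. It uses -i.bak rather than -i '' purely as a portability assumption, since that spelling is accepted by both BSD and GNU sed. It also matches ">From " with the trailing space, which is slightly stricter than the pattern above.

```shell
#!/bin/sh
# Hypothetical helper: fix a leading ">From " on line 1 of each regular
# file directly inside the directory given as $1 (default: current directory).
fix_from_lines() {
    dir=${1:-.}
    # -maxdepth 1 keeps find out of subdirectories; backups from an earlier
    # run (*.bak) are skipped. LC_ALL=C makes sed and grep treat the files
    # as plain bytes.
    LC_ALL=C find "$dir" -maxdepth 1 -type f ! -name '*.bak' -exec sh -c '
        for f in "$@"; do
            # Only touch files whose first line really starts with ">From ".
            if head -n 1 "$f" | grep -q "^>From "; then
                printf "%s\n" "$f"
                # -i.bak edits in place and keeps the original as FILE.bak.
                sed -i.bak "1s/^>From /From /" "$f"
            fi
        done' sh {} +
}
```

Calling fix_from_lines /path/to/dir lists the files it modified. The .bak copies still contain the original first line, so remove them once you are satisfied, and certainly before importing the directory anywhere.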

In brief, some email servers change every From at the beginning of a line in the body of a message into >From (or, with quoted-printable MIME encoding, =46rom, which a proper MIME client should transparently convert back when displaying the message). I'm guessing you forwarded the entire mailbox inlined into a text/plain message. If so, perhaps the easiest fix is to send it again from the original source, this time wrapped in a suitable MIME container so that it won't be mangled in transport (for example, pack it into a .tar.gz and add that as a binary attachment).
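
To illustrate the mangling described above, here is a minimal sketch of the kind of From quoting an mbox-writing program might apply. It is an assumption about the general convention, not the behavior of any particular server, and the function name mbox_quote_body is hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration: read one message on stdin and prefix ">" to
# body lines that start with "From ". Header lines are left alone.
mbox_quote_body() {
    awk '
        # Once past the headers, quote any line beginning with "From ".
        in_body && /^From / { print ">" $0; next }
        # The first empty line separates the headers from the body.
        /^$/ { in_body = 1 }
        { print }
    '
}
```

Under this convention only body lines are quoted, so a >From on the very first line of a file suggests the whole message was treated as body text somewhere along the way.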

tripleee
  • I tried the edited commands and got this error: sed: RE error: illegal byte sequence. Also checked to see that the original issue (removing the >) is not solved perhaps due to the RE error. Is it due to the OS being MacOS? – Ray ISUNSI Jan 14 '21 at 19:28
  • `sed -i '' '1s/^>From/From/'` works for me on macOS without issues. Did you copy/paste from here or type the text manually? The error message sounds vaguely like you replaced one of the ASCII characters with some invalid Latin-1 sequence or something, but I don't exactly understand how that could happen. – tripleee Jan 14 '21 at 19:34
  • I have now successfully copy/pasted and ran all of the commands from here. My system is Catalina actually, but I have a hard time imagining that this would have changed since Mojave, which I used to run before. – tripleee Jan 14 '21 at 19:37
  • This is not Mojave of course, but here is a quick demo: https://ideone.com/jtv6Eh Maybe you can fork it and supply one or two *actual* files of yours in `one.mbox` and `two.mbox` to verify that it actually works, or show how it doesn't? – tripleee Jan 14 '21 at 20:06
  • I tried: LC_ALL=C && find . -type f -exec sed -i '' '1s/>From/From/' {} + and received no errors in the terminal but when I import these in to the IMAP I get some errors such as: "doveadm(bucket): Error: zlib.read(/Volumes/SSD/.error/cur/1610649281.M38111P35842.mojo.local,S=21752,W=22074:2,): missing gz header at 21752 doveadm(bucket): Error: Mailbox INBOX.INBOX: Saving mail: save: read(zlib(/Volumes/SSD/.error/cur/1610649281.M38111P35842.mojo.local,S=21752,W=22074:2,)) failed: – Ray ISUNSI Jan 14 '21 at 20:14
  • If you are importing them into Dovecot, you should probably remove the `From ` line (space after `From`, not a colon) entirely. The `sed` command as such worked, I assume; perhaps (accept this answer, or post one of your own and accept that, and) post a new question about this fundamentally unrelated problem. (Not a programming question, but maybe our sibling site Unix & Linux Stack Exchange would be happy to accept it.) – tripleee Jan 14 '21 at 20:15
  • The error message looks like it expected `.gz` files and you gave it files which were not gzipped, actually. – tripleee Jan 14 '21 at 20:17
  • Please have a look at https://ideone.com/0iHKuc and comment. It seems the sed removes more than the expected lines in the header or maybe I am doing something wrong – Ray ISUNSI Jan 14 '21 at 20:28
  • The flaw in that is that `tail` without options only shows the last ten lines. The script actually does exactly what it should; see https://ideone.com/WNXaMx which is slightly more elaborate. – tripleee Jan 14 '21 at 20:44
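
On the `illegal byte sequence` error raised in the comments: BSD `sed` on macOS commonly reports it when a file contains bytes that are not valid in the current (usually UTF-8) locale, and running the command in the `C` locale is a widely used workaround. Note that `LC_ALL=C && find ...` only sets a shell variable, which is not passed to `find` unless `LC_ALL` was already exported. The sketch below shows the difference; the function names are made up for illustration.

```shell
#!/bin/sh
# Plain assignment followed by &&: LC_ALL is only a shell variable here,
# so the child process does not see it.
child_locale_after_assignment() (
    unset LC_ALL
    LC_ALL=C && sh -c 'echo "${LC_ALL:-unset}"'
)

# Assignment used as a command prefix: LC_ALL is placed in the environment
# of that one command (and of anything the command starts).
child_locale_with_prefix() (
    unset LC_ALL
    LC_ALL=C sh -c 'echo "${LC_ALL:-unset}"'
)

child_locale_after_assignment   # prints: unset
child_locale_with_prefix        # prints: C
```

Applied to the command from this thread, the prefix form would be `LC_ALL=C find . -type f -exec sed -i '' '1s/^>From/From/' {} +`.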