import email
import email.policy
import sys
msg = email.message_from_string(sys.stdin.read(), policy=email.policy.default)
print(msg.get_body('plain').get_payload())

input.eml

MIME-Version: 1.0
From: my from
To: email@to.com
Subject: my subject
Content-Type: multipart/mixed;
 boundary="----=_Part_2296279_969698842.1679155313994"
Date: Sat, 18 Mar 2023 16:01:53 +0000 (UTC)

------=_Part_2296279_969698842.1679155313994
Content-Type: multipart/alternative;
 boundary="----=_Part_2296278_601255348.1679155313994"

------=_Part_2296278_601255348.1679155313994
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

email app=2E this is a test =2E
------=_Part_2296278_601255348.1679155313994
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit

<!DOCTYPE html>
<html>
<body>
this is html
</body>
</html>
------=_Part_2296278_601255348.1679155313994--

------=_Part_2296279_969698842.1679155313994--

The output contains sequences like =2E. Is that a dot? How can I automatically convert such escaped sequences to the actual characters?

$ ./main.py < input.eml
email app=2E this is a test =2E
user1424739
  • You're dealing with "quoted-printable" text. This is an encoded format that escapes certain characters so they can be transported safely, for example in an email. You can use the `quopri` Python module to decode it: call `quopri.decodestring(text)`, where `text` is your quoted-printable text. – Tom Karzes Mar 19 '23 at 04:42
  • get_body() does not tell me that the content is quoted-printable. How can I write a function that behaves like get_body() but, when the content is quoted-printable (which must be derived automatically from the input message), calls quopri.decodestring(text)? – user1424739 Mar 19 '23 at 04:55
  • Look at this field: `Content-Transfer-Encoding: quoted-printable` That tells you it's quoted printable. – Tom Karzes Mar 19 '23 at 05:46
  • I know. But the question is how to do it in a program. I don't want to manually check the original input string with a regex. The solution has to be in the email module. For example, is there a member function of email.message.EmailMessage that can tell it is quoted-printable? – user1424739 Mar 19 '23 at 14:09

1 Answer


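The solution is to pass decode=True to get_payload(). This undoes the Content-Transfer-Encoding (quoted-printable here) and returns bytes, which then have to be decoded with the charset declared in the part's Content-Type (UTF-8 in this message).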

msg.get_body('plain').get_payload(decode=True).decode('utf-8')

I have to say that the default chosen by the author of get_payload() is poor. This is not the first time I have found Python's API poorly designed.
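As a possible alternative, here is a minimal sketch that uses get_content(), which is available on messages parsed with policy=email.policy.default. For a text part it undoes the transfer encoding and decodes with the declared charset, so the 'utf-8' does not need to be hard-coded. It also reads the Content-Transfer-Encoding header from the parsed part, which is what the comments asked about. The RAW string and the BOUNDARY marker are hypothetical, trimmed down from input.eml for illustration.

```python
import email
import email.policy

# Hypothetical sample message, trimmed down from the question's input.eml.
RAW = (
    "MIME-Version: 1.0\n"
    "Subject: my subject\n"
    'Content-Type: multipart/alternative; boundary="BOUNDARY"\n'
    "\n"
    "--BOUNDARY\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "email app=2E this is a test =2E\n"
    "--BOUNDARY\n"
    "Content-Type: text/html; charset=UTF-8\n"
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "<html><body>this is html</body></html>\n"
    "--BOUNDARY--\n"
)

msg = email.message_from_string(RAW, policy=email.policy.default)

# Select the text/plain alternative; preferencelist is a tuple of candidates.
part = msg.get_body(preferencelist=("plain",))

# The transfer encoding can be read from the parsed headers of the part.
encoding = part["Content-Transfer-Encoding"]

# get_content() undoes the transfer encoding and decodes the bytes
# using the charset from the part's Content-Type header.
text = part.get_content()

print(encoding)
print(text)
```

With this approach the program never has to inspect the raw input string; both the encoding and the decoded text come from the email module.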

user1424739