I want to get just the text body of an email from Gmail (or IMAP servers in general) without downloading the entire message.

If I fetch RFC822, I get everything just fine:

mail_box.fetch(message_ids, '(RFC822)')

The problem is that when there are many messages with attachments, this takes a long time.

I could get just the headers and text body I need with:

mail_box.fetch(message_ids, '(RFC822.HEADER BODY.PEEK[1])')

But I can't parse the text body returned this way; it has a strange format:

'\r\n------=_NextPart_001_0011_01CB63DF.D39BA1C0\r\nContent-Type: text/plain;\r\n\tcharset="iso-8859-1"\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\nRafael, ...other content like html tags and css...------=_NextPart_001_0011_01CB63DF.D39BA1C0--\r\n'

I tried to parse it with email.message_from_string and the quopri module, but no luck so far.

Is it possible to get messages formatted like RFC822, but without downloading the attachments?

rafanunes
  • `------=_NextPart_001_0011_01CB63DF.D39BA1C0` hints that the message is a multipart message with a header field `Content-Type: multipart/....; boundary=----=_NextPart...`. Maybe you have to pass the `Content-Type` header to the function that decodes the body so it knows what to do with the NextPart thing. Some additional info: https://en.wikipedia.org/wiki/MIME#Multipart_messages – User Oct 24 '13 at 18:30
  • Either fetch and parse the BODYSTRUCTURE, and you can get just the part you need, or fetch the MIME part headers to go with 1. MIME messages are quite complex, and there's no way in IMAP to say "Just give the body" (as which one? HTML? Text? RTF? PDF?), so you can either guess, or you download the BODYSTRUCTURE and identify which part you actually want and fetch it, like 1.1 or 1.1.1.... – Max Oct 24 '13 at 19:45
  • I did that: I first fetched the BODYSTRUCTURE and found the text/plain part as 1 (that's why I am fetching BODY.PEEK[1]). But the fetched string comes with this weird header. I am parsing it manually with slices and substrings; just wondering if there is a better way. – rafanunes Oct 25 '13 at 22:27

1 Answer

The correct way is to request the BODYSTRUCTURE of the message and only fetch the relevant part.
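
A minimal sketch of that approach, assuming the server returns the BODYSTRUCTURE as a plain parenthesized list without IMAP literals: parse the structure into nested lists and walk it to find the section number of the first text/plain part. The helper names (`parse_sexp`, `structure_from_response`, `find_text_plain`) and the sample response are made up for illustration; they are not part of imaplib.

```python
import re

# Tokens: parentheses, quoted strings, and bare atoms (NIL, numbers, names).
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()]+')


def parse_sexp(text):
    """Parse an IMAP parenthesized list into nested Python lists."""
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        elif tok.startswith('"'):
            stack[-1].append(tok[1:-1])  # backslash escapes are left as-is
        else:
            stack[-1].append(None if tok.upper() == "NIL" else tok)
    return stack[0][0]


def structure_from_response(raw):
    """Pull the BODYSTRUCTURE list out of a line like '1 (BODYSTRUCTURE (...))'."""
    items = parse_sexp(raw[raw.index("("):])
    return items[items.index("BODYSTRUCTURE") + 1]


def find_text_plain(struct, prefix=""):
    """Return the section number of the first text/plain part, or None."""
    if isinstance(struct[0], list):  # multipart: child parts come first
        index = 0
        for child in struct:
            if not isinstance(child, list):
                break  # reached the multipart subtype string
            index += 1
            section = f"{prefix}.{index}" if prefix else str(index)
            found = find_text_plain(child, section)
            if found:
                return found
        return None
    if struct[0].upper() == "TEXT" and struct[1].upper() == "PLAIN":
        return prefix or "1"  # a single-part message is addressed as section 1
    return None


# Hypothetical response: multipart/mixed holding a multipart/alternative and a PDF.
sample = (
    '1 (BODYSTRUCTURE ((("TEXT" "PLAIN" ("CHARSET" "iso-8859-1") NIL NIL '
    '"QUOTED-PRINTABLE" 100 5)("TEXT" "HTML" ("CHARSET" "iso-8859-1") NIL NIL '
    '"QUOTED-PRINTABLE" 300 10) "ALTERNATIVE")("APPLICATION" "PDF" '
    '("NAME" "a.pdf") NIL NIL "BASE64" 5000) "MIXED"))'
)
print(find_text_plain(structure_from_response(sample)))
```

The returned section would then go into the fetch, e.g. `mail_box.fetch(message_id, '(BODY.PEEK[1.1])')`, instead of assuming that part 1 is the text.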

In the comments you are suggesting that you have already fetched the BODYSTRUCTURE and that the part 1 corresponds to a text/plain MIME part. Please show us the whole, unprocessed BODYSTRUCTURE; without it, one cannot tell whether the IMAP server you are using is buggy or whether your understanding of the BODYSTRUCTURE is wrong.
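
The fragment quoted in the question begins with a boundary line, which suggests that part 1 may itself be a multipart container rather than a text/plain leaf. As a sketch, assuming the part's MIME headers are fetched together with its body (for example with `BODY.PEEK[1.MIME]` next to `BODY.PEEK[1]`), the two can be joined and handed to the standard email parser. The helper names and the sample bytes below are invented for illustration, modeled on the fragment in the question.

```python
import email
from email import policy


def parse_part(mime_headers, body):
    """Join a part's MIME headers (BODY[n.MIME]) with its body (BODY[n]) and parse."""
    return email.message_from_bytes(mime_headers + body, policy=policy.default)


def plain_text(part):
    """Return the decoded text of the first text/plain sub-part, or None."""
    for sub in part.walk():
        if sub.get_content_type() == "text/plain":
            # get_content() undoes quoted-printable and applies the charset.
            return sub.get_content()
    return None


# Hypothetical data: what the two fetched items might look like.
mime_headers = (
    b'Content-Type: multipart/alternative;\r\n'
    b'\tboundary="----=_NextPart_001_0011_01CB63DF.D39BA1C0"\r\n\r\n'
)
body = (
    b'\r\n------=_NextPart_001_0011_01CB63DF.D39BA1C0\r\n'
    b'Content-Type: text/plain;\r\n\tcharset="iso-8859-1"\r\n'
    b'Content-Transfer-Encoding: quoted-printable\r\n\r\n'
    b'Rafael, ol=E1\r\n'
    b'------=_NextPart_001_0011_01CB63DF.D39BA1C0--\r\n'
)

print(plain_text(parse_part(mime_headers, body)))
```

If the BODYSTRUCTURE confirms that the text lives in a nested part such as 1.1, fetching that section directly avoids the boundary lines altogether.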

Jan Kundrát