The issue I am having is that when I download email messages from a Microsoft Outlook webmail account, newline characters sometimes disappear, resulting in onelongunbrokenline. But only sometimes. Here is the example I am dealing with right now:

Original body of message being downloaded from Microsoft Outlook Web App (dollar signs included because I have :set list on in vim):

Gobble$
This is a message with$
Multiple lines$
$
Hello$

Body of message that I actually end up receiving (also has :set list on in vim):

GobbleThis is a message withMultiple lines^M$
Hello ^I^I ^I ^I^I =^M$

There are clearly a few other things going on here which I also don't understand - where are the tab (^I) characters coming from? Where is that equals sign coming from?

Here is the code that does the downloading (using the Python library IMAPClient):

import email
from imapclient import IMAPClient

# HOST, USERNAME, PASSWORD, ssl and cutoff are defined elsewhere.

# Connect, log in and select the INBOX
server = IMAPClient(HOST, use_uid=True, ssl=ssl)
server.login(USERNAME, PASSWORD)
select_info = server.select_folder('INBOX')

# Get messages since a certain time
message_list = server.search(['SINCE %s' % cutoff.strftime('%d-%b-%Y')])
response = server.fetch(message_list, ['RFC822'])

# Parse each raw message and print its payload (Python 2)
for msgid, data in response.iteritems():
    msg_string = data['RFC822'].__str__()
    msg = email.message_from_string(msg_string)
    payload = msg.get_payload()
    body = payload
    print body
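
One possible source of the stray equals sign: in the quoted-printable transfer encoding, a line that ends in `=` is a soft line break, and `get_payload()` called without `decode=True` returns the body with that encoding still applied. The sketch below is written for Python 3 (unlike the Python 2 loop above) and uses a made-up message, `RAW`, rather than the real one. It only shows the difference between the two calls and does not by itself explain the missing newlines.

```python
import email

# Hypothetical raw message, invented for illustration (not the real data).
RAW = (
    b"Subject: demo\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Hello \t\t =\r\n"
    b"world\r\n"
)

msg = email.message_from_bytes(RAW)

# Without decode=True the transfer encoding is left in place,
# so the trailing "=" (a soft line break) is still visible.
encoded = msg.get_payload()

# With decode=True the Content-Transfer-Encoding is undone and bytes come back.
decoded = msg.get_payload(decode=True)

print(msg["Content-Transfer-Encoding"])
print(repr(encoded))
print(repr(decoded))
```

If the real message is multipart, `get_payload()` returns a list of parts rather than a string, so `msg.is_multipart()` is worth checking first.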
STN
  • Does the file have mixed line endings? It seems the answer is yes. Run dos2unix on the file beforehand or try http://vim.wikia.com/wiki/File_format#Converting_mixed_files while in vim. (or fix the line endings in python) – FDinoff Jul 29 '14 at 17:49
  • As a side note, you almost never need (or want) to call `foo.__str__()`; just do `str(foo)`. – abarnert Jul 29 '14 at 18:35
  • @FDinoff: If you can get Python's universal-newlines behavior to do it for you, it's probably easier. I'm not sure if there's an easy way to do that in 2.x; maybe wrapping the string in an `io.StringIO` (not `StringIO.StringIO`) then pulling it out? But maybe someone else has a better memory than me. – abarnert Jul 29 '14 at 18:38
  • Anyway, is there any chance this message has attachments (or the MAPI associated-files equivalent), and Outlook is downloading an HTML or RTF version of the message and then plaintextifying, while your code is downloading the plaintext version? – abarnert Jul 29 '14 at 18:39
  • @abarnert good thought, but I don't think that's the case - I went to outlook, opened the message that is giving grief, clicked "view original" so I could see it without any fancy formatting, and then copy-pasted that into vim with ":set list" so that I could see if there were any characters missing or any funny characters added, and I didn't find any. It seems that the snafu is occurring during my download from outlook to my local computer, somehow. – STN Jul 29 '14 at 18:59
  • @STN: It may be worth checking whether Outlook is downloading via MAPI, WebDAV/OWA, or IMAP, and/or testing a different IMAP client. (For simple experiments, you can slap one together with `imaplib` from the Python stdlib, or you can just download any of the open source clients out there.) That will rule out anything on the other side of the line from your script. Next, you might want to try dumping out `data` as-is, before you re-parse it with `email`, to narrow down whether the problem is happening before or after that parse. – abarnert Jul 29 '14 at 19:24
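
The two checks suggested in the comments (are the line endings mixed, and what do the raw bytes look like before `email` touches them) can be sketched roughly as follows. This is a minimal Python 3 sketch: `count_line_endings`, `normalize_newlines` and the `sample` bytes are hypothetical names and data made up for illustration, not part of IMAPClient or the `email` package.

```python
def count_line_endings(raw):
    """Count CRLF, bare LF and bare CR sequences in a bytes object."""
    crlf = raw.count(b"\r\n")
    lf = raw.count(b"\n") - crlf   # LFs that are not part of a CRLF
    cr = raw.count(b"\r") - crlf   # CRs that are not part of a CRLF
    return {"crlf": crlf, "lf": lf, "cr": cr}


def normalize_newlines(raw):
    """Convert CRLF and bare CR to LF, in that order."""
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Invented sample with deliberately mixed endings (not the real message).
sample = b"Gobble\nThis is a message with\r\nMultiple lines\r\n\nHello\r"

print(repr(sample))                       # dump the bytes as-is
print(count_line_endings(sample))         # more than one non-zero count means mixed endings
print(repr(normalize_newlines(sample)))
```

Running the same `repr` and count on the raw fetched value, and again on the parsed payload, would show whether the newlines are already missing before the parse or only after it.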
