
We are displaying HTML body extracted from .MSG files exported from Outlook.

To display the HTML body, one needs to decompress the RTF from the PR_RTF_COMPRESSED property and then decode the RTF to HTML (Outlook actually encodes the HTML as RTF when exporting MSG files). We are using the RDO library to parse the .msg files and extract the HTML body.
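For reference, the first of the two steps above (decompressing the property value) can be sketched in a few lines. This is a minimal sketch of the compressed RTF format as documented by Microsoft (MS-OXRTFCP), not what RDO does internally. The function name `decompress_rtf` is made up for illustration, the CRC in the header is read but not verified, and the second step (recovering the HTML from the RTF) is not covered here.

```python
import struct

# Initial dictionary contents defined by the compressed RTF format (207 bytes).
PREBUF = (
    b"{\\rtf1\\ansi\\mac\\deff0\\deftab720{\\fonttbl;}"
    b"{\\f0\\fnil \\froman \\fswiss \\fmodern \\fscript \\fdecor "
    b"MS Sans SerifSymbolArialTimes New RomanCourier"
    b"{\\colortbl\\red0\\green0\\blue0\r\n\\par "
    b"\\pard\\plain\\f0\\fs20\\b\\i\\u\\tab\\tx"
)
assert len(PREBUF) == 207

COMPRESSED = 0x75465A4C    # "LZFu" read as a little-endian uint32
UNCOMPRESSED = 0x414C454D  # "MELA" read as a little-endian uint32
DICT_SIZE = 4096


def decompress_rtf(data: bytes) -> bytes:
    """Return the raw RTF bytes stored in a PR_RTF_COMPRESSED value.

    Hypothetical helper for illustration; the header CRC is not checked.
    """
    # Header: COMPSIZE, RAWSIZE, COMPTYPE, CRC (four little-endian uint32).
    comp_size, raw_size, comp_type, _crc = struct.unpack_from("<IIII", data, 0)
    if comp_type == UNCOMPRESSED:
        return data[16:16 + raw_size]
    if comp_type != COMPRESSED:
        raise ValueError("unknown compression type")

    # Circular dictionary, pre-filled with the fixed RTF preamble.
    dictionary = bytearray(DICT_SIZE)
    dictionary[:len(PREBUF)] = PREBUF
    write_pos = len(PREBUF)

    out = bytearray()
    pos = 16
    # COMPSIZE counts every byte that follows the COMPSIZE field itself.
    end = min(len(data), comp_size + 4)

    while pos < end:
        control = data[pos]
        pos += 1
        # Each control byte describes up to 8 tokens, lowest bit first.
        for bit in range(8):
            if control & (1 << bit):
                # Dictionary reference: 12-bit offset, 4-bit length (big-endian).
                if pos + 2 > end:
                    return bytes(out[:raw_size])
                ref = (data[pos] << 8) | data[pos + 1]
                pos += 2
                offset = ref >> 4
                length = (ref & 0x0F) + 2
                if offset == write_pos:
                    # A reference to the current write position marks the end.
                    return bytes(out[:raw_size])
                for i in range(length):
                    byte = dictionary[(offset + i) % DICT_SIZE]
                    out.append(byte)
                    dictionary[write_pos] = byte
                    write_pos = (write_pos + 1) % DICT_SIZE
            else:
                # Literal byte: copy to the output and into the dictionary.
                if pos >= end:
                    return bytes(out[:raw_size])
                byte = data[pos]
                pos += 1
                out.append(byte)
                dictionary[write_pos] = byte
                write_pos = (write_pos + 1) % DICT_SIZE
    return bytes(out[:raw_size])
```

The result is plain RTF. Any rendering differences are introduced in the later RTF-to-HTML step and by the engine that displays the HTML, not here.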

The HTML that RDO produces does not always match what Outlook displays (for example, the text size sometimes differs).

Is anybody aware of an implementation of HTML body extraction that most closely matches the appearance of the HTML as displayed by Outlook, or is this impossible?

Marek
  • Are you sure the html->rtf conversion is even lossless? – CodesInChaos Nov 27 '10 at 10:46
  • It is irrelevant whether it is lossless. Outlook parses the exact data that is contained in the .msg that we have available. The requirement is to match what Outlook displays when opening the same .msg file 1:1. – Marek Nov 28 '10 at 08:26
  • @Marek How do you extract the HTML from a .MSG file, and which tools did you use? Could you help me with http://stackoverflow.com/questions/26095381/how-to-extract-html-from-m-msg-file-on-linux-os-x – Yuan He Sep 29 '14 at 08:20

1 Answer


More thoughts than an answer...

Are you displaying the extracted body in a browser such as IE?
I expect that the issue is that Outlook (2007) uses the Word rendering engine to display HTML while browsers use their own. So, I don't think you are likely to find an extraction implementation that will help.
Can you apply a stylesheet to your extracted body document, that will override most of the inconsistencies?
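As a rough illustration of the stylesheet idea: inline styles normally win over stylesheet rules, so an overriding rule has to be marked `!important` to take effect against a `style="..."` attribute. The sketch below injects such a `<style>` block into the extracted HTML before it is handed to the browser. The helper name `inject_css` and the example rule are made up for illustration, and the rules that actually reproduce Word's rendering would have to be found by experiment.

```python
import re


def inject_css(html: str, css: str) -> str:
    """Insert a <style> block just before </head>, or prepend it if there is no head.

    Hypothetical helper for illustration only.
    """
    style = f"<style>{css}</style>"
    # A function is used as the replacement so that backslashes in the CSS
    # are not interpreted as regex escapes.
    new_html, count = re.subn(
        r"</head\s*>",
        lambda m: style + m.group(0),
        html,
        count=1,
        flags=re.IGNORECASE,
    )
    if count:
        return new_html
    return style + html


# Example: force a uniform paragraph font size despite inline styles.
page = "<html><head></head><body><p style='font-size:8pt'>x</p></body></html>"
patched = inject_css(page, "p{font-size:12pt !important}")
```

Placing the block at the end of the head keeps it after any styles that were already in the message, so equal-priority rules in the injected block take precedence.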

CMH
  • Thanks for the comment. Yes, we are displaying it using IE (the WebBrowser control). We can also apply a style sheet, but the HTML we get from RDO contains a lot of inline styles. We will try to focus on finding a way to match the styles rendered by Word in IE. Thanks! – Marek Dec 17 '09 at 13:14