
I am able to read emails from Microsoft Exchange using an IMAP client from Lumisoft. I have configured the Exchange server to convert all mail to plain text. However, the message bodies I read still seem to contain HTML/CSS.

What is the best way of removing HTML/CSS from the body of an email? Or is there a setting on the Exchange server that I seem to have missed?

Esteban Küber
James
  • Are you looking for a code solution or an Exchange Setting of some kind? – Jose Basilio Apr 26 '09 at 18:18
  • Any solution would help. As a work-around I am using a regular expression to remove any HTML tags; however, this does not remove all the CSS. An Exchange setting would be ideal, but I have already tried the obvious one (setting IMAP to only provide email in plain text) and it doesn't seem to take effect when I read the emails. – James Apr 26 '09 at 20:44
  • Might I suggest updating the title to reflect that the question is specifically about Exchange mail server interaction, and not a generic "how do I convert HTML to plain text" question. – hlovdal Apr 26 '09 at 23:15
  • Hi, the reason it is generic is that I am looking for any solution that suits the problem. I am not specifically looking for an Exchange setting; I am looking for any way of extracting the plain text body from an email. – James Apr 27 '09 at 07:34
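
On the regular-expression work-around mentioned in the comments above: a tag-only pattern leaves CSS behind because the rules inside a style element are ordinary text between two tags. A minimal sketch in Python (standard library only; the function name strip_html_regex is made up for illustration and is not part of Lumisoft or Exchange) that drops style/script blocks before stripping the remaining tags might look like this. It assumes reasonably well-formed markup and is not robust against arbitrary broken HTML.

```python
import re
from html import unescape

# HTML comments (some mail clients wrap their CSS in <!-- ... -->).
_COMMENTS = re.compile(r"<!--.*?-->", re.DOTALL)
# Whole <style>/<script> elements, including the text between the tags.
_BLOCKS = re.compile(r"<(style|script)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Any remaining tag.
_TAGS = re.compile(r"<[^>]+>")


def strip_html_regex(html):
    """Best-effort HTML-to-text conversion using regular expressions."""
    text = _COMMENTS.sub("", html)
    text = _BLOCKS.sub("", text)      # remove CSS/JS together with its content
    text = _TAGS.sub(" ", text)       # then remove the remaining tags
    text = unescape(text)             # turn &amp; etc. back into characters
    return " ".join(text.split())     # collapse runs of whitespace


if __name__ == "__main__":
    sample = (
        "<html><head><style>p { color: red; }</style></head>"
        "<body><p>Hello &amp; <b>welcome</b></p></body></html>"
    )
    # A tag-only regex keeps the CSS rule text:
    print(re.sub(r"<[^>]+>", "", sample))
    # Removing the style block first does not:
    print(strip_html_regex(sample))
```

The order matters: once the tags are gone, nothing marks the CSS rules as different from the visible text.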

2 Answers

1

I usually take one of these approaches...

  1. Using regular expressions. It can be difficult to get right if the solution also has to cope with all kinds of invalid markup, but I bet someone else has done it before you (hint: Google it or search SO).

  2. Using an HTML parser library. You can find one for any popular programming language out there. I recommend using the Html Agility Pack.
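
As a rough illustration of the parser approach, here is a minimal sketch using Python's built-in html.parser module rather than the Html Agility Pack (so it shows the general idea, not that library's API). The names _TextExtractor and html_to_text are hypothetical. It collects text nodes, ignores anything inside style or script elements, and starts a new line at common block-level tags.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <style>/<script>."""

    _SKIP = {"style", "script"}
    _BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1
        elif tag in self._BLOCK:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self._SKIP:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in self._BLOCK:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the text content of an HTML string, one line per block."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    body = (
        "<html><head><style>p { color: red; }</style></head>"
        "<body><p>Hello &amp; <b>welcome</b></p><p>Second line</p></body></html>"
    )
    print(html_to_text(body))
```

Compared with a regular expression, the parser decides where tags begin and end, so attribute values containing ">" and similar edge cases are less likely to leak markup into the output.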

  • Hi, at the minute I am using a regular expression that I created myself, and it only strips out the HTML (which leaves the CSS). I don't feel 100% comfortable using this approach, though. I would ideally like an Exchange server setting that would definitively convert any mail received by a specific mailbox to plain text. I tried setting the IMAP settings for the mailbox to plain text only. It worked for a while and then all of a sudden stopped! – James May 21 '09 at 09:37
  • Decided to go with the HtmlAgilityPack library. – James Nov 06 '09 at 10:38
0

I'm not sure of exactly how your setup works, if you can run scripts, etc. An HTML parser would be the best way to parse the HTML, obviously. For instance, with Hpricot (a Ruby HTML-parsing library), you could do puts doc.find_element('body').inner_text and that would print the text content of the document.

Chuck
  • Hi, this pretty much sounds like a solution I could use. How and where would I run a script like this? – James Apr 27 '09 at 07:35
  • The link for Hpricot is http://wiki.github.com/why/hpricot. You will need the Ruby programming language to run it: http://www.ruby-lang.org/en/. – airportyh Apr 27 '09 at 13:32
  • Hi, I have decided against this method as I don't really have a lot of experience with Ruby. – James May 04 '09 at 15:38