I am trying to extract the readable portion of the Body.Text property of a TIdMessagePart object that is of type TIdText, using something like the code below. However, if the ContentType of the TIdText message part is text/html rather than text/plain, this fills sBody with all of the HTML tags. I just want the readable text, but I don't see a way to get that in version 9 of the library. Am I missing something?

var
  email: TIdMessage;
  sBody: String;

...

for j := 0 to Pred(email.MessageParts.Count) do
begin
  if email.MessageParts.Items[j] is TIdText then
  begin
    sBody := TIdText(email.MessageParts.Items[j]).Body.Text;
  end;
end;
RRUZ

1 Answer

You have to parse the HTML yourself to extract the plain text you want from it. TIdMessage is just a container for email data. Other than handling charset conversions, it does not parse body content for you.
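
As a rough illustration of what parsing it yourself can look like, here is a minimal sketch of a naive tag stripper. `StripHtmlTags` is a hypothetical helper, not part of Indy. It assumes simple, well-formed markup, and it does not skip `<script>` or `<style>` content, handle comments that contain `>`, or decode more than a few common entities. For real-world mail, a proper third-party HTML parser is the safer choice.

```pascal
program StripTagsDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not an Indy routine): removes everything between
// '<' and '>' and decodes a handful of common character entities.
function StripHtmlTags(const Html: string): string;
var
  i: Integer;
  InTag: Boolean;
begin
  Result := '';
  InTag := False;
  for i := 1 to Length(Html) do
  begin
    if Html[i] = '<' then
      InTag := True                 // start of a tag: stop copying
    else if (Html[i] = '>') and InTag then
      InTag := False                // end of a tag: resume copying
    else if not InTag then
      Result := Result + Html[i];   // ordinary text outside any tag
  end;
  // Decode '&amp;' last so that text such as '&amp;lt;' is not decoded twice.
  Result := StringReplace(Result, '&nbsp;', ' ', [rfReplaceAll]);
  Result := StringReplace(Result, '&lt;', '<', [rfReplaceAll]);
  Result := StringReplace(Result, '&gt;', '>', [rfReplaceAll]);
  Result := StringReplace(Result, '&quot;', '"', [rfReplaceAll]);
  Result := StringReplace(Result, '&amp;', '&', [rfReplaceAll]);
end;

begin
  // Prints: Hello, world & all
  WriteLn(StripHtmlTags('<html><body><p>Hello, <b>world</b> &amp; all</p></body></html>'));
end.
```

In the loop from the question, one option would be to check the part's ContentType and, when it is text/html, assign `StripHtmlTags(TIdText(email.MessageParts.Items[j]).Body.Text)` to sBody instead of the raw Body.Text.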

Remy Lebeau
  • This remains true for Indy 10, and I expect it to remain the same in future Indy versions. – jachguate Jan 18 '13 at 23:06
  • True, there are no plans to implement a full HTML parser in Indy (there are plenty of third-party parsers available for that). However, Indy 10 does have a small HTML parser in the `ParseMetaHTTPEquiv()` function of the `IdGlobalProtocols` unit, which `TIdHTTP` uses for parsing `<meta>` tags from HTML data. – Remy Lebeau Jan 18 '13 at 23:21