
In <mt:EntryBody> I have a couple of images with captions embedded in the entry.
I want to strip out all the HTML when publishing to RSS.

Here is my entry formatting:

<img src="/path/to/img.jpg">
<div style="text-align:right">Image Caption</div>
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse</p>

If I do this:

<mt:EntryBody remove_html="1">

This strips out all HTML elements within EntryBody, but I would also like to remove the "Image Caption" text, because it looks odd without the image it refers to.

How do I accomplish this?

Maca
  • Please read this SO question: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Abe Miessler Nov 09 '10 at 00:27

2 Answers


If you are using MT 4/5 Pro, the easiest way to handle this is to store your image and caption in custom fields; you can then selectively output them in the appropriate templates. If they stay inside the entry body, removing only the caption will be quite difficult, even with regex, as Abe Miessler pointed out.

akamike

Prevailing wisdom says that you should not use regex to parse HTML. Could you convert the entry body to XHTML and then use XSLT/XPath to do what you want instead?
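
For illustration only, here is a rough sketch of the parser-based idea in Python, using the standard library's `html.parser` rather than XSLT/XPath. It assumes the caption is always a `<div>` that immediately follows an `<img>`, as in the question's sample, and that you can post-process the entry body outside of Movable Type. The names `CaptionStripper` and `strip_html_and_captions` are made up for this example.

```python
from html.parser import HTMLParser


class CaptionStripper(HTMLParser):
    """Collect plain text, skipping any <div> that directly follows an <img>.

    Assumption: a caption is a <div> immediately after an <img>
    (only whitespace in between), as in the sample entry.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []          # text fragments to keep
        self.after_img = False   # True right after an <img> tag
        self.skip_depth = 0      # > 0 while inside a caption <div>

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            # Track nested <div>s so we know where the caption ends.
            if tag == "div":
                self.skip_depth += 1
            return
        if tag == "img":
            self.after_img = True
        elif tag == "div" and self.after_img:
            self.skip_depth = 1
            self.after_img = False
        else:
            self.after_img = False

    def handle_endtag(self, tag):
        if self.skip_depth and tag == "div":
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return  # inside a caption: drop the text
        text = data.strip()
        if text:
            self.after_img = False
            self.parts.append(text)


def strip_html_and_captions(html):
    """Return the text of `html` without tags or image captions."""
    parser = CaptionStripper()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


entry = (
    '<img src="/path/to/img.jpg">\n'
    '<div style="text-align:right">Image Caption</div>\n'
    '<p>Lorem ipsum dolor sit amet</p>'
)
print(strip_html_and_captions(entry))
```

If your captions are marked up differently (a class name, a `<span>`, and so on), the check in `handle_starttag` would need to change accordingly.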

If you can, take a look at:

Abe Miessler