I'm working with Ruby on Rails 2.3.8 on a website where users write posts. Each post has a short description that is shown on the main page. That description is built automatically from the original post body by truncating it to a maximum of 240 characters.
The problem is that the post bodies may contain images or videos, and I don't want them to appear in the truncated descriptions. I'm using the Hpricot plugin to parse the HTML, and the following regular expression to remove images:
body = Hpricot.parse(html_body)
body = body.to_s.gsub(/<img .*?>/, '')
This removes the images, but sometimes a string is left behind: for example, the text "image" or "img" appears where the image was. Right now, for instance, I see a stray "spam" text remaining after an image was removed from the description. Maybe the regex is not correct.
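One thing that can be checked without Hpricot is which tags the pattern above actually matches. The following is a minimal sketch using plain Ruby regexes: `strip_media` and the sample `html` string are hypothetical, made up for illustration, and the list of video-related tags (`object`, `iframe`, `video`, `embed`) is an assumption about how videos are embedded. It shows that `/<img .*?>/` misses an uppercase `<IMG>` tag or one with a newline after `img`. This may or may not be related to the leftover text described above.

```ruby
# Hypothetical helper (not from the original post): strip images and common
# video embeds from an HTML string using regexes only.
def strip_media(html)
  # \b accepts any whitespace after "img", [^>]* also spans newlines,
  # and the i flag accepts <IMG ...>.
  text = html.gsub(/<img\b[^>]*>/i, '')
  # Paired elements often used for video: drop the tags and everything inside.
  text = text.gsub(/<(object|iframe|video)\b[^>]*>.*?<\/\1\s*>/im, '')
  # <embed> is usually unpaired: remove the opening tag and any stray closing tag.
  text.gsub(/<\/?embed\b[^>]*>/i, '')
end

# Made-up sample input with two images and one Flash-style video embed.
html = "<p>Intro <img src=\"a.png\" alt=\"image\"> text <IMG\nsrc=\"b.png\"/>" \
       "<object width=\"1\"><embed src=\"v.swf\"></embed></object> end</p>"

puts html.gsub(/<img .*?>/, '')  # original pattern: the second image survives
puts strip_media(html)           # both images and the video embed are removed
```

Note that a regex like this still breaks on a literal `>` inside an attribute value, so it is only an approximation of real HTML parsing.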
Does anybody know the right regex for removing images, and also videos, from HTML?