
I'm working with Ruby on Rails 2.3.8 on a website where users write posts. Each post has a short description that is shown on the main page. The description is built automatically from the original post by truncating it to a maximum of 240 characters.

The problem is that posts may contain images or videos, and I don't want those to appear in the truncated descriptions. I'm using the Hpricot plugin to parse the HTML, and the following regular expression to strip images:

body = Hpricot.parse(html_body)
body = body.to_s.gsub(/<img .*?>/, '')

This removes the images, but sometimes it leaves a stray string behind where the image used to be, such as "image" or "img". Right now, for example, I see a loose "spam" text left over after an image was removed from the description. Maybe the regex is not correct.

Does anybody know the right regex for removing images, and also videos, from HTML?

shingara
Brian Roisentul
  • Why not just avoid putting HTML in your content? Then you could chain a few gsub calls to strip whatever you don't want. – shingara Nov 30 '10 at 12:55
  • Avoid what? I do want users to insert images and videos, but those should be visible in the post's page, not in the short description on the home page. – Brian Roisentul Nov 30 '10 at 13:07

1 Answer


It seems to me that you are searching for img with a space after it.

Don't you want to match <img, then everything up to but not including the >, and then the > itself?

It's hard to say whether this works without seeing the source input.

<img[^>]*>

CAUTION: this will NOT work with nested tags.
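As a rough illustration of the regex approach, here is a minimal sketch in plain Ruby. The helper name `strip_media` and the two constants are hypothetical, and the tag list for videos (`object`, `iframe`, `video`, `audio`, `embed`) is an assumption, since the question does not say how videos are embedded. It also assumes no attribute value contains a literal `>`.

```ruby
# Hypothetical helper, not part of Rails or Hpricot.
# Paired media tags: drop the opening tag, its contents, and the closing tag.
MEDIA_PAIRED = %r{<(object|iframe|video|audio)\b[^>]*>.*?</\1\s*>}im
# Standalone media tags: anything from "<img" (etc.) up to the next ">".
MEDIA_VOID = /<(?:img|embed|source|param)\b[^>]*>/i

def strip_media(html)
  # Remove paired containers first so their inner tags go with them,
  # then remove any standalone tags that remain.
  html.gsub(MEDIA_PAIRED, '').gsub(MEDIA_VOID, '')
end

html = '<p>Hello <img src="a.png" alt="spam"> world</p>'
puts strip_media(html)
```

The whole tag is removed, attributes included, so text such as an alt value does not survive. As with any regex over HTML, malformed markup or same-named containers nested inside each other may still slip through.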

Keng