
I would like to get rid of some tiny images in an RSS feed by matching and removing them with Apple's NSRegularExpression.

<img src="somepic" height="1" width="1"> -> should be matched for removal
<img src="somepic" height="50" width="100"> -> should also be matched
<img src="somepic" height="100" width="100"> -> this one should not be matched

My current approach is not working yet:

<img(\s*[height|width]\s*=\s*"([0-9]|[1-9][0-9])"\s*+|[^>]+?)*>

My guess is that there is a problem with the capture groups (which are probably not needed at all). Does anyone have a hint why it's not working?

HolyMac
    The problem is that *all* `img` attributes must match the first capture group, which fails for the `src` attribute. You need to make sure your check only applies to `width` and `height` attributes, but leave other attributes such as `src` alone. (Also, `[height|width]` should be something like `(height|width)`.) – Mattias Buelens Sep 17 '12 at 15:42
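
The point about `[height|width]` can be checked directly. This is a minimal sketch that uses Python's `re` module as a stand-in for NSRegularExpression, on the assumption that character classes and groups behave the same in both engines. The names `char_class` and `group` are only illustrative.

```python
import re

# A character class matches exactly ONE character out of the set
# {h, e, i, g, t, |, w, d}; the pipe is a literal inside the brackets.
char_class = re.compile(r'[height|width]')

# A non-capturing group with alternation matches one of the two whole words.
group = re.compile(r'(?:height|width)')

print(char_class.fullmatch('height'))  # None: one class cannot span six characters
print(char_class.fullmatch('|'))       # a match: the pipe is just a member of the set
print(group.fullmatch('height'))       # a match
print(group.fullmatch('width'))        # a match
```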

2 Answers


Try this regex:

<img[^>]*(?:height|width)\s*=\s*"[1-9]?[0-9]"[^>]*>

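It fixes the issues Mattias Buelens mentioned in his comment: the attribute name is now a group, (?:height|width), instead of a character class, and other attributes such as src are skipped by [^>]*. The tag matches as soon as either height or width has a value from 0 to 99.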

See on rubular.
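
To check the regex against the three tags from the question, here is a small sketch. It uses Python's `re` module rather than NSRegularExpression, assuming the constructs in this pattern behave the same in both engines. `TINY_IMG`, `samples` and `strip_tiny_images` are hypothetical names chosen for the example.

```python
import re

# The regex from this answer, unchanged.
TINY_IMG = re.compile(r'<img[^>]*(?:height|width)\s*=\s*"[1-9]?[0-9]"[^>]*>')

# The three tags from the question.
samples = [
    '<img src="somepic" height="1" width="1">',      # should be removed
    '<img src="somepic" height="50" width="100">',   # should be removed
    '<img src="somepic" height="100" width="100">',  # should be kept
]


def strip_tiny_images(html: str) -> str:
    """Remove every <img> tag whose height or width is a one- or two-digit number."""
    return TINY_IMG.sub('', html)


for tag in samples:
    print(bool(TINY_IMG.search(tag)), tag)

feed = '<p>text</p>' + samples[0] + samples[2]
print(strip_tiny_images(feed))
```

Because `[^>]*` cannot cross a `>`, each match stays inside a single tag, so neighbouring tags in the feed are left alone.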

morja
    Thanks Morja. I added handling for px and em in the height and width attributes: http://www.rubular.com/r/OYcuO8cwzr `<img[^>]*(?:height|width)\s*=\s*"[1-9]?[0-9](px)*(em)*"[^>]*>` – Mohd Farid Apr 21 '13 at 09:21
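
A quick sketch of the px/em variant, assuming the pattern in the comment above is the answer's regex with `(px)*(em)*` inserted before the closing quote (the start of the pattern is cut off in the comment). Python's `re` is used as a stand-in engine. `TIGHTER` is not from the thread; it is a suggested variation that allows at most one unit.

```python
import re

# Pattern as assumed from the comment: any number of "px", then any number of "em".
WITH_UNITS = re.compile(
    r'<img[^>]*(?:height|width)\s*=\s*"[1-9]?[0-9](px)*(em)*"[^>]*>')

# Hypothetical tighter variation: at most one unit, either px or em.
TIGHTER = re.compile(
    r'<img[^>]*(?:height|width)\s*=\s*"[1-9]?[0-9](?:px|em)?"[^>]*>')

tags = [
    '<img src="x" height="1px" width="1px">',      # tiny, with px
    '<img src="x" width="2em">',                   # tiny, with em
    '<img src="x" height="100px" width="100px">',  # not tiny
    '<img src="x" height="5pxem">',                # malformed unit
]

for tag in tags:
    print(bool(WITH_UNITS.search(tag)), bool(TIGHTER.search(tag)), tag)
```

The only difference between the two patterns shows up on the malformed value `5pxem`, which `(px)*(em)*` accepts and `(?:px|em)?` rejects.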

This is a C# regex:

(?<=<img).*?(height="([0-9]|[1-9][0-9])".*?width="([0-9]|[1-9][0-9])"|width="([0-9]|[1-9][0-9])".*?height="([0-9]|[1-9][0-9])").*?(?=>)

Hope this helps.
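
For comparison, here is the same check run against this pattern. It is a sketch in Python's `re`, not C#, assuming the lookbehind, lookahead and lazy quantifiers behave the same for this input. As far as this test shows, the pattern requires both height and width to be below 100, so the 50x100 tag from the question is not matched, and the match covers only the attribute text between `<img` and `>`, not the tag itself. `BOTH_SMALL` and `samples` are illustrative names.

```python
import re

# The regex from this answer, unchanged.
BOTH_SMALL = re.compile(
    r'(?<=<img).*?('
    r'height="([0-9]|[1-9][0-9])".*?width="([0-9]|[1-9][0-9])"'
    r'|width="([0-9]|[1-9][0-9])".*?height="([0-9]|[1-9][0-9])"'
    r').*?(?=>)')

# The three tags from the question.
samples = [
    '<img src="somepic" height="1" width="1">',
    '<img src="somepic" height="50" width="100">',
    '<img src="somepic" height="100" width="100">',
]

for tag in samples:
    m = BOTH_SMALL.search(tag)
    print(repr(m.group(0)) if m else None, tag)

# Removing the match leaves the tag shell behind, because "<img" and ">"
# sit in lookarounds and are not part of the match.
print(BOTH_SMALL.sub('', samples[0]))
```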

Anirudha