I need to modify the 'img' tags in an HTML string that do not end with '/>'. For example, <img src=""> needs to be changed to <img src=""/>. I am using the following regex: <img(.*[^/])> and replacing matches with <img$1/>

This works fine for a standalone tag. However, for cases like: <center><img src=""/></center> the replacement returns: <center><img src=""></center/>

Any suggestions on how to restrict this regex so that it only matches up to the end of the img tag? Thanks.
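
For reference, here is a minimal Java sketch that reproduces the behaviour. It assumes the replacement is done with `Matcher.replaceAll` (the exact call is not stated above), and the class and method names are hypothetical. The greedy `.*` also matches `>`, so on a nested tag the match runs on to the last `>` on the line.

```java
import java.util.regex.Pattern;

class GreedyImgDemo {
    // Pattern from the question: ".*" is greedy and "." also matches '>'.
    static final Pattern GREEDY = Pattern.compile("<img(.*[^/])>");

    // Hypothetical helper: applies the question's replacement string.
    static String closeImgTags(String html) {
        return GREEDY.matcher(html).replaceAll("<img$1/>");
    }

    public static void main(String[] args) {
        // Standalone tag: the match ends at the only '>' available.
        System.out.println(closeImgTags("<img src=\"\">"));
        // prints <img src=""/>

        // Nested tag: the match extends to the '>' of </center>.
        System.out.println(closeImgTags("<center><img src=\"\"></center>"));
        // prints <center><img src=""></center/>
    }
}
```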

  • [Don't parse HTML with regex!](http://stackoverflow.com/a/1732454/418066) – Biffen Mar 06 '17 at 09:28
  • Possible duplicate of [Using regular expressions to parse HTML: why not?](http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not) – baao Mar 06 '17 at 09:29
  • Try using ``. This makes the regex non-greedy – Psi Mar 06 '17 at 09:29
  • @Biffen It's actually an HTML string that I am applying the regex to. The servlet response is the content string containing these img tags, which I need to modify. – Shubhankit Roy Mar 06 '17 at 09:31
  • @ShubhankitRoy What do you think HTML is if not a string?! How is that any different? You still can't parse HTML with regex. – Biffen Mar 06 '17 at 09:32
  • @Psi Already tried. Didn't work. Updated the description, please check. – Shubhankit Roy Mar 06 '17 at 09:36
  • @ShubhankitRoy Have you checked my answer? – Mustofa Rizwan Mar 06 '17 at 10:05
  • @RizwanM.Tuman Nope, I guess I'll have to use an HTML parser. Thanks though. – Shubhankit Roy Mar 06 '17 at 10:48
  • What language are you using? RegEx is actually the wrong tool for this – SierraOscar Mar 06 '17 at 11:44
  • @MacroMan I am using Java. However, I found a workaround for this requirement. As I only needed to alter the small portion of HTML mentioned above, I got all my matching image tags using `` and then applied some logic to the captured group to modify the tags properly. – Shubhankit Roy Mar 08 '17 at 12:51
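
The workaround described in the last comment could look roughly like the sketch below. The pattern used there was not preserved, so `<img\b([^>]*)>` is an assumption, as are the class and method names. The "logic on the captured group" is reduced here to appending a `/` when the tag is not already self-closed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgTagWorkaround {
    // Assumed pattern: "<img", a word boundary, then everything up to the next '>'.
    static final Pattern IMG = Pattern.compile("<img\\b([^>]*)>");

    // Hypothetical helper: rewrites each img tag based on its captured attributes.
    static String selfCloseImgTags(String html) {
        Matcher m = IMG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String attrs = m.group(1);
            // Leave tags that already end in "/" (ignoring whitespace) untouched.
            String tag = attrs.trim().endsWith("/") ? m.group() : "<img" + attrs + "/>";
            // quoteReplacement keeps any '$' or '\' in the attributes literal.
            m.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(selfCloseImgTags("<center><img src=\"\"></center>"));
        System.out.println(selfCloseImgTags("<img src=\"a.jpg\" />"));
    }
}
```

Because `[^>]*` stops at the first `>`, this sketch still breaks on attribute values that contain `>`, which is one reason the comments above point towards an HTML parser.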

1 Answer

You may use this regex:

    <\s*img\s+([^>]*=(?:\".*?\"|\'.*?\'))[\s\w\-]*>

with the following replacement:

    <img $1/>

This matches these simple and complex cases:

    <img src="images/a.jpg" title="test"><br/>
    <img  src="a/b.jpg" >
    <span><img src="a.jpg"></span>
    <img src="" title="">
    <img src="" data-val>
    <img src="a.jpg" title="a'>b">
    <img src="a.jpg" title='a">b'>
    <img src="a.jpg" title='a>=b"=>' >

but not the following:

    <img src="a.jpg" />
    <imgXTag src="b.jpg" >
    <img src="a.jpg" /  >
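
Since the asker is using Java, here is a minimal sketch of applying this pattern with `java.util.regex`. The class and method names are hypothetical, and the `\"` and `\'` escapes are written as plain quotes inside the Java string, since the regex itself does not need them.

```java
import java.util.regex.Pattern;

class AnswerRegexDemo {
    // The pattern from this answer, written as a Java string literal.
    static final Pattern UNCLOSED_IMG = Pattern.compile(
            "<\\s*img\\s+([^>]*=(?:\".*?\"|'.*?'))[\\s\\w\\-]*>");

    // Hypothetical helper: rewrites matched img tags as self-closing.
    static String closeImgTags(String html) {
        return UNCLOSED_IMG.matcher(html).replaceAll("<img $1/>");
    }

    public static void main(String[] args) {
        String[] samples = {
            "<span><img src=\"a.jpg\"></span>",
            "<img src=\"a.jpg\" title=\"a'>b\">",
            "<img src=\"\" data-val>",
            "<img src=\"a.jpg\" />",
            "<imgXTag src=\"b.jpg\" >"
        };
        for (String s : samples) {
            System.out.println(s + "  ->  " + closeImgTags(s));
        }
    }
}
```

One thing to be aware of: `[\s\w\-]*` sits outside the capture group, so anything it matches after the last quoted value is not copied into the replacement. For `<img src="" data-val>` the result is `<img src=""/>`, without the `data-val` attribute.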

S.Serpooshan