24

I want to extract an img tag from a string returned in JSON data. For example, I want to match this:

<img class="img" src="https://fbcdn-photos-c-a.akamaihd.net/hphotos-ak-frc3/1239478_598075296936250_1910331324_s.jpg" alt="" />

What is the regular expression I must use to match it?

I used the following, but it is not working.

"<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>"
Peter Mortensen
eng.ahmed
  • http://stackoverflow.com/a/1732454/775544 – Anthony Neace Sep 06 '13 at 19:16
  • Please don't parse HTML with regex. HTML is not a regular language. – thegrinner Sep 06 '13 at 19:38
  • **Don't use regular expressions to parse HTML. Use a proper HTML parsing module.** You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/php or [this SO thread](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged. – Andy Lester Sep 06 '13 at 19:41
  • If I want to get all the attributes (title, src, alt), what modifications are needed to the regex pattern <img\s+[^>]*src="([^"]*)"[^>]*>? Thanks in advance. – raj Jan 21 '16 at 14:04

4 Answers

26

You could simply use this expression to match an img tag like the one in the example:

<img([\w\W]+?)/>
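As a rough sketch of how the captured group could be used, the Java below applies this pattern and then splits group 1 into individual attributes with a second pattern. The class name `ImgAttributes`, the method `parse`, the second pattern, and the example.com URL are all hypothetical and not part of the answer. The sketch assumes a self-closed tag and double-quoted attribute values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgAttributes {
    // The pattern from this answer: group 1 is everything between "<img" and "/>".
    static final Pattern IMG = Pattern.compile("<img([\\w\\W]+?)/>");
    // Assumption (not from the answer): attributes look like name="value", double quotes only.
    static final Pattern ATTR = Pattern.compile("([\\w-]+)\\s*=\\s*\"([^\"]*)\"");

    // Returns the attributes of the first self-closed img tag, in source order.
    // The map is empty when no such tag is found.
    static Map<String, String> parse(String html) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher img = IMG.matcher(html);
        if (img.find()) {
            Matcher attr = ATTR.matcher(img.group(1));
            while (attr.find()) {
                attrs.put(attr.group(1), attr.group(2));
            }
        }
        return attrs;
    }

    public static void main(String[] args) {
        String html = "<img class=\"img\" src=\"https://example.com/a_s.jpg\" alt=\"\" />";
        System.out.println(parse(html));
    }
}
```

Because the pattern requires a literal "/>", a tag written as `<img ...>` without the closing slash is not matched by this sketch.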
aleroot
  • OP didn't say what he wanted to capture but this captures the class, src, and alt tag. – hwnd Sep 06 '13 at 19:37
  • @hwnd yes, I know. But as you said, he hasn't specified what he wants to capture. – aleroot Sep 06 '13 at 19:39
  • HTML can't really be parsed effectively with regex; adding some granularity to the expression betters the odds, though. –  Sep 06 '13 at 20:15
  • regexr.com complains that closing slash needs to be closed AND the closing slash itself is optional, depending on whether it is HTML or XHTML. Better way would be: ``, what do you think? – revelt Nov 24 '16 at 06:28
20

Your regex doesn't match the string, because it's missing the closing /.

Edit: No, the / is not necessary, so your regex should have worked. But you can relax it a bit, as below.

Slightly modified:

 <img\s[^>]*?src\s*=\s*['\"]([^'\"]*?)['\"][^>]*?>
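To check the claim that both patterns work, here is a small Java sketch that runs the question's pattern and the relaxed one against the sample tag. The class name `ImgSrcRegex` and the helper `firstSrc` are hypothetical. The relaxed pattern is re-escaped for a Java string literal, which is an assumption about how it would be used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgSrcRegex {
    // The question's pattern, exactly as written in its Java string literal.
    static final Pattern ORIGINAL =
        Pattern.compile("<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>");
    // The relaxed pattern from this answer, escaped for a Java string literal.
    static final Pattern RELAXED =
        Pattern.compile("<img\\s[^>]*?src\\s*=\\s*['\"]([^'\"]*?)['\"][^>]*?>");

    // Returns the first captured src value, or null when nothing matches.
    static String firstSrc(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<img class=\"img\" src=\"https://fbcdn-photos-c-a.akamaihd.net/"
            + "hphotos-ak-frc3/1239478_598075296936250_1910331324_s.jpg\" alt=\"\" />";
        System.out.println(firstSrc(ORIGINAL, html));
        System.out.println(firstSrc(RELAXED, html));
    }
}
```

The sketch uses `find()` rather than `matches()`, since `matches()` only succeeds when the entire input is the tag, which is rarely the case for a tag embedded in a longer string.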
Peter Mortensen
10

Please note that you shouldn't use regular expressions to parse HTML, for various reasons. That said, this pattern captures the src value in group 1:

<img\s+[^>]*src="([^"]*)"[^>]*>

Or use Jsoup...

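import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// A Java string literal cannot span lines, so the URL is concatenated.
String html = "<img class=\"img\" src=\"https://fbcdn-photos-c-a.akamaihd.net/"
            + "hphotos-ak-frc3/1239478_598075296936250_1910331324_s.jpg\" alt=\"\" />";

Document doc = Jsoup.parse(html);          // parse the fragment into a document
Element img = doc.select("img").first();   // first img element, or null if there is none
String src = img.attr("src");              // value of the src attribute

System.out.println(src);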
hwnd
0

I faced the same situation; I tried this and it worked for me.

(<img)[^/>]*(/>|>)

Here is the explanation:

[Image: explanation of the above regex]

This explanation is from the website https://extendsclass.com/regex-tester.html
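One thing worth checking with this pattern is how `[^/>]*` treats slashes inside attribute values. The Java sketch below tries it on two inputs. The class name `ImgTagSlashCheck`, the helper `hasImgTag`, and the sample tags are hypothetical and not from the answer.

```java
import java.util.regex.Pattern;

class ImgTagSlashCheck {
    // The pattern from this answer; [^/>]* stops at the first '/' or '>'.
    static final Pattern TAG = Pattern.compile("(<img)[^/>]*(/>|>)");

    // True when the pattern finds an img tag anywhere in the input.
    static boolean hasImgTag(String html) {
        return TAG.matcher(html).find();
    }

    public static void main(String[] args) {
        // No '/' inside the attributes: the pattern matches.
        System.out.println(hasImgTag("<img class=\"img\" src=\"photo.jpg\" alt=\"\" />"));
        // A URL with slashes in src, as in the question: the pattern does not match.
        System.out.println(hasImgTag(
            "<img class=\"img\" src=\"https://example.com/photo.jpg\" alt=\"\" />"));
    }
}
```

So the pattern seems suited to tags whose attribute values contain no slash; for absolute URLs like the one in the question, a pattern built on `[^>]*` is likely a better fit.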

m4n0