I have an HTML string that may contain an <img> tag, like this:

<p> blablabla <img> an image</img> again blablabla</p>

I want to remove the img element and get the parts before and after it in a string array.

edit: After calling

String[] splitted = htmlStr.split("regex");

Result would be:

splitted[0] = "<p> blablabla ";
splitted[1] = "again blablabla</p>";

I'd say a regex is required. Keep in mind that the img tag can differ from string to string: for example, it can have one or more attributes.

Phate
  • [Use an HTML parser if your img tag will be arbitrary](http://stackoverflow.com/a/1732454/451590) – David B Oct 29 '12 at 12:22
  • I just have to clear the img part, for this reason I'd avoid using a whole html parser. @Roman: because String.split method requires a regex – Phate Oct 29 '12 at 12:26

4 Answers

1

You should use an HTML parser to parse HTML, because tags can vary in ways that a regex cannot fully handle.

That said, if you only want to remove the <img> tag in this case, regardless of its attributes, you can use the regex below:

String str = "<p> blablabla <img> an image</img> again <img href = sadf> " + 
             "asdf asdf </img>blablabla</p>";

str = str.replaceAll("<img\\s*[^>]*?>[^<]*?</img>", "");
System.out.println(str);

Output:

<p> blablabla  again blablabla</p>
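Since the question asks for the parts before and after the tag rather than a string with the tag removed, a similar pattern can be passed to a split instead. The following is only a sketch: the class name `ImgSplit` and the method `splitAroundImg` are made up for illustration, and it assumes img elements are never nested and that no attribute value contains a `>` character. The trailing `\s*` swallows the whitespace after `</img>` so the second part starts at "again", as in the question.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class ImgSplit {
    // Hypothetical pattern: "<img", optional attributes, ">", any content
    // (non-greedy), "</img>", then any whitespace that follows.
    // Assumes no '>' inside attribute values and no nested img elements.
    private static final Pattern IMG = Pattern.compile(
            "<img\\b[^>]*>.*?</img>\\s*",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the pieces of html that lie between img elements.
    static String[] splitAroundImg(String html) {
        return IMG.split(html);
    }

    public static void main(String[] args) {
        String html = "<p> blablabla <img> an image</img> again blablabla</p>";
        System.out.println(Arrays.toString(splitAroundImg(html)));
    }
}
```

Compiling the pattern once avoids recompiling it on every call, which is what `String.split` would otherwise do for a pattern like this.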

You may also want to see the link below:

Alternatively, you can use an HTML parser such as:

Rohit Jain
0

Use StringTokenizer, String.split(), or, for complex HTML with many <img> tags, an HTML parser.

logoff
0

Try the following code:

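String str = "<p> blablabla <img> an image</img> again blablabla</p>";
String closeTag = "</img>";
int start = str.indexOf("<img");
int end = str.indexOf(closeTag);
// Keep everything before the opening tag and everything after the closing tag.
// Assumes both tags are present (indexOf returns -1 otherwise).
String withoutImg = str.substring(0, start) + str.substring(end + closeTag.length());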

However, if the string contains more than one <img> tag, each occurrence has to be handled, for example in a loop.
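One way to handle several occurrences with the same indexOf approach is to loop until no further opening tag is found. This is a minimal sketch, not a general HTML solution: the names `ImgStripper` and `removeImgTags` are hypothetical, and it assumes every `<img` has a matching `</img>` and that img elements are not nested. An unclosed tag is left in place.

```java
class ImgStripper {
    // Removes every "<img ...>...</img>" span using plain index searches.
    static String removeImgTags(String html) {
        final String open = "<img";
        final String close = "</img>";
        StringBuilder out = new StringBuilder();
        int pos = 0; // start of the text not yet copied
        while (true) {
            int start = html.indexOf(open, pos);
            if (start < 0) break;              // no more opening tags
            int end = html.indexOf(close, start);
            if (end < 0) break;                // unclosed tag: keep the rest as-is
            out.append(html, pos, start);      // copy the text before the tag
            pos = end + close.length();        // skip past the closing tag
        }
        out.append(html, pos, html.length()); // copy whatever remains
        return out.toString();
    }

    public static void main(String[] args) {
        String str = "<p> blablabla <img> an image</img> again <img href = sadf> "
                + "asdf asdf </img>blablabla</p>";
        System.out.println(removeImgTags(str));
    }
}
```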

Refer here.

Azodious
  • It works if after the "int end.." line I make substrings using the indexes start and end, but I think that just a String.split with a good regex would be more efficient... – Phate Oct 29 '12 at 12:30
0

If you want to remove all HTML tags, you can use this code:

string = string.replaceAll("\\<.*?\\>", "");
D-32