I would like to figure out whether a tag has a closing tag in place or not. I am not parsing the complete HTML content, so I do not think using a library makes sense here. I would like a regex that returns false when the tag is not closed and true when it is. Examples:

<span style="font-size:10pt;font-family:Arial;color:#000000;" class="left">Hi</span>
Result: true

<span style="font-size:10pt;font-family:Arial;color:#000000;" class="left"><span style="font-weight:normal;font-style:normal;vertical-align:top;">
Result: false

<span style="font-size:10pt;font-family:Arial;color:#000000;" class="left"><span style="font-weight:normal;font-style:normal;vertical-align:top;"><img border="0" src="file://locn/Smileys/EmoticonWink.gif" alt="wink" title="wink" keybrd=";)" width="18" height="18" src_data="file://locn/Smileys/EmoticonWink.gif" /></span>
Result: true

I have tried the regex below, but it does not detect the difference:

String HTML_TAG_PATTERN = "<(\"[^\"]*\"|'[^']*'|[^'\">])*>(.*<(\"[^\"]*\"|'[^']*'|[^'\">])*>)?";



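import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlTagValidator {

    // Matches a single tag, allowing '>' inside quoted attribute values (currently unused).
    private static final String HTML_TAG_FORMAT_PATTERN = "<(\"[^\"]*\"|'[^']*'|[^'\">])*>";
    // Matches a tag optionally followed by any text and another tag; tag names are not compared.
    private static final String HTML_TAG_PATTERN = "<(\"[^\"]*\"|'[^']*'|[^'\">])*>(.*<(\"[^\"]*\"|'[^']*'|[^'\">])*>)?";

    // A compiled Pattern is immutable and thread-safe, so it is compiled once.
    private static final Pattern PATTERN = Pattern.compile(HTML_TAG_PATTERN);

    // Returns true when the whole input matches HTML_TAG_PATTERN.
    public boolean validate(final String tag) {
        // A Matcher is not thread-safe, so it is created per call instead of being kept in a field.
        Matcher matcher = PATTERN.matcher(tag);
        return matcher.matches();
    }
}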

Any clues or suggestions on a better way to achieve this would be appreciated.

Rajesh Kumar Dash
  • What about adding before your taga and after your tag, and then using a library to evaluate it? – Stefan Feb 05 '19 at 07:44
  • I do not want to add a third-party library, as it is just tag validation. – Rajesh Kumar Dash Feb 05 '19 at 07:46
  • Regexes are a solution for HTML/XML parsing only in the extreme case of predictability, in which you can practically list all possible inputs. https://stackoverflow.com/a/1732454/7733418 – Yunnosch Feb 05 '19 at 07:59

1 Answer

This might help you:

(\<\w*)((\s\/\>)|(.*\<\/\w*\>))
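To see how this pattern behaves, here is a minimal sketch that runs it against shortened versions of the three examples from the question. The class name `ClosingTagRegexDemo` and the method `hasClosingTag` are made up for illustration, and the backslashes before `<`, `>` and `/` are dropped because `java.util.regex` does not need them.

```java
import java.util.regex.Pattern;

class ClosingTagRegexDemo {

    // The pattern above as a Java string literal (unneeded escapes removed).
    static final Pattern CLOSED_TAG = Pattern.compile("(<\\w*)((\\s/>)|(.*</\\w*>))");

    // True when the whole fragment starts with "<name" and then either
    // continues with " />" immediately or ends with some closing tag.
    static boolean hasClosingTag(String fragment) {
        return CLOSED_TAG.matcher(fragment).matches();
    }

    public static void main(String[] args) {
        // Shortened versions of the three examples in the question.
        System.out.println(hasClosingTag("<span class=\"left\">Hi</span>"));                              // true
        System.out.println(hasClosingTag("<span class=\"left\"><span style=\"vertical-align:top;\">"));   // false
        System.out.println(hasClosingTag("<span><img alt=\"wink\" /></span>"));                           // true
    }
}
```

Note the limits: the pattern only checks that the input ends with a closing tag, not that its name matches the opening one, so `<span>Hi</div>` is also accepted. A self-closing tag with attributes, such as `<img alt="wink" />` on its own, is rejected because the `\s\/\>` branch must follow the tag name directly.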

However, I suggest using a third-party library for validating and parsing HTML.
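If a library really is off the table, a stricter check is possible with only the JDK by tokenizing the tags with a regex and pairing them on a stack. This is a rough sketch, not a real HTML parser: the class `TagBalanceChecker` and the method `isBalanced` are hypothetical names, and it assumes that void elements are written in self-closing form (`<img ... />`) and that the input contains no comments, scripts or CDATA sections.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagBalanceChecker {

    // Group 1: "/" for a closing tag. Group 2: tag name. Group 3: "/" for a self-closing tag.
    // Quoted attribute values are consumed as a unit, so a ">" inside them does not end the tag.
    private static final Pattern TAG =
            Pattern.compile("<(/?)(\\w+)(?:\"[^\"]*\"|'[^']*'|[^'\">])*?(/?)>");

    // Returns true only when every opening tag is closed by a tag of the same name, in order.
    static boolean isBalanced(String fragment) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(fragment);
        while (m.find()) {
            String name = m.group(2).toLowerCase();
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();
            if (closing) {
                // A closing tag must match the most recently opened tag.
                if (open.isEmpty() || !open.pop().equals(name)) {
                    return false;
                }
            } else if (!selfClosing) {
                open.push(name);
            }
        }
        // Anything left on the stack was opened but never closed.
        return open.isEmpty();
    }
}
```

Be aware that this is stricter than the expected results in the question: the third example opens two `span` tags but closes only one, so `isBalanced` reports it as false.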

Mohsen