I'm trying to come up with a regular expression that will match opening and closing HTML tags in a text file. Any help at all would be great. All I've been able to come up with is `<[^>]*>`, which was the most recommended regex for my purpose. I should mention that I am using VS 2010 and C#.
-
Careful: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Feb 01 '12 at 12:42
-
That was helpful. My problem is nested HTML tags of unpredictable depth. Do you think your solution will solve the problem? @Tichodroma – davmszd Feb 01 '12 at 13:15
-
My "solution" will *definitely* solve your "problem". – Feb 01 '12 at 13:18
2 Answers
AFAIK it is impossible to find nested HTML tags using regular expressions. E.g. if the input is something like `<b>some phrase <b>double bolded</b> another phrase</b>`, it's impossible to match the correct opening and closing tags with a regular expression. It's possible if the levels of nesting are fixed and known, but since this is not the case in HTML, regex won't work.
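
To illustrate the point, here is a minimal sketch in Python (chosen only for brevity; the sample string is the one from this answer, and `pair_tags` is a hypothetical helper name, not part of any library). It shows a simple non-greedy regex pairing the outer `<b>` with the wrong `</b>`, and then a small stack that pairs each closing tag with the most recent unmatched opening tag. It assumes well-formed input without attributes or self-closing tags.

```python
import re

html = "<b>some phrase <b>double bolded</b> another phrase</b>"

# Naive attempt: opening tag, shortest possible content, closing tag.
# The match stops at the first </b>, so the outer <b> ends up paired
# with the inner closing tag.
naive = re.search(r"<b>(.*?)</b>", html)
print(naive.group(0))


def pair_tags(text):
    """Pair opening and closing tags with a stack (hypothetical helper).

    Returns (start, end) character offsets, innermost pairs first.
    Assumes well-formed input with no attributes or self-closing tags.
    """
    stack = []
    pairs = []
    for m in re.finditer(r"<(/?)([a-zA-Z]+)>", text):
        closing, name = m.group(1), m.group(2).lower()
        if not closing:
            # Remember where this opening tag started.
            stack.append((name, m.start()))
        elif stack and stack[-1][0] == name:
            # Closing tag matches the most recent unmatched opening tag.
            _, start = stack.pop()
            pairs.append((start, m.end()))
    return pairs


pairs = pair_tags(html)
for start, end in pairs:
    print(html[start:end])
```

The regex here only finds individual tags; the nesting is tracked outside the regex, which is essentially what a parser does.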

Tarandeep Gill
-
Depends upon what language you want to parse it in. XML parsers will work well. Almost all languages have built-in XML parsers or easy-to-find libraries. – Tarandeep Gill Feb 01 '12 at 13:04
-
I have an HTML page which I convert to a text file in order to delete some of its tags with a regex, and then I write the result back to the same file as HTML. @Tarandeep Gill – davmszd Feb 01 '12 at 13:11
-
HTML is not a programming language; you cannot read, write, or modify files with HTML. I am missing something here: are you using JavaScript, or maybe a command like "grep"? – Tarandeep Gill Feb 01 '12 at 13:17
-
@TarandeepGill, I think he means that he renames the `foo.html` file to `foo.txt`... – Qtax Feb 01 '12 at 13:34
-
@TarandeepGill You can easily read HTML tags and write them to a text file. What was not clear? My conversion, or something else? – davmszd Feb 04 '12 at 09:13
I think you want the following. It also handles closing tags and self-closing tags:
"</?[a-zA-Z]* ?/?>"
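
As a rough check of what this kind of pattern accepts, here is a small sketch in Python (used only for brevity; the sample string and the names `TAG`, `LOOSE` and `sample` are made up for illustration). A character class written as `[a-zA-z]` would also accept the punctuation that sits between `Z` and `a` in ASCII, so the sketch spells the class as `[a-zA-Z]` and compares the two.

```python
import re

# The pattern with the letter class spelled [a-zA-Z].
TAG = re.compile(r"</?[a-zA-Z]* ?/?>")

# The same pattern with the range A-z, which also covers [ \ ] ^ _ `
LOOSE = re.compile(r"</?[a-zA-z]* ?/?>")

sample = '<p>text<br /><a href="x">link</a></p><h1>title</h1>'

# Tags with attributes (<a href="x">) and tags whose names contain
# digits (<h1>, </h1>) are not matched by this pattern.
found = TAG.findall(sample)
print(found)

# The A-z range accepts a "tag" made of punctuation; A-Z rejects it.
print(bool(LOOSE.fullmatch("<^_^>")))
print(bool(TAG.fullmatch("<^_^>")))
```

So the pattern is fine for bare tags such as `<p>`, `</p>` or `<br />`, but it would need extending before it could handle attributes or names like `h1`.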

Oliver
-
My problem is nested HTML tags of unpredictable depth. Do you think your solution will solve the problem? @Oliver – davmszd Feb 01 '12 at 13:04
-
@dav, use an HTML parser for such things, not regex. If you really want to use regex, you can have a look at this one: http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491 – Qtax Feb 01 '12 at 13:30
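
For what the parser route might look like, here is a minimal sketch using Python's standard `html.parser` module as a stand-in (a C# project would use an HTML parsing library of its choice instead). `TagStripper` and `strip_tags` are hypothetical names. The sketch drops the chosen tags at any nesting depth and keeps their text; comments and doctype declarations are not preserved.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Re-emit HTML while dropping selected tags (hypothetical helper)."""

    def __init__(self, drop):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.drop = set(drop)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.drop:
            # get_starttag_text() returns the tag exactly as it was written.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> arrive here.
        if tag not in self.drop:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.drop:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def strip_tags(html, drop):
    """Return html with every tag named in drop removed, text kept."""
    parser = TagStripper(drop)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


html = "<p>some <b>phrase <b>double bolded</b> another</b> phrase<br/></p>"
result = strip_tags(html, ["b"])
print(result)
```

Because the parser reports each tag as an event, the nesting depth never has to be known in advance; the result can then be written back to the original file.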