
I am new to regular expressions. Can someone help me write a regex for parsing the tag

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> 

with all the possibilities?

VLAZ
vaibhav
  • **All** the possibilities? Don't try to do this with a regular expression. You can get away with them for HTML that fits a template, but for generic parsing, you need a real HTML parser. – Quentin Jan 21 '11 at 20:42
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Reid Jan 21 '11 at 20:43
  • @David @Reid Since – phihag Jan 21 '11 at 20:44
  • A meta that specifies the character encoding *should* only occur once. – Quentin Jan 21 '11 at 20:45
  • @David Dorward Precisely, and regular expressions are therefore as applicable as searching for the string " – phihag Jan 21 '11 at 20:46
  • My point was that "should" and "will" are not the same thing; I have stumbled across documents with multiple, contradictory encoding declarations. (And that's just limiting this to ``. If you actually want to figure out what the encoding of a document is, then there are half a dozen places you need to look.) – Quentin Jan 21 '11 at 20:47
  • @phihag still, there are endless variations in case, in attribute order, in other attributes.... – Pekka Jan 21 '11 at 20:48

1 Answer


To cover "all the possibilities", you really should be using HTML 5's "Determining the character encoding" rules. These aren't expressible as a regular expression.

There is an open source Java implementation of it in validator.nu.


If you insist on using a regular expression, then this will probably cover most cases where the encoding is declared with a meta element (it won't, for instance, cover XML declarations). It is, however, dirty and makes some assumptions that are usually (but may not always be) right, and I do not recommend it.

/<meta[^>]+charset=['"]?(.*?)['"]?[\/\s>]/i
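To see what this pattern does and does not pick up, here is a minimal sketch in Python. It assumes the `/.../i` literal is translated to `re.IGNORECASE`, and `extract_meta_charset` is a hypothetical helper name, not something from any library. It is only an illustration of the regex above, not a substitute for a real encoding sniffer.

```python
import re

# The pattern from above, with the /.../i literal expressed via re.IGNORECASE.
META_CHARSET_RE = re.compile(
    r"""<meta[^>]+charset=['"]?(.*?)['"]?[\/\s>]""",
    re.IGNORECASE,
)


def extract_meta_charset(html):
    """Return the charset captured by the first match, or None.

    Hypothetical helper for illustration only.
    """
    match = META_CHARSET_RE.search(html)
    return match.group(1) if match else None


samples = [
    '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">',
    "<META HTTP-EQUIV='content-type' CONTENT='text/html; charset=UTF-8' />",
    '<meta charset="utf-8">',
    '<meta content="text/html; charset = utf-8">',  # spaces around "=": not matched
]

for tag in samples:
    print(extract_meta_charset(tag))
```

The last sample returns `None` because the pattern expects `charset=` with no whitespace around the equals sign, which is one example of the kind of assumption that can go wrong on real-world markup.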
Quentin