
Ever since my awful experience with cssparser, I have set myself the task of implementing a CSS parser in Java using Parboiled. I already have all color specifications covered, but of course I need all the rest...

So, I went to look for the CSS specification and found it on the W3C website. I am now in the process of writing rules for all "atoms", but found something disturbing in this section:

UNICODE-RANGE   u\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?

The part that disturbs me is the question mark in [0-9a-f?].

The paragraph heading says the regular expressions used here are Lex-style. The ? has no special meaning in a character class (thanks @sczizzo for the confirmation). So, is this a typo in the W3C specification, or is ? really allowed in a Unicode range? If yes, what does it mean?

Wrap up: I have my answer. However, the regex in the specification is too permissive: a Unicode range written with question marks must stand by itself and cannot also carry an explicit end value. The regex above would accept this expression, which is clearly illegal: u+4??-733f
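To make the problem concrete, here is a minimal sketch in Java using java.util.regex. SPEC_TOKEN is the regex quoted above, with case-insensitive matching assumed. STRICT is a hypothetical tighter pattern of my own, not something taken from the specification: it accepts either plain hex digits with an optional end value, or hex digits followed by trailing question marks with no end value.

```java
import java.util.regex.Pattern;

final class UnicodeRangeCheck {
    // The token regex as quoted from the specification.
    // Case-insensitive matching is an assumption of this sketch.
    static final Pattern SPEC_TOKEN = Pattern.compile(
        "u\\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?",
        Pattern.CASE_INSENSITIVE);

    // Hypothetical stricter pattern (not from the specification):
    //   1. hex digits, optionally followed by "-" and more hex digits, or
    //   2. at most six characters: hex digits, then one or more '?'.
    static final Pattern STRICT = Pattern.compile(
        "u\\+(?:[0-9a-f]{1,6}(?:-[0-9a-f]{1,6})?"
            + "|(?=[0-9a-f?]{1,6}$)[0-9a-f]*\\?+)",
        Pattern.CASE_INSENSITIVE);

    static boolean specAccepts(String token) {
        return SPEC_TOKEN.matcher(token).matches();
    }

    static boolean strictAccepts(String token) {
        return STRICT.matcher(token).matches();
    }

    public static void main(String[] args) {
        for (String token : new String[] {"u+4??", "u+0025-00ff", "u+4??-733f"}) {
            System.out.println(token
                + " spec=" + specAccepts(token)
                + " strict=" + strictAccepts(token));
        }
    }
}
```

With these two patterns, u+4??-733f passes the specification regex but is rejected by the stricter one, while u+4?? and u+0025-00ff pass both.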

Paul Sweatte
fge

1 Answer


Yup, that's a literal question mark. From the Flex documentation:

Note that inside of a character class, all regular expression operators lose their special meaning except escape ('\') and the character class operators, '-', ']', and, at the beginning of the class, '^'.

Now, according to the W3C, ? can be used as a kind of wildcard:

? characters imply 'any digit value' (e.g. U+4??)
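As a rough illustration of that wildcard reading, the sketch below expands a token such as U+4?? into the lowest and highest code points it covers by substituting 0 and f for each question mark. The class name WildcardRange and the method expand are hypothetical names chosen for this example, and the sketch assumes the question marks only appear at the end of the value.

```java
final class WildcardRange {
    /**
     * Expands a wildcard token such as "U+4??" into {low, high}.
     * Assumes the form u+ followed by up to six characters:
     * hex digits, then zero or more trailing '?'.
     */
    static int[] expand(String token) {
        if (token.length() < 3 || !token.regionMatches(true, 0, "u+", 0, 2)) {
            throw new IllegalArgumentException("not a unicode range: " + token);
        }
        String digits = token.substring(2);
        if (digits.length() > 6 || !digits.matches("[0-9a-fA-F]*\\?*")) {
            throw new IllegalArgumentException("bad range value: " + token);
        }
        // Each '?' stands for any hex digit: 0 gives the low end, f the high end.
        int low = Integer.parseInt(digits.replace('?', '0'), 16);
        int high = Integer.parseInt(digits.replace('?', 'f'), 16);
        return new int[] {low, high};
    }
}
```

Under this reading, expand("U+4??") yields 0x400 and 0x4FF, and a token without question marks, such as u+26, yields the same code point for both ends.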

sczizzo
  • This is one thing confirmed, which means the question mark must be somehow legal in a Unicode range... But I cannot find what it means :/ – fge Dec 28 '11 at 02:45