I saw this in a JavaScript regular expression that tries to match an id and a class. What do \u007F and \uFFFF match here?

var split = require('browser-split');
var tag = "#id.classname";
var classIdSplit = /([\.#]?[a-zA-Z0-9\u007F-\uFFFF_:-]+)/;
var tagParts = split(tag, classIdSplit);

I saw this in the virtual-dom library. The author intends to use it to split

"#id.classname"

into

["", "#id", "", ".classname", ""]
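
The split can be reproduced without the library. This is a minimal sketch that uses the built-in String.prototype.split in place of browser-split, on the assumption that both follow the same ES5 semantics. Because the pattern is a capturing group, the matched pieces are kept in the result, with the (empty) text between them in the remaining slots, including a trailing empty string.

```javascript
// Same pattern as in the question. The capturing group makes split()
// keep every matched piece in the returned array.
var classIdSplit = /([\.#]?[a-zA-Z0-9\u007F-\uFFFF_:-]+)/;

// Assumption: native split stands in for browser-split here.
var tagParts = "#id.classname".split(classIdSplit);
console.log(tagParts);

// U+00E9 lies inside \u007F-\uFFFF, so a non-ASCII letter stays part
// of the class token instead of ending it.
var nonAscii = "div.caf\u00E9".split(classIdSplit);
console.log(nonAscii);
```

The empty strings are the text found between consecutive matches, so a caller that only wants the tokens would have to skip them.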
Alan Moore
wwayne

2 Answers


ID selectors have the syntax # immediately followed by identifier.

Class selectors have the syntax . immediately followed by identifier.

An identifier is defined as

In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A0 and higher, plus the hyphen (-) and the underscore (_); they cannot start with a digit, two hyphens, or a hyphen followed by a digit. Identifiers can also contain escaped characters and any ISO 10646 character as a numeric code (see next item).

Note: CSS3 allows identifiers to start with two hyphens

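Therefore, that regular expression is an incorrect attempt to match # or . followed by an identifier: its range starts at U+007F instead of U+00A0, it accepts a colon, and it allows an identifier to start with a digit.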
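
A pattern closer to the quoted definition might look like the sketch below. The names nmstart, nmchar and isIdOrClass are hypothetical and not taken from virtual-dom. It follows the CSS 2.1 wording (no leading digit, no leading double hyphen, no hyphen followed by a digit), ignores escaped characters entirely, and only covers code units up to \uFFFF.

```javascript
// Hypothetical helper, not part of virtual-dom. Escapes such as "\31 23"
// are deliberately not handled.
var nmstart = "[a-zA-Z_\\u00A0-\\uFFFF]";     // allowed first character
var nmchar = "[a-zA-Z0-9_\\u00A0-\\uFFFF-]";  // allowed later characters

// Optional single hyphen, then a start character, then any name characters.
var identifier = "-?" + nmstart + nmchar + "*";

// A whole string that is "#" or "." immediately followed by an identifier.
var selectorPart = new RegExp("^[.#]" + identifier + "$");

function isIdOrClass(text) {
  return selectorPart.test(text);
}

console.log(isIdOrClass("#id"));        // plain ASCII identifier
console.log(isIdOrClass(".caf\u00E9")); // U+00E9 is above U+00A0
console.log(isIdOrClass(".1abc"));      // starts with a digit
console.log(isIdOrClass(".a:b"));       // contains an unescaped colon
```

Under these assumptions the first two calls return true and the last two return false, whereas the character class from the question accepts all four.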

Oriol

It's an incorrect attempt to match the "Latin-1 Supplement" block of the Unicode Basic Multilingual Plane.

The correct range for that block would have been [\u0080-\u00FF].

Compare: http://kourge.net/projects/regexp-unicode-block
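
To see how far apart the two ranges are, here is a small sketch that tests single characters against each. The names wide, latin1 and classify are made up for this illustration.

```javascript
var wide = /^[\u007F-\uFFFF]$/;    // range used in the question
var latin1 = /^[\u0080-\u00FF]$/;  // Latin-1 Supplement block only

// Report which of the two ranges a single character falls into.
function classify(ch) {
  return { wide: wide.test(ch), latin1: latin1.test(ch) };
}

console.log(classify("\u00E9")); // e with acute accent, inside both ranges
console.log(classify("\u4E2D")); // a CJK ideograph, far outside Latin-1
console.log(classify("\u007F")); // DEL control character
console.log(classify("a"));      // plain ASCII letter
```

The range from the question covers everything from DEL up to the end of the Basic Multilingual Plane, so it accepts the CJK character and DEL, which the Latin-1 Supplement range rejects.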

Tomalak
  • Is it really the case? In Oriol's answer, it seems that the range U+0080 to U+009F is disallowed (which kinda makes sense, since the range contains control characters), and anything above U+00A0 is allowed. – nhahtdh Jun 25 '15 at 03:25
  • Yes, you should start at `\u00A0`, but that's one of those errors you can get away with virtually forever, given how unlikely it is that you'll run into any of those control characters in text that's supposed to be CSS. – Alan Moore Jun 25 '15 at 04:33
  • @AlanMoore: Putting the control characters aside, this answer ignores the rest of Unicode that is allowed in an identifier. Of course, they are rare - but as long as it has been taken into consideration, it should be done correctly. Otherwise, the regex can just stick to matching ASCII range characters. – nhahtdh Jun 25 '15 at 04:47
  • @nha I did not say anything about CSS identifiers, I said something about a Unicode block. :) – Tomalak Jun 25 '15 at 06:03
  • @Tomalak: I'm not sure how the Latin block would fit in this context. – nhahtdh Jun 25 '15 at 06:05
  • @nha It fits the title of the question (the CSS context was added later by the OP). In any case, you are right, I did not bother doing the "but CSS identifiers have more difficult rules than that" routine. – Tomalak Jun 25 '15 at 06:10