
In C++11, this is now valid syntax:

vector<vector<float>> MyMatrix;

whereas previously, it had to be written like this (notice the space):

vector<vector<float> > MyMatrix;

My question is: what fix does the standard use to allow the first version?

Could it be as simple as making > a token instead of >>? If that's not it, what does not work with this approach?

I consider that forms like myTemplate< x>>3 > are a non-problem, since you can disambiguate them by doing myTemplate<(x>>3)>.

Norswap
  • 1
    `>` is *already* a token but the parser is and was greedy. The fix must therefore look different. – One possibility would of course be to make `>>` *not* be a token. – Konrad Rudolph Apr 03 '13 at 11:01
  • 2
    I guess you're looking for §14.2.3: *"When parsing a template-argument-list, the first non-nested > is taken as the ending delimiter rather than a greater-than operator. Similarly, the first non-nested >> is treated as two consecutive but distinct > tokens, the first of which is taken as the end of the template-argument-list and completes the template-id. "* – Zeta Apr 03 '13 at 11:02
  • 2
    "what is the fix that the standard uses to allow the first version" - I believe this has nothing to do with _The standard_. I mean - the implementation. I believe it's a compiler's decision how to implement this requirement, forced by _The standard_. – Kiril Kirov Apr 03 '13 at 11:03
  • 3
    @KirilKirov: the standard has changed the rules for tokenizing C++ source. From the POV of the authors of the standard, this is the "fix" that they made. It's up to the implementer how to write code to match the new (more context-sensitive) tokenizing rules. – Steve Jessop Apr 03 '13 at 11:24
  • @KonradRudolph see the rephrasing for my second question here: http://stackoverflow.com/questions/15785496#comment22443479_15785583 (comment on Mike Seymour's answer). – Norswap Apr 03 '13 at 11:31
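To make the "more context-sensitive tokenizing" from the comments concrete, here is a rough sketch of how the quoted rule could be applied as a post-pass over maximal-munch tokens. This is only an illustration, not how any particular compiler does it: `lex` and `split_template_closers` are made-up names, and the sketch assumes every `<` opens a template-argument-list (a real compiler needs name lookup to decide that).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy maximal-munch lexer: identifiers/numbers, ">>", and single characters.
std::vector<std::string> lex(const std::string& src) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalnum(c) || c == '_') {
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_')) {
                ++j;
            }
            out.push_back(src.substr(i, j - i));
            i = j;
        } else if (c == '>' && i + 1 < src.size() && src[i + 1] == '>') {
            out.push_back(">>");  // greedy: prefer the longer token
            i += 2;
        } else {
            out.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return out;
}

// Splits each non-nested ">>" into two ">" tokens while inside an open "<".
// Assumption (not true of real C++): every "<" opens a template-argument-list.
std::vector<std::string> split_template_closers(const std::vector<std::string>& toks) {
    std::vector<std::string> out;
    std::vector<int> paren_at_open;  // paren depth recorded at each open "<"
    int paren = 0;
    for (const std::string& t : toks) {
        if (t == "(") ++paren;
        else if (t == ")" && paren > 0) --paren;
        assert(paren >= 0);

        // "Non-nested": same paren depth as the innermost open "<".
        bool non_nested = !paren_at_open.empty() && paren_at_open.back() == paren;
        if (t == "<") {
            paren_at_open.push_back(paren);
            out.push_back(t);
        } else if (t == ">" && non_nested) {
            paren_at_open.pop_back();
            out.push_back(t);
        } else if (t == ">>" && non_nested) {
            paren_at_open.pop_back();  // first ">" closes the innermost list
            out.push_back(">");
            // Second ">" closes the enclosing list, if there is one at this depth.
            if (!paren_at_open.empty() && paren_at_open.back() == paren) {
                paren_at_open.pop_back();
            }
            out.push_back(">");
        } else {
            out.push_back(t);
        }
    }
    return out;
}
```

With this sketch, `vector<vector<float>> m` ends in two separate `>` tokens, while the `>>` in `X<(10 >> 2)>` and in a plain `a >> b` is left alone.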

1 Answer


It's fixed by adding a special case to the parsing rules when parsing template arguments.

C++11 14.2/3: When parsing a template-argument-list, the first non-nested > is taken as the ending delimiter rather than a greater-than operator. Similarly, the first non-nested >> is treated as two consecutive but distinct > tokens, the first of which is taken as the end of the template-argument-list and completes the template-id.
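As a small illustration of the rule (a sketch only; `Shifted`, `Matrix`, `Parenthesized` and `column_count` are names made up for this example and appear nowhere in the standard), the following should compile under C++11 or later:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical template, used only for illustration.
template <int N>
struct Shifted {
    static constexpr int value = N;
};

// The first non-nested >> is split into two > tokens, closing both lists.
using Matrix = std::vector<std::vector<float>>;

// Inside parentheses the >> is nested, so it stays a right-shift operator.
using Parenthesized = Shifted<(10 >> 2)>;

static_assert(std::is_same<Matrix::value_type, std::vector<float>>::value,
              "Matrix is a vector of vector<float>");
static_assert(Parenthesized::value == 2, "10 >> 2 evaluates to 2");

// Number of columns in the first row; the matrix must not be empty.
std::size_t column_count(const Matrix& m) {
    assert(!m.empty());
    return m.front().size();
}
```

Without the parentheses, `Shifted<10 >> 2>` would have its template-argument-list closed by the first `>` under this rule, so it no longer parses as a shift.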

Mike Seymour
  • 2
    That looks like a defect. IIRC, the intent was that something like `template <int> class X{}; X<(10 >> 2)>` (with the extra parentheses) would be legal. – James Kanze Apr 03 '13 at 11:21
  • 3
    @JamesKanze Isn't that covered by the "non-nested" part? – Norswap Apr 03 '13 at 11:27
  • Let me rephrase my second question: why didn't they drop the `>>` token, and simply use two `>` tokens everywhere a right-shift is required? This looks like a simpler fix. – Norswap Apr 03 '13 at 11:28
  • @Norswap: Who is "they" here: the compiler writers or the specification designers? The purpose of a spec is to be consistent; the purpose of a compiler is to be efficient. Your formulation is good (and usable) only after you prove it does not contradict anything else (every *old use* of > >> >>> >_> ...). For the spec designers, it was simpler to say "add this case" than "remove this one and replace it that way", leaving it to compiler implementers to find better equivalent alternatives (there may be many; yours may be just one of them) and to prove their equivalence. – Emilio Garavaglia Apr 03 '13 at 11:58
  • 3
    @Norswap I missed that part. In context, I would expect "nested" to refer to `<...>` type brackets only, but that doesn't really make sense, so it must be. – James Kanze Apr 03 '13 at 12:46
  • 1
    @Norswap That's an interesting idea. Currently no (unary or binary) operator consists of more than one token. In the larger context of expressions, however, something like `std::vector` consists of three tokens, but works like a single element. (I think using two tokens for a single operator would cause problems in a recursive descent parser. It would be interesting to knock up the grammar of C++ expressions in yacc, and see if changing it to use two '`>`' tokens as a single operator causes conflicts.) – James Kanze Apr 03 '13 at 12:49
  • @EmilioGaravaglia The specification designers. To me, the formal grammar should be part of the spec, hence changing it is the task of the spec designers. The standard also specifies the list of tokens. I did not really understand the rest of what you said. What I suggest shouldn't pose any problem that I can see. Just replacing one token with two others matching the same characters in the grammar shouldn't cause breakage. – Norswap Apr 03 '13 at 13:11
  • @JamesKanze Same remark, should work. Why would it not specifically in a recursive descent parser? – Norswap Apr 03 '13 at 13:11
  • 1
    @Norswap: "What I suggest shouldn't pose any problem that I can see." But it's not enough that you (or anyone else) can't immediately see any problems. To change a fundamental aspect of the grammar (that an operator is a token), with potential ramifications throughout the language, you'd need to do a lot of work to prove that you haven't caused any problems. A self-contained special case is a lot easier to verify, even if it does feel ugly. – Mike Seymour Apr 03 '13 at 13:25
  • My point is that the change is not "situational". Any grammar rule that at one point accepts `>>` will still accept `>,>` (two tokens). Now the problem becomes: what if `>>,>` or `>,>>` appears in the expansion of a rule from the old grammar? Clearly, the first case is illegal (no operand to the shift). The second is not, but it is not unreasonable to expect that the grammar will match an opening `<` to a closing `>`. Under that assumption, no problem could happen. This is of course not a formal proof, but it could be made into one, with no need to enumerate all the cases. – Norswap Apr 03 '13 at 13:30
  • 1
    Because you have to see two tokens before deciding whether to reduce or to shift (to loop back to collect more, or to return). Note that the precedence of `>` and `>>` are different, so the choice cannot be made until you know which one you've got. And most parsers only use one token look-ahead. – James Kanze Apr 03 '13 at 17:19