Consider the following demonstrative program.

#include <iostream>

int main()
{
    typedef float T;

    0.f.T::~T();
}

This program compiles successfully with Microsoft Visual Studio Community 2019.

But clang and gcc issue an error like this:

prog.cc:7:5: error: unable to find numeric literal operator 'operator""f.T'
    7 |     0.f.T::~T();
      |     ^~~~~

If the expression is written as ( 0.f ).T::~T(), then all three compilers accept the program.

So a question arises: is the expression 0.f.T::~T() syntactically valid? And if not, which syntactic rule does it break?

Jeff Linahan
Vlad from Moscow

2 Answers

The parsing of numerical tokens is quite crude, and allows many things that aren't actually valid numbers. In C++98, the grammar for a "preprocessing number", found in [lex.ppnumber], is

pp-number:
    digit
    . digit
    pp-number digit
    pp-number nondigit
    pp-number e sign
    pp-number E sign
    pp-number .

Here, a "nondigit" is any character that can be used in an identifier, other than digits, and a "sign" is either + or -. Later standards would expand the definition to allow single quotes (C++14), and sequences of the form p-, p+, P-, P+ (C++17).

The upshot is that, in any version of the standard, while a preprocessing number is required to start with a digit, or a period followed by a digit, after that an arbitrary sequence of digits, letters, and periods may follow. Using the maximal munch rule, it follows that 0.f.T::~T(); is required to be tokenized as 0.f.T :: ~ T ( ) ;, even though 0.f.T isn't a valid numerical token.
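The maximal munch behaviour described above can be sketched in a few lines. This is only an illustration, not how any real compiler is implemented: `scan_pp_number` is a hypothetical helper name, it follows the C++98 grammar quoted above, and it assumes ASCII input (universal-character-names, digit separators, and the C++17 `p` exponents are left out).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the standard or of any compiler):
// returns the longest prefix of `src`, starting at `pos`, that matches the
// C++98 pp-number grammar, or an empty string if no pp-number starts there.
// Assumption: ASCII-only input.
std::string scan_pp_number(const std::string& src, std::size_t pos = 0)
{
    auto is_digit = [&](std::size_t i) {
        return i < src.size() &&
               std::isdigit(static_cast<unsigned char>(src[i])) != 0;
    };

    std::size_t i = pos;

    // A pp-number must begin with "digit" or ". digit".
    if (is_digit(i)) {
        i += 1;
    } else if (i < src.size() && src[i] == '.' && is_digit(i + 1)) {
        i += 2;
    } else {
        return "";
    }

    // After that, keep consuming digits, nondigits (letters and '_'),
    // periods, and a sign that directly follows 'e' or 'E'.
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        const bool after_exponent = src[i - 1] == 'e' || src[i - 1] == 'E';
        if (std::isalnum(c) != 0 || c == '_' || c == '.') {
            ++i;
        } else if ((c == '+' || c == '-') && after_exponent) {
            ++i;
        } else {
            break;
        }
    }
    return src.substr(pos, i - pos);
}
```

Under these assumptions, `scan_pp_number("0.f.T::~T();")` consumes `0.f.T` as a single token, whereas for `"0.f .T::~T();"` the scan stops at the space and yields just `0.f`. In `"(0.f).T::~T();"`, scanning from index 1 stops at the closing parenthesis, which is consistent with the parenthesized form being accepted.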

Thus, the code is not syntactically valid.

Eric M Schmidt
  • Interestingly, there's actually an example with decent similarity in [lex.pptoken]: http://eel.is/c++draft/lex.pptoken#5 – chris Apr 18 '20 at 02:00

A user-defined literal suffix, ud-suffix, is an identifier. An identifier is a sequence of letters (including some non-ASCII characters), underscores, and digits that does not start with a digit. The period character is not included.

Therefore it is a compiler bug as it is treating the non-identifier sequence f.T as an identifier.

The 0. is a fractional-constant, which can be followed by an optional exponent, then either a ud-suffix (for a user-defined literal) or a floating-point-suffix (one of fFlL). The f could be considered a ud-suffix as well, but since it matches a built-in floating-point-suffix it should be treated as that and not as a UDL. A ud-suffix is defined in the grammar as an identifier.

1201ProgramAlarm
  • Why is it interpreted as a ud-suffix? – Vlad from Moscow Apr 17 '20 at 18:37
  • @VladfromMoscow The `0.` is a _fractional-constant_. That can be followed by (excluding the exponent stuff) a _ud-suffix_ (for a user-defined literal) or a _floating-point-suffix_ (one of `fFlL`). The `f` can be considered a _ud-suffix_ as well, but since it matches another literal type it should be that and not the UDL. A _ud-suffix_ is defined in the grammar as an _identifier_. – 1201ProgramAlarm Apr 17 '20 at 18:48
  • @1201ProgramAlarm: Whereas `f` can be interpreted as a ud-suffix, `f.T` should not be, as `.` cannot appear in an identifier. But it is... I would say compiler bug, but I'm pretty sure it is more complicated. – Jarod42 Apr 17 '20 at 19:00