3

This is a simple question about the terminology.

C11, 6.4.2.1p1:

nondigit: one of
  _ a b c d e f g h i j k l m
    n o p q r s t u v w x y z
    A B C D E F G H I J K L M
    N O P Q R S T U V W X Y Z

The "nondigit" name is confusing because, for example, the $ is also a non-digit.

Does anyone know why (i.e., the rationale) it was named "nondigit" instead of "letter-or-underscore"?

pmor
  • 2
    Maybe because _letter-or-underscore_ is longer to type than _non_digit_? – Jabberwocky Apr 28 '23 at 09:20
  • 1
    *"The "nondigit" name is confusing because, for example, the $ is also a non-digit."* ===> But that is not part of the character set. (Until C23 perhaps) – Harith Apr 28 '23 at 09:23
  • 2
    @klutt *I don't understand the downvotes here.* Aside from asking for a subjective "Why?", likely because the question omits the [context](https://port70.net/~nsz/c/c11/n1570.html#6.4.2) of the use of `nondigit` as part of a formal syntax specification. That context shows a clear grouping of characters into **digits** and, well, non-digits. – Andrew Henle Apr 28 '23 at 11:10

2 Answers

4

For naming identifiers, an implementation is required to support the digits 0 to 9 and the non-digits: the letters A to Z (upper and lower case) and the underscore _. Since _ is not a letter, this group can't be named "letter" or similar. The only reason two groups of symbols are needed in the formal syntax is that an identifier cannot start with a digit, for historical reasons.
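The two-group rule described above can be sketched in C as follows. This is only an illustration, not anything taken from the standard's text: the function names `is_nondigit`, `is_digit`, and `is_basic_identifier` are hypothetical, and the sketch deliberately ignores universal character names, implementation-defined identifier characters, and keywords.

```c
#include <stdbool.h>
#include <string.h>

/* "nondigit" as listed in C11 6.4.2.1p1: the underscore or one of the
   52 basic Latin letters. (Hypothetical helper name.) */
static bool is_nondigit(char c)
{
    /* strchr would also match the terminating '\0', so exclude it first. */
    return c != '\0' &&
           strchr("_abcdefghijklmnopqrstuvwxyz"
                  "ABCDEFGHIJKLMNOPQRSTUVWXYZ", c) != NULL;
}

/* "digit": one of 0 to 9. */
static bool is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Simplified identifier check (assumption: basic characters only):
   the first character must be a nondigit, and every later character
   must be a nondigit or a digit. */
static bool is_basic_identifier(const char *s)
{
    if (!is_nondigit(s[0]))
        return false; /* rejects the empty string and a leading digit */

    for (size_t i = 1; s[i] != '\0'; i++) {
        if (!is_nondigit(s[i]) && !is_digit(s[i]))
            return false;
    }
    return true;
}
```

Under these assumptions, `is_basic_identifier("_x1")` returns true, while `is_basic_identifier("1x")` and `is_basic_identifier("a$b")` return false. The separate `is_nondigit` helper exists only because the first position is restricted; without that restriction a single character class would do.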

$ is not a valid symbol to use as part of an identifier in the C language. You might be confused by non-conforming compilers such as gcc allowing $ even in its supposedly conforming mode, -pedantic. ISO C lists this as a "common extension" in Annex J.5.

Lundin
  • 1
    The term "nondigit" implies "everything which is non-digit". The `"` (for example) is non-digit too. However, the `"` is not present in the "nondigit". Hence, the confusion. – pmor Apr 28 '23 at 09:49
  • @pmor Semantically, it means whatever it is defined to mean :) – klutt Apr 28 '23 at 09:51
  • 2
    @pmor "Nondigit" rather implies: _everything which is non-digit but part of a valid identifier_. This syntax is part of the identifier syntax _identifier-nondigit:_. `"` is not a valid character for an identifier and so it isn't listed. – Lundin Apr 28 '23 at 10:08
  • From the GCC manual: "Some users try to use `-Wpedantic` to check programs for strict ISO C conformance. They soon find that it does not do quite what they want: it finds some non-ISO practices, but not all—only those for which ISO C *requires* a diagnostic, and some others for which diagnostics have been added." – Ian Abbott Apr 28 '23 at 10:08
  • 1
    @Lundin FYI: The `-or-` practice exists. A few examples: C: `struct-or-union`, C++: `expr-or-braced-init-list`, `brace-or-equal-initializer`, `class-or-decltype`. – pmor Apr 28 '23 at 10:14
  • 2
    @IanAbbott Yeah they wish... but the truth is that the option simply isn't reliable. For example in this answer I just wrote for another non-conformance issue: https://stackoverflow.com/questions/47410876/why-gcc-doesnt-generate-any-warnings-about-newline-at-end-of-file/76128460#76128460 – Lundin Apr 28 '23 at 11:15
  • 1
    In general, the attitude of gcc maintainers that the behavior of their various strange options has some sort of canonical significance isn't helpful for the purpose of producing a quality implementation. Programmers expect one option to enable all warnings. That option is not `-Wall`. Programmers expect one option to enable strict C conformance. That option is not `-pedantic`. Programmers expect one option to enable conformance according to a certain C standard. That option is not `-std=...`. In all cases the maintainers will come up with strange excuses "No, the sloppy gcc manual says-...". – Lundin Apr 28 '23 at 11:33
  • Good point about gcc sloppiness, although accepting `$` in identifiers does not require a diagnostic. – Ian Abbott Apr 28 '23 at 14:43
  • It was recognized, well before the Standard was written, that compilers whose target platforms have predefined linker symbols that contain symbols like `$`, and which C programmers may need to use within their programs, should allow the use of such characters within identifiers. The Standard allows conforming implementations to extend the language by accepting such characters within identifiers, provided they document such treatment. Such characters may not appear in *strictly* conforming programs, but if there exists a conforming implementation somewhere in the universe... – supercat Apr 28 '23 at 20:06
  • ...that would accept such characters as an extension, any source text which is accepted by such an implementation would, *by definition*, be a conforming C program. While people often pretend that the Standard has a conformance category that is broader than "strictly conforming C program", but narrower than "conforming C program", such a notion is contrary to what the Standard actually says. – supercat Apr 28 '23 at 20:07
  • @Lundin FYI: in Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile there is `FPType_Nonzero`, which is confusing because the term "non-zero" implies "everything which is non-zero". The `FPType_Infinity` (for example) is non-zero too. Perhaps Arm wanted to name it `FPType_Normal` instead of `FPType_Nonzero`. – pmor Jul 11 '23 at 13:53
2

Some points to express its appropriateness:

  1. Concise and Precise

    The term nondigit is a concise and precise way to refer to a specific set of characters that are not digits and can be used in naming identifiers. It is clear and unambiguous, and avoids any confusion that may arise from using a more general term like letter-or-underscore.
  2. Widely Used (arguably)

    The term nondigit is widely used in the programming community, not just in C, but also in other programming languages like Java and C++. This makes it a familiar and recognizable term for programmers who are familiar with multiple languages.
  3. Specific to Identifiers

    The term nondigit is specifically used in the context of naming identifiers, which are an important aspect of the C language. Using a more general term like letter-or-underscore could potentially include characters that are not valid for use in identifiers, such as special characters or foreign letters with diacritical marks.
  4. Historical Context

    The term nondigit has been used in the C language since its earliest versions, and is likely a holdover from the language's predecessor, B. This historical context adds to the appropriateness of the term in the C language.

The use of the term in the B programming language, which influenced the development of C, suggests that it was chosen for historical reasons.

The B language had a similar syntax to C and used nondigit to refer to the set of characters that could be used in identifiers, as seen in this snippet of B code:

main( ) {
  auto a, b, c, sum;

  a = 1; b = 2; c = 3;
  sum = a+b+c;
  putnumb(sum);
}

(As additional information, auto is used to define 36-bit variables in B.)

It is likely that the term nondigit was carried over from B into C, as C was developed as an extension of B. The use of nondigit has persisted in subsequent versions of the C standard, including the current C18 standard.

Furthermore, the use of concise and unambiguous language is a common practice in programming language specifications. This can be seen in the use of terms like whitespace and newline to describe specific sets of characters, rather than more descriptive terms like blank space or line break. The use of nondigit in C follows this practice, providing a clear and specific definition of the set of characters that can be used in identifiers.

While it may seem more descriptive to use a term like letter-or-underscore to describe this set of characters, it is important to note that this could potentially lead to confusion or misinterpretation. For example, it is possible for a programming language to include other special characters in its set of valid identifier characters, such as accented letters or currency symbols. Using a more specific term like nondigit makes it clear that only the characters listed in the C specification are valid for use in identifiers.

Ultimately, the use of the term nondigit in C is a matter of convention and historical precedent. While it may not be the most descriptive term, it has been used in the language for many years and is well-established in the C specification.

Thanks to @PeteKirkham for pointing out mistakes and for the additional resources.

Sources of information

  1. The Development of the C Language
  2. A Tutorial Introduction To The Language B
Snell
  • 2
    So, this question is tagged "language-lawyer". This means that a certain amount of rigor and support by specification citations is required. Your answer is short on that (even by your own admission). This is a FYI, in case of a chilly response to the post. – StoryTeller - Unslander Monica Apr 28 '23 at 09:38
  • 2
    "all characters that are not digits". That would also include all operators – Gerhardh Apr 28 '23 at 09:38
  • 1
    @Gerhardh I hope this makes my point even clearer. I will add a few more details to it – Snell Apr 28 '23 at 11:03
  • 1
    Most of this is either speculation, tautology or plain false (2 - was it widely used before C? 3 - whatever term chosen to go into the standard would be in the standard; 4- the C11 standard defines the characters, all the listed invalid chars are non-digits; 5-obviously it's not always clear as someone is asking about it; 6- the equivalent B language term is `alpha` which is not explicitly defined by allowed characters https://www.bell-labs.com/usr/dmr/www/kbman.html and presumably doesn't include underscore) – Pete Kirkham Apr 28 '23 at 11:20
  • @PeteKirkham 2 - explains that, although the term nondigit may be confusing, it cannot be removed; 3 - I accept it's of no use here; 4 - I'm still reading that point; 5 - expresses my own reasoning; 6 - the link I shared clearly defines the characters to be used in the `4th` point and also uses the term `non-digit`. I'm still reviewing my answer and will complete it very soon with accurate details. Thanks for checking it. – Snell Apr 28 '23 at 11:32