
What characters are valid before and after a preprocessing directive in C, according to the C standard?

/*what are all the valid characters that can occur here*/ #include <stdio.h> /*and here according to the C standard*/
int main(void)
{
    printf("Hello World");
    return 0;
}

The C standard does not seem to state which characters may occur before and after a preprocessing directive. If someone can point me to the exact wording in the C standard, it would be much appreciated.

Kevin
  • 1
    This sounds like homework. Which chapters of the standard did you read to answer the question? – the busybee Jun 29 '20 at 06:02
  • This isn't homework; I'm self-taught. Can you explain exactly and clearly what this http://www.port70.net/~nsz/c/c11/n1570.html#6.10p2 means? – Kevin Jun 29 '20 at 06:34
  • 1
    Which term is not clear to you? The paragraph is exact and clear. – the busybee Jun 29 '20 at 10:41
  • It states that **`#`** is the first token and that it can optionally be placed after white space. The part I find unclear is **`containing at least one new-line character`**: they haven't stated where the new-line character is placed when it is present. Is it **`after the directive or before the directive`**? – Kevin Jun 29 '20 at 11:40
  • 1
    Very similar to the linked dup from a few hours prior. – dbush Jun 29 '20 at 15:41

2 Answers

2

Let's dissect paragraph 2 of the section of the C standard you linked in a comment:

A preprocessing directive consists of a sequence of preprocessing tokens that satisfies the following constraints:

The characters that make up a source file are divided into tokens. Examples of such tokens are punctuators like '#', identifiers beginning with a letter or underscore, and numbers beginning with a decimal digit.

The first token in the sequence is a # preprocessing token that (at the start of translation phase 4) is either the first character in the source file (optionally after white space containing no new-line characters) or that follows white space containing at least one new-line character.

"White space" is any sequence of white space characters without any other character. Commonly used white space characters are space (or blank), horizontal tabulator, line feed or carriage return.

"[...] white space containing at least one new-line character" means that at least one new-line character exists somewhere in the sequence of white space characters before the '#'. It does not matter where in the sequence it is.

So these are all valid sequences, shown as C strings:

"\n\t\t\t#..."
"\n      #..."
"\n#..."
"\n\t#..."
"\n\t #..."
"\n \t#..."

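The rule can be sketched as a small check on a character buffer. This is only an illustration under stated assumptions: `is_horizontal_space` and `starts_directive` are hypothetical names, the buffer is assumed to be the text as it looks after translation phases 1 to 3 (so comments are already replaced by spaces), and `pos` is assumed to be a valid index into the string.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: white space other than new-line. */
static bool is_horizontal_space(char c)
{
    return c == ' ' || c == '\t' || c == '\v' || c == '\f';
}

/* Hypothetical helper: returns true if src[pos] is a '#' that may begin a
 * directive, i.e. walking backwards over white space reaches either the
 * start of the file or a new-line character before any other character. */
bool starts_directive(const char *src, size_t pos)
{
    if (src[pos] != '#')
        return false;

    /* Skip white space that contains no new-line. */
    while (pos > 0 && is_horizontal_space(src[pos - 1]))
        pos--;

    /* Valid at the start of the file or directly after a new-line. */
    return pos == 0 || src[pos - 1] == '\n';
}
```

With this sketch, the `#` in `"\n\t #..."` is accepted, while the `#` in `"int x; #define"` is rejected because a non-white-space character precedes it on the same line.
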
The last token in the sequence is the first new-line character that follows the first token in the sequence.

Beginning with the '#' token, all following tokens make up the preprocessor directive, until the next new-line character is found. Footnote 165 mentions the term "line" for such a sequence.

A new-line character ends the preprocessing directive even if it occurs within what would otherwise be an invocation of a function-like macro.

The invocation of a function-like macro looks like a function call in C: an identifier followed by a pair of parentheses. If there is a new-line before the closing parenthesis, the directive ends at that place.
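
A minimal sketch of that difference, assuming a made-up macro `ADD` and made-up function names: in ordinary program text the invocation may span several lines, while inside a directive it has to stay on one logical line, for instance by using a backslash-newline, which is removed in translation phase 2.

```c
#define ADD(a, b) ((a) + (b))

/* In ordinary program text a macro invocation may span lines. */
int sum_in_text(void)
{
    return ADD(1,
               2);
}

/* Inside a directive, the new-line ends the directive, so a compiler
 * would be expected to reject this form:
 *
 *   #if ADD(1,
 *           2) == 3
 *
 * A backslash-newline is spliced away in phase 2, so the directive
 * below is a single logical line and the invocation is complete. */
#if ADD(1, \
        2) == 3
#define DIRECTIVE_SAW_THREE 1
#else
#define DIRECTIVE_SAW_THREE 0
#endif

int directive_saw_three(void)
{
    return DIRECTIVE_SAW_THREE;
}
```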


EDIT:

White space characters are listed concretely in chapter 7.4.1.10 "The isspace function" of the standard you linked:

The standard white-space characters are the following: space (' '), form feed ('\f'), new-line ('\n'), carriage return ('\r'), horizontal tab ('\t'), and vertical tab ('\v').

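One can assume that the preprocessor uses a similar set. For source text, section 6.4 paragraph 3 lists space, horizontal tab, new-line, vertical tab, and form-feed as the white-space characters.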
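
The list quoted from the `isspace` description can be checked at run time. A small sketch, assuming the default "C" locale; the wrapper name `is_standard_white_space` is made up for this example, and it says nothing about what a particular preprocessor does internally.

```c
#include <ctype.h>

/* Hypothetical wrapper: nonzero if c is a white-space character
 * according to isspace. The cast avoids undefined behaviour for
 * negative char values. */
int is_standard_white_space(char c)
{
    return isspace((unsigned char)c) != 0;
}
```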

Your confusion might come from interpreting "[...] white space containing no new-line characters [...]" as "white space does not include new-line in general" or "new-line is a special white space character." Neither is true.

The new-line is a valid white space character. It just has a special meaning under the specific circumstances of marking the beginning and the end of a preprocessor directive. And that is why they request white space without any new-line.

If the white space contained a new-line, it would mark the beginning of a new token sequence in the context of the preprocessor.

Please note that the preprocessor and the C language are quite separate concepts. You can use the preprocessor for preprocessing other kinds of source files; using it for assembly is quite common. And you can write C source files without any preprocessor directive.

The preprocessor knows nothing about C, and the C compiler knows nothing about preprocessing directives.

the busybee
  • Hey, thanks. So what if only a new-line is present before the `#` token, without any other white space before it? For example `\n#include `: is this case mentioned anywhere in the C standard? – Kevin Jun 30 '20 at 06:36
  • 1
    Where is the problem? `\n` is a white space character, too. If the source file starts with `"\n#..."` it is a valid start sequence. – the busybee Jun 30 '20 at 09:32
  • So this means `new-line` is also a white space character? – Kevin Jun 30 '20 at 10:36
  • 1
    Yes, sure. The line-feed character `'\n'` is the new-line character, hence its abbreviation. White space characters don't print anything on paper, that's why they are called "white space". Some systems use the combination `"\r\n"` as a new-line marker, but this will be transparently decoded. – the busybee Jun 30 '20 at 10:46
  • So can you explain why they mention **new-line characters** in the following text: `(optionally after white space containing no **new-line characters**) or that follows white space containing at least one **new-line character**.` If white space includes **space characters**, **tab characters** and **new-line characters**, why mention new-line characters specifically? That would not be necessary if white space means all non-printing characters, including new-line. – Kevin Jun 30 '20 at 11:24
  • 1
    @Kevin It's a fancy way of saying "only spaces and tabs on a line before `#`". – dbush Jun 30 '20 at 12:01
  • So white space includes `line terminators (new-line characters)`, right? – Kevin Jun 30 '20 at 12:26
2

Note that before the preprocessor directives are examined, the compiler has already passed through phases 1 through 3 of the translation process. Translation phase 2 combines lines ending with a backslash with the following physical line in order to create logical lines. Translation phase 3 replaces every comment with a single space character. (It is allowed but not required that phase 3 also replaces every consecutive sequence of whitespace characters other than newline with a single space character.)

Once that is done, Phase 4 is entered, at which point preprocessor directives are identified. According to §6.10 paragraph 2 of the standard, a sequence of tokens is a preprocessor directive only if "The first token in the sequence is a # preprocessing token that (at the start of translation phase 4) is either the first character in the source file (optionally after white space containing no new-line characters) or that follows white space containing at least one new-line character."

That's a very verbose way of saying that the # has to be the first token on a line, which means that it can only be preceded on the line by whitespace. But as the parenthetic comment in the sentence I quoted says, the test applies to the program as seen after phases 1 to 3, which means that comments have already been replaced with whitespace. So in the original program, the # might be preceded by a comment or by whitespace. (The comment must be a /*…*/ comment, since //… comments extend to the end of the line. Also note that continuation lines are combined before comments are identified, so a continuation marker can occur inside a //… comment.)

As to what can follow a preprocessor directive on the line, the answer is technically "nothing", since the directive extends up to and including the newline. (Again, a comment may have appeared in the original program.) The standard shows a grammar for each preprocessor directive which indicates what the directive's syntax is. If you were to add a non-whitespace character to a preprocessing directive, that would either create a syntax error or alter the meaning of the directive.
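
As an illustration of the points above, the following sketch places `/*…*/` comments before and after directives, indents a directive, and splits one directive across two physical lines with a backslash. The macro and function names (`GREETING`, `TARGET`, `greeting`, `greeting_length`) are made up for this example.

```c
/* a comment may precede the '#' */ #include <string.h> /* and one may follow */
   #define GREETING "Hello" /* leading spaces before '#' are fine too */
#define TARGET \
    "World"

/* Adjacent string literals are concatenated after preprocessing. */
const char *greeting(void)
{
    return GREETING " " TARGET;
}

/* Uses strlen from <string.h>, which shows the include above took effect. */
size_t greeting_length(void)
{
    return strlen(greeting());
}
```

After phases 2 and 3 each `#` here is the first token on its logical line, which is why a conforming compiler should treat all three lines as directives.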

§6.10 paragraph 5 requires that whitespace within a preprocessor directive can only be a space or tab character, so that vertical tab and form-feed characters would be illegal. However, it is possible that the implementation has changed those characters to space characters in translation phase 3, so the use of vertical tab and form-feed in a preprocessor directive is implementation-dependent. Portable programs should only contain vertical tab and form-feed characters at the beginning of a line.

rici