2

Assume that you're writing (portable) C99 code in the invariant set of ISO 646. This means that the \ (backslash, reverse solidus, however you name it) can't be written directly. For instance, one could opt to write a Hello World program as such:

%:include <stdio.h>
%:include <stdlib.h>

int main()
<%
    fputs("Hello World!??/n", stdout);
    return EXIT_SUCCESS;
%>

However, besides digraphs, I used the ??/ trigraph to write the \ character.

Given my assumptions above, is it possible to either

  1. include the '\n' character (which is translated to a newline in <stdio.h> functions) in a string without the use of trigraphs, or
  2. write a newline to a FILE * without using the '\n' character?
  • I'm not sure I understand. Your character set has no backslash but you don't want to write a backslash using its trigraph? – zneak Oct 26 '15 at 15:39
  • @zneak: This is not possible, as the standard requires those characters to exist. (Trigraphs are the most superfluous feature, at least since ~20 years). – too honest for this site Oct 26 '15 at 15:46
  • @zneak The only place where I would have to use trigraphs (which are very different from digraphs, because they're translated in a different stage) is to write the only backslashes I'd ever need, namely for newlines. It feels kind of wrong. –  Oct 26 '15 at 15:46
  • Escape sequences are **not** "translated" by a library, but by the compiler. Read http://port70.net/~nsz/c/c11/n1570.html#5.2.1 – too honest for this site Oct 26 '15 at 15:47
  • @Olaf, I'm not sure he ever mentioned libraries. – zneak Oct 26 '15 at 15:48
  • @Olaf Whatever `\n` translates to (in the program binary) is translated by `<stdio.h>` functions to a newline (in I/O streams). –  Oct 26 '15 at 15:48
  • @zneak: "... which is translated to a newline in <stdio.h> functions" – too honest for this site Oct 26 '15 at 15:49
  • @Olaf I interpreted that as "the digit 10 causes a line break" rather than "the character sequence `\n`, digitally represented as '92 110', causes a line break". – zneak Oct 26 '15 at 15:51
  • @zneak: Where did you get this citation from? And: the escape sequence is interpreted by the compiler. Not sure what OP's actual problem is. If there is some meta-compiler involved, he should escape the backslash, possibly multiple times. Sounds like an XY problem to me. And avoiding trigraphs just because "they are ugly" is nonsense. – too honest for this site Oct 26 '15 at 15:58
  • @Olaf which citation? I'm posting my own interpretation. – zneak Oct 26 '15 at 15:59
  • @zneak: A sentence in quotation marks is commonly a citation, that's what confused me. Anyway, `10` is not a digit and I do not read this from OP's text. – too honest for this site Oct 26 '15 at 16:01
  • @Olaf, please excuse this ESL for the incorrect word. – zneak Oct 26 '15 at 16:03
  • @Olaf If I can avoid trigraphs, I want to avoid trigraphs. Everything else are fixed design constraints to ensure portability. –  Oct 26 '15 at 16:07
  • You can't have your cake and eat it, too. What do you want actually? I have the impression you do not really know yourself. – too honest for this site Oct 26 '15 at 16:10
  • @Olaf Fair enough, I should've been much more honest with my intentions. I was considering the topic of why IBM et al. opposed the removal of trigraphs. This was one of the cases that struck me as odd: how would you handle newlines etc. if you had no backslash? Thanks for the reference, by the way (I finally realised what you were pointing at), to the requirement that the charset must include a backslash. However, if that requirement were deemed sufficient to ensure the portability of source code character encoding, then I'd think neither digraphs, nor trigraphs, nor `` would exist. –  Oct 26 '15 at 16:55
  • @Rhymoid: The old Apple II had no lowercase letters. Actually, ASCII characters 0x60+ were missing to save character-table ROM in the display controller. But times are a-changing, and now we do have full ASCII. It's similar with trigraphs. These are a legacy. IBM et al. have quite some legacy code, thus refuse to remove them. And they do no actual harm. For new projects, or if you have to encapsulate C into another source code, you have to escape the backslash, thus you might have something like `"\\\\"` to finally have the backslash itself in your compiled program. Still better than trigraphs. – too honest for this site Oct 26 '15 at 17:42

5 Answers

4

For stdout you could just use puts("") to output a newline. Or indeed replace the fputs in your original program with puts and delete the \n.

If you want to get the newline character into a variable so you can do other things with it, I know another standard function that gives you one for free:

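#include <string.h>  /* strchr */
#include <time.h>    /* time, ctime */

/* The string returned by ctime() ends in a new-line; return that character. */
int gimme_a_newline(void)
{
  time_t t = time(0);
  return strchr(ctime(&t), 0)[-1];
}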

You could then say

fprintf(stderr, "Hello, world!%c", gimme_a_newline());

(I hope all of the characters I used are ISO646 or digraph-accessible. I found it surprisingly difficult to get a simple list of which ASCII characters are not in ISO646. Wikipedia has a color-coded table with not nearly enough contrast between colors for me to tell what's what.)
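A related sketch, not part of the original answer: C99's `strftime` defines a `%n` conversion that is replaced by a new-line character, so it can serve the same purpose as the `ctime` trick without depending on the current time. The helper name `newline_via_strftime` is hypothetical, and the code is written with ordinary punctuation; in ISO 646 invariant source the `#`, braces, and brackets would need their digraph spellings.

```c
#include <time.h>

/* Hypothetical helper: obtains the execution-character-set new-line
   from strftime's "%n" conversion, without writing a backslash. */
int newline_via_strftime(void)
{
    struct tm when = {0};   /* contents do not matter: %n reads no field */
    char buf[2];            /* one new-line plus the terminating null */

    /* strftime returns the number of characters stored, excluding the null. */
    if (strftime(buf, sizeof buf, "%n", &when) != 1)
        return -1;          /* conversion did not fit or failed */
    return buf[0];
}
```

It could then be used like the function above, for example `fprintf(stderr, "Hello, world!%c", newline_via_strftime());`.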

  • Wow :) Just saying if OP wants to avoid `??/` because it just doesn't "feel right" ... how does *this* hack feel? –  Oct 26 '15 at 16:26
  • lol, sure does ;) ... and you could always `dup2()` your fd to `STDOUT_FILENO`, so `puts()` will do for any file :) –  Oct 26 '15 at 16:29
  • Also this is the only time in the history of C that `ctime`'s output format solved a problem instead of creating one. Now hoping for a followup question where `gets` is the correct answer. –  Oct 26 '15 at 16:35
  • I can't not upvote such a clever hack, even if it feels so, so wrong. –  Oct 26 '15 at 16:46
  • @Rhymoid OTOH I up-voted both the odd yet thought-provoking question as well as this clever, thought-provoking answer, silly or not. – chux - Reinstate Monica Oct 26 '15 at 16:59
3

Your premise:

Assume that you're writing (portable) C99 code in the invariant set of ISO 646. This means that the \ (backslash, reverse solidus, however you name it) can't be written directly.

is questionable. C99 defines "source" and "execution" character sets, and requires that both include representations of the backslash character (C99 5.2.1). The only reason I can imagine for an effort such as you describe would be to try to produce source code that does not require character set transcoding upon movement among machines. In that case, however, the choice of ISO 646 as a common baseline is odd. You're more likely to run into an EBCDIC machine than one that uses an ISO 646 variant that is not coincident with the ISO-8859 family of character sets. (And if you can assume ISO 8859, then backslash does not present a problem.)

Nevertheless, if you insist on writing C source code without using a literal backslash character, then the trigraph for that character is the way to do so. That's what trigraphs were invented for. In character constants and string literals, you cannot portably substitute anything else for \n or its trigraph equivalent, ??/n, because it is implementation-dependent how that code is mapped. In particular, it is not safe to assume that it maps to a line-feed character (which, however, is included among the invariant characters of ISO 646).

Update:

You ask specifically whether it is possible to

include the '\n' character (which is translated to a newline in <stdio.h> functions) in a string without the use of trigraphs, or

No, it is not possible, because there is no one '\n' character. Moreover, there seems to be a bit of a misconception here: \n in a character or string literal represents one character in the execution character set. The compiler is therefore responsible for that transformation, not the stdio functions. The stdio functions' responsibility is to handle that character on output by writing a character or character sequence intended to produce the specified effect ("[m]oves the active position to the initial position of the next line").
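A small sketch of that point: by the time the program runs, the two source characters of the escape sequence have already become a single character of the execution character set. The function names here are hypothetical, chosen only for illustration.

```c
#include <string.h>

/* Hypothetical helper: length of a literal holding only the escape.
   The compiler has already replaced backslash + n with one character. */
size_t escaped_newline_length(void)
{
    return strlen("\n");
}

/* Hypothetical helper: the implementation-defined code of that character.
   It is often 10 on ASCII-based systems, but the standard does not say so. */
int newline_code(void)
{
    return '\n';
}
```

No stdio function is involved in producing that one character; stdio only sees it later, when the string is written to a stream.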

You also ask whether it is possible to

write a newline to a FILE * without using the '\n' character?

This one depends on exactly what you mean. If you want to write a character whose code in the execution character set you know, then you can write a numeric constant having that numeric value. In particular, if you want to write the character with encoded value 0xa (in the execution character set) then you can do so. For example, you could

fputc(0xa, my_file);

but that does not necessarily produce a result equivalent to

fputc('\n', my_file);
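To see why the two calls need not be equivalent, here is a rough probe, assuming a writable scratch file: it writes one '\n' to a text-mode stream and then counts the bytes that were actually stored by reading the file back in binary mode. The function name `bytes_for_newline` and the path argument are hypothetical.

```c
#include <stdio.h>

/* Hypothetical probe: writes a single '\n' in text mode, then reports how
   many bytes the file holds when read in binary mode. Returns -1 on error. */
long bytes_for_newline(const char *path)
{
    FILE *f = fopen(path, "w");          /* text mode: translation may occur */
    if (f == NULL)
        return -1;
    int ok = (fputc('\n', f) != EOF);
    if (fclose(f) != 0)
        ok = 0;
    if (!ok) {
        remove(path);
        return -1;
    }

    f = fopen(path, "rb");               /* binary mode: bytes as stored */
    if (f == NULL) {
        remove(path);
        return -1;
    }
    long count = 0;
    while (fgetc(f) != EOF)
        count++;
    fclose(f);
    remove(path);                        /* clean up the scratch file */
    return count;
}
```

On POSIX-like systems the count is typically 1; on systems that store a line ending as CR LF it is typically 2. Either way, the external representation is chosen by the implementation, not by the numeric value passed to `fputc`.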
John Bollinger
  • "The stdio functions' responsibility is to handle that character on output by producing the specified effect" not even that. These functions will just write the character to the output stream. It's terminals that actually create the visible effect. –  Oct 26 '15 at 16:46
  • @FelixPalmen I'd have to look, but I'd wager any interaction between the standard library and the terminal (and the existence of the latter) is entirely implementation-defined. –  Oct 26 '15 at 16:48
  • @FelixPalmen, given that "The external representations in a text file need not be identical to the internal representations, and are outside the scope of [the C99 standard]", I'd have to say that the stdio functions indeed are responsible for a translation. Of course, that is typically an identity transformation. What the stdio functions are *not* responsible for is trying to dynamically match that transformation with any particular output device. I have updated my answer. – John Bollinger Oct 26 '15 at 16:54
  • @Rhymoid http://port70.net/~nsz/c/c99/n1256.html#5.2.2p3 -- I read this the way the *display device* (typically a terminal) should handle these 1 char representations. –  Oct 26 '15 at 16:55
  • If you're just looking to avoid the weird looking `"??/n"`, you can probably do something like `#define NL "??/n"` and then use it like `"hello world" NL`, since C concatenates string literals with nothing in between. – zneak Oct 26 '15 at 16:57
  • @JohnBollinger yes, a conforming C implementation *should* produce output that *will* have the desired effect on the display device of the output platform. Actually interesting topic, I did some `stdio` implementation for DOS and VGA console recently and actually handled `\n` specially -- does this make it non-conforming? I guess so ... –  Oct 26 '15 at 16:58
  • @FelixPalmen Huh. I'd read that differently: the internal representation is what the standard library works with (a single character code), while the external representation is what you actually send to the storage system or the display device. DOS's translation between `"\n"` (text mode streams) and `"\x0D\x0A"` (binary mode streams) comes to mind. –  Oct 26 '15 at 17:09
  • You do not even have to resort to ISO-Latin-XX (btw. one should use 15 instead of 1 now, as that provides currency symbols). Plain old ASCII is sufficient. Any code above 0x7FU is implementation-defined as of the standard anyway. Another advantage is the compatibility with UTF-8 for the code itself (except for string literals and character constants). However, internationalisation and character sets are another critical topic, and not only in C. – too honest for this site Oct 26 '15 at 17:47
  • @Rhymoid: It's text mode streams. Binary streams are not translated actually. – too honest for this site Oct 26 '15 at 17:52
  • @Rhymoid yes, you're right, but what is important is that there *is* an external representation that's the same for just storing in a file. So it's inevitable output devices (like terminals) must know how to handle this representation and perform the corresponding actions :) –  Oct 26 '15 at 18:20
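The `NL` macro suggested in the comments relies on adjacent string literals being concatenated by the compiler. A minimal sketch follows, written with a plain backslash for readability; in ISO 646 invariant source the macro body would use the trigraph spelling from the comment instead. The function name `greeting` is hypothetical.

```c
/* The only place that spells the new-line escape; all other code uses NL. */
#define NL "\n"

/* Hypothetical helper: "hello world" and NL are joined into one literal
   during translation, so no run-time concatenation takes place. */
const char *greeting(void)
{
    return "hello world" NL;
}
```

This does not remove the trigraph, it only confines it to a single line of the source.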
1

Short answer is, yes, for what you want to do, you have to use this trigraph.

Even if there were a digraph for \, it would be useless inside a string literal. Digraphs are tokens in their own right and are recognized by the tokenizer, so they are never substituted inside a string literal. Trigraphs are replaced in an earlier translation phase, before tokenization, and so still work inside string literals and the like.
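A short sketch of the digraph half of that difference, assuming a compiler that supports digraphs (C95 and later): as tokens they stand in for brackets and braces, but inside a string literal they stay two ordinary characters. The names `pair`, `pair_sum`, and `digraph_kept_in_string` are hypothetical.

```c
#include <string.h>

/* As tokens, <: :> and <% %> are alternative spellings of [ ] and { }. */
static const int pair<:2:> = <% 1, 2 %>;

/* Hypothetical helper: uses the array declared with digraphs above. */
int pair_sum(void)
{
    return pair[0] + pair[1];
}

/* Hypothetical helper: returns 1 when the same two characters, placed in a
   string literal, are left alone rather than turned into a brace. */
int digraph_kept_in_string(void)
{
    const char *s = "<%";
    return strlen(s) == 2 && s[0] == '<' && s[1] == '%';
}
```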

Still wondering why somebody would encode source this way today ... :o

1
  1. No. \n (or its trigraph equivalent) is the portable representation of a newline character.
  2. No. You'd have to represent the literal newline somehow, and \n (or its trigraph equivalent) is the only portable representation.

It's very unusual to find C source code that uses trigraphs or digraphs! Some compilers (e.g. GCC) require a command-line option to enable trigraphs; without it, they assume trigraphs were used unintentionally and issue a warning when they encounter them in the source code.

EDIT: I forgot about puts(""). That's a sneaky way to do it, but only works for stdout.

Ian Abbott
0

Yes of course it's possible

fputc(0x0A, file);
Iharob Al Asimi
  • Although my *source files* may be ISO 646 (at least, when I publish them), I doubt the C99 standard guarantees that the I/O streams on the target architecture use the same encoding, or that a function like `fgets` interprets `<0A>` as a newline. –  Oct 26 '15 at 15:51
  • I still don't understand why that would be used. Normally you do not use backslashes outside of string literals or character constants. Even a trigraph sequence would be better. – too honest for this site Oct 26 '15 at 15:53
  • @Olaf Section 5.2.2 clause 3 says that `\n` has an implementation-defined value, meaning that I can't just assume that it's `0x0A`. –  Oct 26 '15 at 15:56
  • @Rhymoid: I did not refer to the magic number. Read my first comment! – too honest for this site Oct 26 '15 at 16:11