4

I seem to have a blind spot in my understanding of the meaning of "character" in C's stream abstraction; I just can't seem to stitch the picture together.
What is the meaning of character with respect to binary streams?

From 7.19.7.1p2 ...

If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).

...

Suppose I wrote a file on a machine where characters require 16 bits, and I start reading it on a machine on which characters fit in 7 bits. Then what am I actually reading with each call to fgetc? Is it part of the 16-bit character (i.e., am I reading 7 bits at a time), or is the 16-bit character "squeezed" into a 7-bit representation with information loss?

zagortenay333
  • I'm not sure if you're asking about actual characters in the sense of character encodings, or if you're talking about the implications of having different size `char` in different C implementations. – Cubic Feb 02 '19 at 18:30
  • Tbh, I'm not sure either. Which definition is even implied here? The C spec gives a very abstract meaning to character. I'm not referring to type char here, though. – zagortenay333 Feb 02 '19 at 18:31
  • Edit, I didn't read your comment correctly. I do mean the implications of different size char. – zagortenay333 Feb 02 '19 at 18:36

3 Answers

1

From the spec:

3.7.1
1 character
single-byte character
〈C〉 bit representation that fits in a byte

and:

3.6
1 byte
addressable unit of data storage large enough to hold any member of the basic character set of the execution environment
NOTE 1 It is possible to express the address of each individual byte of an object uniquely.
NOTE 2 A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit.

So on your writing machine, char is likely a 16-bit type. On your reading machine, char is likely an 8-bit type. C requires that char be at least an 8-bit type:

5.2.4.2.1 Sizes of integer types
...
— number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8

So on your reading machine, you'll need to make two fgetc calls to read each half of the 16-bit characters you wrote on the original machine.
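As a sketch of what that might look like, here's a hypothetical read_u16 helper, assuming the writing machine stored each 16-bit character high byte first (the byte order is an assumption, not something C guarantees):

#include <stdio.h>

/* Reassemble one 16-bit unit from two 8-bit reads.
 * Assumes big-endian storage on the writing machine. */
int read_u16(FILE *stream, unsigned *out)
{
    int hi = fgetc(stream);
    int lo = (hi == EOF) ? EOF : fgetc(stream);
    if (hi == EOF || lo == EOF)
        return 0;                  /* end of file, error, or truncated pair */
    *out = ((unsigned)hi << 8) | (unsigned)lo;
    return 1;                      /* one full 16-bit unit recovered */
}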

Carl Norum
  • But I'm reading a file created on a machine with 16-bit characters on a machine with 7-bit characters. – zagortenay333 Feb 02 '19 at 18:30
  • Ah, and what would happen if I flipped that and wrote on an 8-bit char machine and read on a 16-bit char machine? Do the wider chars get padded? – zagortenay333 Feb 02 '19 at 18:39
  • No - you'd get two of the 8-bit chars packed into the 16-bit value you're getting back from `fgetc`. Each machine only operates within the parameters of its own implementation - there's nothing in the file itself that changes the behaviour upon reading it. – Carl Norum Feb 02 '19 at 18:39
  • This is, for example, why networking RFCs refer to "octets" instead of "bytes." It's clear then that the protocols operate on 8-bit entities and it's your responsibility as a programmer to deal with that however your system needs to. – Carl Norum Feb 02 '19 at 18:43
  • Final sentence is questionable, at best. A lot depends on what mechanism moved the file from a 7-bit to a 16-bit machine (or back). – Ben Voigt Feb 03 '19 at 01:03
  • Note that even if you're working on a machine with 16 bit "characters", it is likely that the C compiler still defines characters (`char`) as 8 bits, and you need `wchar_t` (and the various `w` stdio functions) to get 16-bit characters. – Chris Dodd Feb 05 '19 at 18:03
  • @BenVoigt Especially as a 7-bit machine is strictly outside the purview of C. – Deduplicator Feb 14 '19 at 19:33
0

Technically, char is a one-byte type. On the common implementations where a byte is 8 bits, a signed char can hold values from -128 to 127; depending on the architecture, plain char can also be unsigned, holding values from 0 to 255. But although it is, strictly speaking, an integer type, it is not used to hold integers generally. You will almost always use type int or one of its many variations for that.
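You can check what your own implementation uses with <limits.h>; a quick sketch:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_BIT, CHAR_MIN and CHAR_MAX are defined in <limits.h> */
    printf("bits per char: %d\n", CHAR_BIT);
    printf("char range: %d to %d\n", CHAR_MIN, CHAR_MAX);
    return 0;
}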

Type char, in practice, has a couple of dedicated uses:

It can hold an ASCII value. As there are 128 ASCII codes (or 256 codes in the extended 8-bit variants), char is an ideal type for this purpose. But when it is used this way, it nearly always appears in a program as part of a string, which (in C, although not always in C++) is a simple array of char.

If you are designing a structure to be compact, and you want to create a field (that is, a data member) that will never hold more than 256 different values, you might want to use char type for that purpose as well.
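For instance (a hypothetical record; the field names are made up):

/* Compact record: each field fits in one byte on an 8-bit-char machine. */
struct pixel {
    unsigned char r, g, b;   /* each holds one of 256 possible values */
};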

Note that there is a subtle point here not always obvious to new C programmers. You can assign ASCII codes to char variables, but that is not really a property of char in C. For example, I could assign ASCII code numbers to any integer field. The C language itself does not prevent this. But remember that C string library functions are designed to be used with arrays of char, not arrays of int.
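A small illustration of that point, assuming an ASCII-based execution character set:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char c = 'A';           /* 'A' is just the integer 65 in ASCII */
    int  i = 'A';           /* an int can hold the same code number... */
    char word[] = "Jack";   /* ...but the string functions expect arrays of char */
    printf("%c %d %zu\n", c, i, strlen(word));   /* prints: A 65 4 */
    return 0;
}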

Stefan Becker
  • Does this address the question that the OP has asked? How does this relate to reading files written on one machine with a different number of bits per character? – templatetypedef Feb 03 '19 at 00:46
-2

char* is how you declare a pointer to a char variable. It's useful when you want a string whose length isn't known in advance.

1st example:

char name[10];
strcpy (name, "Jack"); // copies the second argument, terminating '\0' included, into the first

Here you're reserving 10 bytes of memory. You might use them all, or your name might just be "Jack", which, if we account for the '\0' character that terminates every string, takes only 5 of them. That means you have 5 bytes left over that you're not using.

Maybe your name is longer than 10 characters; where will you store the extra ones then? You won't be able to, because you gave your array of char a fixed size, and writing past its end (as strcpy would happily do) is undefined behavior.
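If you must copy into a fixed-size array, one safer sketch is to bound the copy, for example with snprintf:

#include <stdio.h>

int main(void)
{
    char name[10];
    /* snprintf writes at most sizeof name bytes, '\0' included,
     * truncating the source instead of overflowing the array */
    snprintf(name, sizeof name, "%s", "type_your_name_here");
    printf("%s\n", name);    /* prints the first 9 characters */
    return 0;
}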

2nd example:

char *name;

This means that you just declared a pointer variable where you'll store the address of the first character of your string. That gives you more freedom and flexibility, but the pointer doesn't point at any usable storage yet: you must make it point at memory you've allocated (for example with malloc) before calling functions like strcpy and strcat, because those functions never allocate memory for you.
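A minimal sketch of that pattern, allocating first and then copying:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *input = "Jack";              /* stand-in for user input */
    char *name = malloc(strlen(input) + 1);  /* +1 for the '\0' */
    if (name == NULL)
        return 1;                            /* allocation failed */
    strcpy(name, input);                     /* safe: the buffer is big enough */
    printf("%s\n", name);
    free(name);
    return 0;
}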

In short:

My understanding is that, in the first example, you fixed both the starting point and the size of your string, which limits what you can fit in there and can also waste memory. In the second example, you only declared a starting point, which gives you more freedom, at the cost of managing the memory yourself. It's just my first year learning this as well, so maybe the experts can shed a brighter light on this matter than I can.

Sasha
  • Aside from the formatting problems (which I've tried to fix up for you), I'm not really sure this answer addresses the question in any way - can you try to explain some more about how this applies? – Carl Norum Feb 03 '19 at 00:03
  • Also - neither `strcpy` nor `strcat` *ever* allocate any memory for you. – Carl Norum Feb 03 '19 at 00:05