
A char in the C programming language is a fixed-size, one-byte type designed specifically to be large enough to store a character value from an encoding such as ASCII.

But to what extent are the integer values relating to ASCII encoding interchangeable with the char characters? Is there any way to refer to 'A' as 65 (decimal)?

getchar() returns an integer - presumably this relates directly to such values? Also, if I am not mistaken, it is possible in certain contexts to increment chars ... such that (roughly speaking) '?'+1 == '@'.

Or is such encoding not guaranteed to be ASCII? Does it depend entirely upon the particular environment? Is such manipulation of chars impractical or impossible in C?

Edit: Relevant: C comparison char and int

Stumbler

4 Answers


I am answering just the question about incrementing characters, since the other issues are addressed in other answers.

The C standard guarantees that '0' to '9' are consecutive, so you can increment a digit character (except '9') and get the next digit character, or do other arithmetic with them (C 1999 5.2.1 3).

The relationships between other characters are not guaranteed by the C standard, so you would need documentation from your specific C implementation (primarily the compiler) regarding this.
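
For illustration only, here is a minimal sketch (not part of the original answer) that relies on nothing beyond this digit guarantee and is therefore portable:

#include <stdio.h>

int main(void)
{
    char d = '4';
    printf("numeric value: %d\n", d - '0');   /* digits are consecutive, so this is guaranteed to print 4 */
    printf("next digit: %c\n", d + 1);        /* '4' + 1 is '5' on every conforming implementation */
    return 0;
}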

Eric Postpischil
  • @Esailija So [truth] doesn't make sense to you because [falsehood]?? Eric cited the section of the C standard that provides this guarantee ... go read it if you doubt it. – Jim Balter Dec 11 '12 at 20:03
  • @JimBalter I wasn't referring to the answer with that but to the standard, how can they guarantee that when they don't guarantee the encoding. And if they guarantee `'0'-'9'`, why cannot they guarantee `'A'-'Z'`? But this will probably cause an extended discussion which is not allowed in SO... – Esailija Dec 11 '12 at 20:06
  • @Esailija 'how can they guarantee that when they don't guarantee the encoding.' -- By doing so. Your question makes no sense and shows a fundamental failure to understand logic and sets. "why cannot they guarantee 'A'-'Z'? " -- They CAN, but they DON'T, because EBCDIC violates it. – Jim Balter Dec 11 '12 at 20:09
  • @Esailija Perhaps the problem is poor English ... if you mean "why did they" or "why didn't they", write that instead of "how can they" or "why cannot they", which mean something very different. – Jim Balter Dec 11 '12 at 20:12
  • @JimBalter ah I see, didn't realize there was an encoding where a-z are not consecutive. This answers my question, so thanks. – Esailija Dec 11 '12 at 20:13
  • @Esailija Right. Language standards have become more a matter of language invention, but the C standard to a large degree actually *standardized* existing practices at the time. Since both ASCII and EBCDIC were in use, the standards committee provided the guarantees they could that were met by both. – Jim Balter Dec 11 '12 at 20:17
  • @Esailija And of course one could ask what sort of idiot would invent a character encoding in which a-z aren't consecutive, but we would have to go back to the history of punch cards. – Jim Balter Dec 11 '12 at 20:24

But to what extent are the integer values relating to ASCII encoding interchangeable with the char characters? Is there any way to refer to 'A' as 65 (decimal)?

In fact, you can't do anything else. char is just an integral type, and if you write

char ch = 'A';

then (assuming ASCII), ch will merely hold the integer value 65 - presenting it to the user is a different problem.
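
To make this concrete, here is a small sketch (assuming an ASCII execution character set, as the answer does):

#include <stdio.h>

int main(void)
{
    char ch = 'A';
    printf("%c\n", ch);   /* presented as a character: A */
    printf("%d\n", ch);   /* the same value presented as an integer: 65 with ASCII */
    return 0;
}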

Or is such encoding not guaranteed to be ASCII?

No, it isn't. C doesn't rely on any specific character encoding.

Does it depend entirely upon the particular environment?

Yes, pretty much.

Is such manipulation of chars impractical or impossible in C?

No, you just have to be careful and know the standard quite well - then you'll be safe.
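
One way to "be careful" (a sketch, not part of the original answer) is to prefer the <ctype.h> functions over arithmetic whose result depends on the encoding:

#include <ctype.h>
#include <stdio.h>

int main(void)
{
    char c = 'q';
    if (isdigit((unsigned char)c))
        printf("digit value: %d\n", c - '0');               /* portable: digits are contiguous */
    printf("upper-cased: %c\n", toupper((unsigned char)c)); /* portable, unlike c - 'a' + 'A' */
    return 0;
}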


Character literals like 'A' have type int ... they are completely interchangeable with their integer value. However, that integer value is not mandated by the C standard; it might be ASCII (and is for the vast majority of common implementations) but need not be; it is implementation-defined. The mapping of integer values for characters does have one guarantee given by the Standard: the values of the decimal digits are contiguous (i.e., '1' - '0' == 1, ..., '9' - '0' == 9).
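
A short sketch illustrating both points (the sizeof comparison is specific to C; in C++ 'A' has type char):

#include <stdio.h>

int main(void)
{
    /* In C, 'A' is an int, so the two sizes compare equal. */
    printf("sizeof 'A' == sizeof(int): %d\n", sizeof 'A' == sizeof(int));
    /* Digit contiguity is guaranteed regardless of the encoding in use. */
    printf("'9' - '0' == %d\n", '9' - '0');   /* always 9 */
    return 0;
}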

Jim Balter

Where the source code has 'A', the compiled object will just have the corresponding byte value instead. That's why arithmetic with character values is allowed (in C the constant 'A' itself actually has type int, but its value fits in a char, i.e. a byte).

Of course, a character encoding (more accurately, a code page) must be applied to get that byte value, and that codepage would serve as the "native" encoding of the compiler for hard-coded strings and char values.

Loosely, you could think of char and string literals in C source as essentially being macros. On an ASCII system the "macro" 'A' would resolve to (char) 65, and on an EBCDIC system to (char) 193. Similarly, C strings compile down to zero-terminated arrays of chars (bytes). This logic affects the symbol table also, since the symbols are taken from the source in its native encoding.

So no, ASCII is not the only possibility for the encoding of literals in source code. But because a single-quoted character must fit in a char, encodings such as UTF-16, in which even basic characters occupy more than one byte, are effectively excluded.
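
If you want to observe your own compiler's execution character set, one illustrative sketch is to dump the byte values of a literal:

#include <stdio.h>

int main(void)
{
    const char *s = "ABC";
    for (const char *p = s; *p != '\0'; ++p)
        printf("'%c' -> %d\n", *p, (unsigned char)*p);   /* 65 66 67 with ASCII, 193 194 195 with EBCDIC */
    return 0;
}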

wberry