
Note: The suggested duplicate deals with unsigned int and signed int, not unsigned char and signed char. The suggested duplicate question deals with C11. This question is concerned with C89 only. Can this question be reopened?

My code:

#include <stdio.h>

int main()
{
    signed char c;
    unsigned char d;

    c = (signed char) -2;
    d = (unsigned char) c;
    printf("%d %d\n", c, d);

    d = (unsigned char) 254;
    c = (signed char) d;
    printf("%d %d\n", c, d);

    return 0;
}

Output:

$ clang -Wall -Wextra -pedantic -std=c89 foo.c && ./a.out
-2 254
-2 254

Is the output guaranteed to be -2 254 in a standard-conforming C89 compiler for both conversions shown above? Or is the output dependent on the implementation?

Lone Learner
  • 1
    It's only well defined if the `unsigned` value is within the positive range of the `signed`. So if `char` is 8 bits, it's well defined if the value is between 0 and 127. – Barmar Jun 26 '23 at 23:30
  • @Barmar Thanks! What about the 2nd part of the question? Is conversion from `signed char` to `unsigned char` well defined? – Lone Learner Jun 26 '23 at 23:31
  • Yes, it's done using modular arithmetic. – Barmar Jun 26 '23 at 23:32
  • Both are answered at the duplicate question I linked to. – Barmar Jun 26 '23 at 23:32
  • @Barmar The duplicate question you have linked to seems to deal with `unsigned int` and `signed int`. It is not easily apparent that the same rules apply to `unsigned char` and `signed char`. Can this question be reopened so that the case of `signed char` and `unsigned char` can be answered independently? I think the answers are going to be useful because currently there is no question that deals particularly with the conversions between `unsigned char` and `signed char`. – Lone Learner Jun 26 '23 at 23:35
  • an answer "answering half the question" suggests that two separate questions should have been posted – M.M Jun 26 '23 at 23:35
  • @M.M In general I agree but wouldn't that be too much overhead for a question like this where both conversions often occur in the same piece of code and the concepts to answer both may be tightly linked to each other? – Lone Learner Jun 26 '23 at 23:37
  • I agree the currently duplicate isn't great, but "both halves" have been asked and answered dozens of times, so it'd be possible to find better duplicates rather than opening and answering yet again. – M.M Jun 26 '23 at 23:39
  • 2
    The `printf` could be problematic on systems with UCHAR_MAX > INT_MAX although that's a whole nother post – M.M Jun 26 '23 at 23:40
  • @LoneLearner The accepted answer clearly applies to all integer types. There's nothing special about char. – Barmar Jun 26 '23 at 23:41
  • @Barmar That answer is quoting the C11 standard. My question is about C89. Are those rules still the same? Are you sure about that? – Lone Learner Jun 26 '23 at 23:44
  • @Barmar well there is something special about char: `signed char` and `char` are distinct types; and integer promotion rules apply to them. Although as it turns out those things don't make a difference in this question – M.M Jun 26 '23 at 23:45
  • They might have changed the wording, I don't think they changed the intent in any C revision. – Barmar Jun 26 '23 at 23:46
  • @M.M I did try searching SO for questions about `signed char` to `unsigned char` and vice versa for C89. No results came up for me. May I request that this question be either reopened or be closed with an appropriate duplicate question that has answers that are specific to the question I asked? – Lone Learner Jun 26 '23 at 23:46
  • @Lone Learner, "Is the output guaranteed to be -2 254 in a standard-conforming C89 compiler for both conversions shown above? " --> No. (C89 converting to `signed char`) If the value cannot be represented the result is implementation defined. C89 6.2.1.2 – chux - Reinstate Monica Jun 27 '23 at 02:17
  • @chux: I believe you mean 3.2.1.2 instead of 6.2.1.2. – Andreas Wenzel Jun 27 '23 at 02:17
  • @AndreasWenzel My ref is from ANSI/ISO 9899-1990 (revision and redesignation of ANSI X3.159-1989) Approved Aug 3, 1992. What is yours? – chux - Reinstate Monica Jun 27 '23 at 02:19
  • @chux: [This](https://port70.net/~nsz/c/c89/c89-draft.html#3.2.1.2) is the link that I am using. I got that link from cppreference.com. – Andreas Wenzel Jun 27 '23 at 02:21
  • @AndreasWenzel Appears to be a draft. IAC, looks like various versions shifted the refs. So C89 (Conversions) 6.2.1.2 or C89 3.2.1.2 . – chux - Reinstate Monica Jun 27 '23 at 02:26
  • 1
    @Lone Learner, Why concerned about C89 in 2023? – chux - Reinstate Monica Jun 27 '23 at 02:33
  • 4
    C never said that CHAR_BIT == 8. On platforms that have a wider `char`, -2 obviously won't convert to 254 even if all the rules are defined – phuclv Jun 27 '23 at 02:36
  • @chux-ReinstateMonica Because the project I am working on is a C89 project. – Lone Learner Jun 27 '23 at 07:48
  • @LoneLearner Curious that the project has not updated from C89 since the next major release C99, or 24 years ago. FWIW, it is due to projects like that that I encountered algorithms that depended on non- 2's complement. Gotta love those -0. – chux - Reinstate Monica Jun 27 '23 at 16:24
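
The "modular arithmetic" rule mentioned in the comments above for converting a negative `signed char` to `unsigned char` can be sketched as follows. This is only an illustration, assuming UCHAR_MAX < INT_MAX so that UCHAR_MAX + 1 does not overflow; `to_uchar` and `check_to_uchar` are hypothetical names, not part of the question's code.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: spells out the arithmetic that the cast
   (unsigned char) c performs on a negative value.
   Assumes UCHAR_MAX < INT_MAX. */
unsigned char to_uchar(signed char c)
{
    int v = c;                  /* promoted value, e.g. -2 */

    if (v < 0)
        v += UCHAR_MAX + 1;     /* add one more than the largest unsigned char */
    return (unsigned char) v;   /* v is now in range, so the cast keeps it */
}

/* Illustrative check: the explicit arithmetic agrees with the plain cast. */
void check_to_uchar(void)
{
    signed char c = -2;

    assert(to_uchar(c) == (unsigned char) c);
    assert(to_uchar(c) == UCHAR_MAX - 1);   /* 254 only when CHAR_BIT == 8 */
}
```

Note that the result is expressed relative to UCHAR_MAX, which is why it is 254 only on implementations with an 8-bit char.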

3 Answers

4

Is converting from unsigned char to signed char and vice versa in C89 well defined?

Conversion to an unsigned type is well defined. Conversion to a signed type is implementation-defined when the value cannot be represented in the destination type.

Is the output guaranteed to be -2 254 in a standard-conforming C89 compiler for both conversions shown above?

No.

Or is the output dependent on the implementation?

Yes.


Not all implementations use an 8-bit char, and converting an out-of-range value to a signed type gives an implementation-defined result.

Spec details: C89 Conversions. This wording differs from recent C specs. I have not found a significant difference.


When UCHAR_MAX <= INT_MAX, code could use the following and let the compiler emit optimized, well-defined code.

c = (signed char) (d > SCHAR_MAX ? d - UCHAR_MAX - 1 : d);

Likely needs some more thought to cover all cases.
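
The expression above can be wrapped in a small function to make the value arithmetic easier to follow. This is a minimal sketch, not a complete solution: `uchar_to_schar` and `check_uchar_to_schar` are hypothetical names, and it assumes both UCHAR_MAX <= INT_MAX and SCHAR_MIN == -SCHAR_MAX - 1 (two's complement). Without the second assumption, the smallest result may still be out of range for signed char.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper, not part of any standard library.
   Assumes UCHAR_MAX <= INT_MAX, so d promotes to int, and
   SCHAR_MIN == -SCHAR_MAX - 1, so every result fits in signed char. */
signed char uchar_to_schar(unsigned char d)
{
    int v = d;                   /* promoted value, e.g. 254 */

    if (v > SCHAR_MAX)
        v = v - UCHAR_MAX - 1;   /* e.g. 254 - 255 - 1 == -2 */
    return (signed char) v;      /* in range under the stated assumptions */
}

/* Illustrative check of a few values. */
void check_uchar_to_schar(void)
{
    assert(uchar_to_schar(UCHAR_MAX - 1) == -2);
    assert(uchar_to_schar(UCHAR_MAX) == -1);
    assert(uchar_to_schar(100) == 100);
}
```

Because the out-of-range values are mapped into range before the final cast, that cast never has to convert a value that signed char cannot represent.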

chux - Reinstate Monica
  • 1
    FYI, conversion of integer types to unsigned integer types is indeed well defined (except the widths of the destination types may be implementation-defined). However, conversion from floating-point to unsigned integer types is undefined for out-range values. – Eric Postpischil Jun 27 '23 at 03:28
-2

The authors of the Standard almost certainly expected that an implementation would implement conversions between signed and unsigned character types in such a manner that round-trip conversions between them would be value-preserving on any implementation which did not have a compelling reason for handling them in some other fashion, and almost certainly expected that implementations with such a reason, if they existed at all, would be quite rare. There was thus no need for the Committee to worry about whether an implementation that had a good reason for processing such conversions in an unusual manner should be required to process them in value-preserving fashion anyhow. If no implementation would actually have a good reason to deviate from the common behavior, nobody should care whether the Standard mandates the commonplace treatment, and if an implementation did have a good reason to deviate, people working with it would be better placed than the Committee to judge the pros and cons of such a deviation.

supercat
-3

If I say anything wrong, please correct me.

Your question is tagged "undefined-behavior". I don't think that is right.

If you have any doubts about the program, I suggest looking at the disassembly code of the program. All your confusion may be easily resolved by examining it.

The output:

-2 254
-2 254

It is right, and it is certain behavior. This behavior is determined by the C language itself or the C language standard.

The key to the output is how the programmer wants to interpret the stored value FE. If you see FF as an unsigned char, it's 255 (or FFFF as an unsigned short is 65535, or FFFFFFFF as an unsigned int is 4294967295). Seen as a signed char, FF is -1 (or FFFF as a signed short is -1, or FFFFFFFF as a signed int is -1).

In the same way, if you see FE as an unsigned char, it's 254, and if you see FE as a signed char, it's -2. And so on.

When you ask a computer to store -2 and 254, the computer doesn't recognize positive or negative numbers; it only recognizes 0 (in circuitry, it could perhaps be said to be "disconnected" or "broken") and 1 (in circuitry, it could perhaps be said to be "closed" or "connected"). If you ask the computer to store -2, it will store FE somewhere in memory (because variable c and variable d are of char type, each occupies 1 byte), at least on computers that encode negative signed values in two's complement, as @David C. Rankin points out. Similarly, if you ask it to store 254, it will also store FE somewhere in memory.

See the code below:

#include <stdio.h>

int main()
{
    signed char c;
    unsigned char d;

    c = (signed char) 0xFE;
    d = (unsigned char) c;
    printf("%d %d\n", c, d);

    d = (unsigned char)0xFE;
    c = (signed char) d;
    printf("%d %d\n", c, d);

    return 0;
}

Run it with the command below:

clang -Wall -Wextra -pedantic -std=c89 foo.c && ./a.out

It will output:

-2 254
-2 254

Why is -2 254 printed twice?

There is no -2 or 254 in the code.

The only number that appears is 0xFE.

c = (signed char) 0xFE;

d = (unsigned char)0xFE;

So where do -2 and 254 come from?

Simple explanation (a more detailed explanation follows below):

[image]

We find that variable c and variable d are of char type, but %d outputs an int (or signed int). How should the compiler proceed? The answer is sign extension and zero extension.

So the value 0xFE stored in variable c is transformed to 0xFFFFFFFE through sign extension, and the value 0xFE stored in variable d is transformed to 0x000000FE through zero extension. 0xFFFFFFFE printed with %d is -2, and 0x000000FE printed with %d is 254. (If 0xFFFFFFFE is unfamiliar, keep reading; there is an explanation below.)

Or consider code like the following:

#include <stdio.h>

int main()
{
    signed char c;
    unsigned char d;

    c = (signed char) 254;
    d = (unsigned char) c;
    printf("%d %d\n", c, d);

    d = (unsigned char)254;
    c = (signed char) d;
    printf("%d %d\n", c, d);

    return 0;
}

Run it with the command below:

clang -Wall -Wextra -pedantic -std=c89 foo.c && ./a.out

It will output:

-2 254
-2 254

In order to better explain your confusion, please take a look at the following code.

#include <stdio.h>

int main()
{
    signed char c;
    unsigned char d;

    c = (signed char) -2;
    d = (unsigned char) c;
    printf("%d %d %u %u\n", c, d, c, d);

    d = (unsigned char) 254;
    c = (signed char) d;
    printf("%d %d %u %u\n", c, d, c, d);

    return 0;
}

Run it with the command below:

clang -Wall -Wextra -pedantic -std=c89 foo.c && ./a.out

It will output:

-2 254 4294967294 254
-2 254 4294967294 254

Or run it with the command below:

gcc -g -o foo foo.c && ./foo

It will output:

-2 254 4294967294 254
-2 254 4294967294 254

The output is right.

More detailed explanation:

[image]

We find that variable c and variable d are of char type, but %u outputs an unsigned int. How should the compiler proceed? The answer is sign extension and zero extension.

When we examine the disassembly, we do indeed find sign extension and zero extension. See the picture below:

[image]

Another picture:

[image]

We find that the char type (BYTE) is used when assigning values to variable c and variable d, but before the values of variable c and variable d are passed to printf, there are instructions like:

movzx  esi,BYTE PTR [rbp-0x1]
movsx  ecx,BYTE PTR [rbp-0x2]
movzx  edx,BYTE PTR [rbp-0x1]
movsx  eax,BYTE PTR [rbp-0x2]

movzx is zero extension, and movsx is sign extension. Registers such as esi, ecx, edx, and eax are the same size as int (ecx occupies 4 bytes, and the int type also occupies 4 bytes).

So the value 0xFE stored in variable c is transformed to 0xFFFFFFFE (saved in ecx or eax) through sign extension, and the value 0xFE stored in variable d is transformed to 0x000000FE (saved in esi or edx) through zero extension. 0xFFFFFFFE printed with %u is 4294967294 and with %d is -2; 0x000000FE printed with %u is 254 and with %d is 254.

For the representation of 4294967294, see the picture below.

[image]

For the representation of -2, see the picture below.

[image]

So now you see that when outputting the value of variable c or variable d, using %d and %u to print them out will yield different results. However, both representations refer to the same value stored in memory. The key point is how you choose to interpret the value of c or d.
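
The promotion step described above can also be checked in terms of values rather than bit patterns. This is a minimal sketch, assuming UCHAR_MAX <= INT_MAX; the function names are hypothetical and only make explicit the promotion that printf's arguments undergo.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helpers: make the promotion to int explicit. */
int promote_schar(signed char c)
{
    return c;   /* the value is preserved: -2 stays -2 */
}

int promote_uchar(unsigned char d)
{
    return d;   /* the value is preserved when UCHAR_MAX <= INT_MAX */
}

/* Illustrative check using the values from the question. */
void check_promotions(void)
{
    signed char c = -2;
    unsigned char d = (unsigned char) c;   /* well defined: UCHAR_MAX - 1 */

    assert(promote_schar(c) == -2);
    assert(promote_uchar(d) == UCHAR_MAX - 1);
}
```

The checks hold for any representation of negative numbers, because promotion is defined on values. Only the conversion in the other direction, from an out-of-range `unsigned char` to `signed char`, depends on the implementation.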

Tom
  • 1
    To properly output the values of `c` and `d` as `char` / `unsigned char` you need to provide the `hh` *Length Modifier*, e.g. `printf("%hhd %hhd %hhu %hhu\n", c, d, c, d);`. See [man 3 printf](https://man7.org/linux/man-pages/man3/printf.3.html). You should also make clear that `-2` will be stored as `FE` on computers that encode negative signed values in *two's complement* (almost all do, but there are exceptions). You must also clarify that the conversion from unsigned to signed is *implementation-defined*. Good effort on the write-up (though pictures instead of text may be a bit much....) – David C. Rankin Jun 27 '23 at 07:03
  • @DavidC.Rankin I have updated my answer. How do you feel about my current answer? – Tom Jun 27 '23 at 08:05
  • 2
    @Whozcry, `printf("%hhd\n", c);` is UB in C89 as `"hh"` is later C specification. Post is tagged C89 as that is OP's key concern. – chux - Reinstate Monica Jun 27 '23 at 09:32
  • @chux-ReinstateMonica Where does it say that "printf("%hhd\n", c); is UB in C89"? I ran the code and the result is right. Do you have any counterexamples? – Tom Jun 27 '23 at 09:36
  • 2
    @Whozcry As `"hh"` prefix is not specified in C89, then C89's The `fprintf` function: " If a conversion specification is not valid, the behavior is undefined" applies – chux - Reinstate Monica Jun 27 '23 at 09:41
  • @chux-ReinstateMonica What I'm using is C89. I said I haven't seen `"hh"` before, so I added a note to my answer. (As @David C. Rankin pointed out, use "printf("%hhd %hhd %hhu %hhu\n", c, d, c, d);" to output the values for c and d as char/unsigned char). – Tom Jun 27 '23 at 09:44
  • 1
    Your C89 compiler, when encountering `"hh"` is using an _extension_ to the language that is not part of C89. – chux - Reinstate Monica Jun 27 '23 at 09:48
  • @chux-ReinstateMonica I have updated my answer. – Tom Jun 27 '23 at 09:48
  • 2
    `%hhd` will give you [*warning: ISO C90 does not support the 'hh' gnu_printf length modifier*](https://godbolt.org/z/qrd3Kb3eT) – phuclv Jun 27 '23 at 09:50
  • @DavidC.Rankin As the `"hh"` prefix is not specified in C89, I feel that `"hh"` misled me a bit. – Tom Jun 27 '23 at 09:50
  • @phuclv But when I use the command `clang -Wall -Wextra -pedantic -std=c89 foo.c && ./a.out` or `gcc -g -o foo foo.c && ./foo` to compile code containing `printf("%hhd %hhd %hhu %hhu\n", c, d, c, d);` and execute it, it does not generate any warning. – Tom Jun 27 '23 at 09:55
  • 2
    @Whozcry "And see FF as a signed char, it's -1" is true when `signed char` is 8-bit 2's complement, but different when `CHAR_BIT != 8` or under the rare sign-magnitude or ones' complement encoding. Both of these are rare in 2023. – chux - Reinstate Monica Jun 27 '23 at 09:56
  • 3
    Only citations from the actual standard can answer this question. Other things like code fragments and pictures could be nice illustrative additions to that, but without relevant citations they make no sense. – n. m. could be an AI Jun 27 '23 at 09:57
  • @chux-ReinstateMonica The current issue is whether the original problem made it clear on what machine their project is based on or not. – Tom Jun 27 '23 at 10:02
  • @n.m.willseey'allonReddit I think it's a certain behavior with output `-2 254 -2 254` – Tom Jun 27 '23 at 10:09
  • "Nowadays, I basically only use **C89** when writing programs and haven't encountered any problems yet." – Tom Jun 27 '23 at 10:14
  • 3
    @Whozcry C89 may produce the desired output with OP's compiler on OP's machine, yet the `-2 254` output is not specified for all compliant C89 compilers on all machines. – chux - Reinstate Monica Jun 27 '23 at 10:14
  • @chux-ReinstateMonica So I think the original question should clarify what kind of machine his project is running on. – Tom Jun 27 '23 at 10:17
  • 2
    @Whozcry "Nowadays, I basically only use C89" --> except your C89 compiler that you use is also using extensions (e. g. `"hh"`) to the C89 spec, so _only_ overstates. Perhaps a compiler option exists to only use C89. – chux - Reinstate Monica Jun 27 '23 at 10:20
  • Wait a moment, I will update my answer. – Tom Jun 27 '23 at 10:21
  • 2
    @Whozcry "So I think the original question should clarify what kind of machine his project is running on." -- I wrote the original question. What kind of machine my project is running on is irrelevant. I want to know whether my code is going to produce the same behavior on *any* standards-conformant compiler. The correct answer is "No". Output from any particular compiler or machine is not sufficient to answer my question. The answer must be determined by citations to the C89 standard. It is clear from the other answer that the behavior of my code is implementation-defined. – Lone Learner Jun 27 '23 at 10:31
  • @LoneLearner I think it's a certain behavior with output `-2 254 -2 254`. I need to organize my answer. I was just led astray by them. – Tom Jun 27 '23 at 11:18
  • @LoneLearner Please again see my updated answer. Thank you. – Tom Jun 27 '23 at 13:26
  • @DavidC.Rankin Please again see my updated answer. Thank you. – Tom Jun 27 '23 at 13:47
  • @chux-ReinstateMonica Please again see my updated answer. Thank you. – Tom Jun 27 '23 at 13:48
  • @phuclv Please again see my updated answer. Thank you. – Tom Jun 27 '23 at 13:48
  • @n.m.willseey'allonReddit Please again see my updated answer. Thank you. – Tom Jun 27 '23 at 13:48
  • 1
    @Whozcry Attempting to explain behavior via the bit pattern `0b1111 1110` or `0xFE` or 2's complement is weak. Instead think of the _value_ and not the encoding. Bringing in `"%u"` is an unnecessary complication - best to avoid. OP has 2 lines that change values: `d = (unsigned char) c;` and `c = (signed char) d;`. When `signed char` and `unsigned char` are narrower than `int` (_very_ common) and `c < 0`, `(unsigned char) c` is well defined as `(unsigned char)((int) c + UCHAR_MAX + 1)`. With `c = (signed char) d;` and `d > SCHAR_MAX`, the result is implementation defined. – chux - Reinstate Monica Jun 27 '23 at 15:48
  • This answer still looks incorrect to me. As @chux-ReinstateMonica has succinctly stated in her answer with citations to the C89 standard, the behavior of the program in the question is implementation-defined. – Lone Learner Jun 27 '23 at 16:18
  • @chux-ReinstateMonica Regarding "and c < 0" in your comment: in my answer c is 254 (or 0xFE), which is bigger than 0. And if you look at the disassembly of the program, you will find that it is similar to my explanation. – Tom Jun 27 '23 at 23:32
  • @LoneLearner As you say "the behavior of the program in the question is implementation-defined.", I think output double `-2 254` is right, and It is "predictable". – Tom Jun 27 '23 at 23:39
  • 2
    @Whozcry [Your comment with "and c < 0", in my answer c is 254](https://stackoverflow.com/questions/76560765/is-converting-from-unsigned-char-to-signed-char-and-vice-versa-in-c89-well-defin/76561749?noredirect=1#comment135002286_76561749) ---> "c is 254" is not possible on your machine as `c` is a `signed char` with a range of [-128...127]. `c` has a value of -2 and that is less than 0. Again, avoid looking at the bit pattern and focus on the _value_. It is by _value_ that most of the C specification is written. – chux - Reinstate Monica Jun 28 '23 at 01:38
  • @chux-ReinstateMonica c is `254` when c is an unsigned char. You say "avoid looking at the bit pattern"; I apologize for not making it clear how I arrived at 0xFFFFFFFE, which may have led you to think that I was focusing on the bit pattern. I need to add the disassembled code to my answer to explain why I am focusing on 0xFFFFFFFE, but I'm not considering it from the perspective of bit patterns. – Tom Jun 28 '23 at 01:50
  • 1
    @Whozcry "c is 254 when c is a unsigned char" --> Yet `c` is not an `unsigned char`. OP has only `signed char c;`, – chux - Reinstate Monica Jun 28 '23 at 02:58
  • @chux-ReinstateMonica My answer was just updated, please see it again. – Tom Jun 28 '23 at 03:01
  • @chux-ReinstateMonica I want to say that when assigning a value to `variable c`, it doesn't matter whether it is `-2` or `254`; the values in memory are the same. – Tom Jun 28 '23 at 03:05
  • I already gave you my vote -- so you got all I can give. (which was a +1 before the wolves got to the answer) – David C. Rankin Jun 28 '23 at 06:26
  • @DavidC.Rankin Is there something wrong with my answer? That's why others voted it down. – Tom Jun 28 '23 at 09:09
  • @Whozcry Rather than use the disassembler to validate your assertions, use the C89 specification. "when assign value to variable c, it doesn't matter to -2 or 254, the values in memory are all the same" is a reflection of what happens with your compiler on your machine. That is not certain for all compliant C89 compilers on all machines. `signed char c = 254` being the same as `signed char c = -2` is _implementation defined_. Those 2 assignments have different values stored for `c` when out-of-range conversions are capped or when the byte width is more than 8 for example, even if uncommon. – chux - Reinstate Monica Jun 28 '23 at 13:57
  • @chux-ReinstateMonica It seems like I know why my answer was voted down. – Tom Jun 28 '23 at 14:02
  • 1
    @Whozcry You are engaged and trying - that is good - even if the answer has short-comings. C has challenges given it has evolved over 40+ years and OP is asking about an _olde_ version. – chux - Reinstate Monica Jun 28 '23 at 14:07
  • 1
    @Whozcry - No, not really, but it is quite long. As far as the downvotes go, don't worry too much about them. There are certain groups that seem to be overly eager to downvote -- which I see as discouraging to new members' willingness to participate and answer questions -- but that is a problem SO has always had (at least for the last decade). Just learn from the comments and your answers will get better and the downvotes will go away. Sometimes there is no rhyme or reason you can discern for the downvote -- just chalk it up to the phase of the moon and move on. – David C. Rankin Jun 28 '23 at 17:04
  • 2
    The assembly output just tells you what your compiler did , it doesn't tell you what the language standard guarantees. This answer is completely worthless – M.M Jun 28 '23 at 23:11
  • @chux-ReinstateMonica As a new member of Stack Overflow, I still need to work harder. – Tom Jun 29 '23 at 00:14
  • @DavidC.Rankin Thank you for your response. – Tom Jun 29 '23 at 00:15
  • @M.M I apologize for providing a valueless answer. – Tom Jun 29 '23 at 00:15