Unsigned int from 32 bit to 64bit OS

Question

This code snippet is excerpted from a Linux book. If it is not appropriate to post the code snippet here, please let me know and I will delete it. Thanks.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  char buf[30];
  char *p;
  int i;
  unsigned int index = 0;
  //unsigned long index = 0;
  printf("index-1 = %lx (sizeof %d)\n", index-1, sizeof(index-1));
  for(i = 'A'; i <= 'Z'; i++)
      buf[i - 'A'] = i;
  p  = &buf[1];
  printf("%c: buf=%p p=%p p[-1]=%p\n", p[index-1], buf, p, &p[index-1]);
  return 0;
}

On a 32-bit OS: this program works fine whether index is declared as unsigned int or unsigned long.

On a 64-bit OS: the same program crashes with a core dump if index is declared as unsigned int. However, if I only change the data type of index from unsigned int to a) unsigned long or b) unsigned short, the program works fine too.

The book only says that on 64-bit the core dump is caused by a non-negative number. But I have no idea exactly why unsigned long and unsigned short work while unsigned int does not.

What confuses me is this:

p + (0u - 1) == p + UINT_MAX when index is unsigned int.

BUT,

p + (0ul - 1) == &p[-1] when index is unsigned long.

This is where I get stuck.

If anyone can help explain the details, it would be highly appreciated!

Thank you.

Here are some results from my 32-bit machine (RHEL 5.10, gcc version 4.1.2 20080704) and my 64-bit machine (RHEL 6.3, gcc version 4.4.6 20120305).

I am not sure whether the gcc version makes any difference here, so I have included it as well.

On 32 bit:

I tried two changes:

1) Modify unsigned int index = 0 to unsigned short index = 0.

2) Modify unsigned int index = 0 to unsigned char index = 0.

In both cases the program runs without problems:

index-1 = ffffffff (sizeof 4)

A: buf=0xbfbdd5da p=0xbfbdd5db p[-1]=0xbfbdd5da

It seems that index is promoted to a 4-byte type when 1 is subtracted from it.

On 64 bit:

I tried three changes:

1) Modify unsigned int index = 0 to unsigned char index = 0.

  It works!

index-1 = ffffffff (sizeof 4)

A: buf=0x7fffef304ae0 p=0x7fffef304ae1 p[-1]=0x7fffef304ae0

2) Modify unsigned int index = 0 to unsigned short index = 0.

 It works!

index-1 = ffffffff (sizeof 4)

A: buf=0x7fff48233170 p=0x7fff48233171 p[-1]=0x7fff48233170

3) Modify unsigned int index = 0 to unsigned long index = 0.

 It works!

index-1 = ffffffff (sizeof 8)

A: buf=0x7fffb81d6c20 p=0x7fffb81d6c21 p[-1]=0x7fffb81d6c20

BUT only unsigned int index = 0 runs into the core dump, at the last printf:

index-1 = ffffffff (sizeof 4)

Segmentation fault (core dumped)

Your first printf is likely the problem. Depending on the architecture and phase of the moon, printf expects a `%l` value to be passed as two parm locations. — Hot Licks, Aug 10 '14 at 12:59
In general, you're trying to access memory that is probably out of your accessible memory range - `p[index-1]` is equal to reading from `p + UINT_MAX` location. — SomeWittyUsername, Aug 10 '14 at 13:11
Thanks for your comments, Hot Licks. 1) I changed the %lx to %d, and it still runs into the core dump. 2) I commented out that entire line and left only the last printf, and I still run into the core dump. Could you shed some light on this? Thanks. Result: On 32bit: index-1 = ffffffff (sizeof 4) A: buf=0xbf8297b6 p=0xbf8297b7 p[-1]=0xbf8297b6 On 64bit: Segmentation fault (core dumped) — rickhau, Aug 10 '14 at 13:23
Thanks, icepack. But it runs without any problem if I switch the data type from unsigned int to unsigned long. My understanding is that p points to the 2nd element of buf, which is buf[1], so p[index-1] should have no problem accessing the 1st element of buf. However, I am not very clear on the conversions among the array index, the unsigned integer and -1, and I think this could be the reason for the problem... — rickhau, Aug 10 '14 at 13:28
Printing the result of `sizeof` using `%d` is undefined behavior. Use `%zu` for formatting an object of type `size_t`. — The Paramagnetic Croissant, Aug 10 '14 at 14:37
P.S. Lessons on development of 64-bit C/C++ applications: http://www.viva64.com/en/l/ — , Aug 10 '14 at 15:31
1) `"index-1 = %lx (sizeof %d)\n"` --> `"index-1 = %x (sizeof %zu)\n"`. A well enabled compiler would warn about this. 2) `p[index-1]` is undefined behavior. Attempting to access memory not known to be valid. — chux - Reinstate Monica, Aug 13 '14 at 15:50

Deduplicator · Answer 1 · 2014-08-10T13:27:50.373

Do not lie to the compiler!

Passing printf an int where it expects a long (%ld) is undefined behavior.
(Creating a pointer pointing outside any valid object (and not just behind one) is UB too...)

Correct the format specifiers and the pointer arithmetic (that includes indexing as a special case) and everything will work.

(UB includes "It works as expected" as well as "Catastrophic failure".)

BTW: If you politely ask your compiler for all warnings, it would warn you. Use -Wall -Wextra -pedantic or similar.

elcuco · Answer 2 · 2014-08-10T13:31:41.017


One other problem your code has is in your printf():

  printf("index-1 = %lx (sizeof %d)\n", index-1, sizeof(index-1));

Let's simplify:

int i = 100;
printf("%lx", i - 100);

You are telling printf that this is an unsigned long, but in reality you are passing an int. clang gives you the correct warning (I think gcc should also emit it). See:

test1.c:6:19: warning: format specifies type 'unsigned long' but the argument has type 'int' [-Wformat]
printf("%lx", i - 100);
        ~~~   ^~~~~~~
        %x   
1 warning generated.

The solution is simple: either pass an unsigned long to printf or tell printf to print an int:

printf("%lx", (unsigned long)(i - 100));
printf("%x", i - 100);

You got lucky on 32-bit and your app did not crash. Porting it to 64-bit revealed a bug in your code, and now you can fix it.

edited Aug 10 '14 at 13:31

answered Aug 10 '14 at 13:22

That's correct, but not OP's question, which is about the second `printf`. @rickhau: Another reason for providing minimal examples and not updating questions via comments (but editing the question itself). – mafso Aug 10 '14 at 13:44
@mafso Yes, I saw that. Thanks. I updated the wording of my response: even though it does not fix the exact crash, this is something else that can go wrong (for example, if there are some variables pushed to the stack after the int, `printf` will not display them correctly). – elcuco Aug 10 '14 at 13:47
Thanks, @elcuco. It seems that I still run into the core dump even when I use %x instead of %lx in the first printf. – rickhau Aug 10 '14 at 13:52
@rickhau Not unexpected - you are referencing an out of bound index in your array. – elcuco Aug 10 '14 at 14:03

mafso · Accepted Answer · 2014-08-11T15:58:07.417

Arithmetic on unsigned values is always defined, in terms of wrap-around. E.g. (unsigned)-1 is the same as UINT_MAX. So an expression like

p + (0u-1)

is equivalent to

p + UINT_MAX

(&p[0u-1] is equivalent to &*(p + (0u-1)) and p + (0u-1)).

Maybe this is easier to understand if we replace the pointers with unsigned integer types. Consider:

uint32_t p32; // say, this is a 32-bit "pointer"
uint64_t p64; // a 64-bit "pointer"

Assuming 16, 32, and 64 bit for short, int, and long, respectively (entries on the same line equal):

p32 + (unsigned short)-1    p32 + USHRT_MAX     p32 + (UINT_MAX>>16)
p32 + (0u-1)                p32 + UINT_MAX      p32 - 1
p32 + (0ul-1)               p32 + ULONG_MAX     p32 + UINT_MAX          p32 - 1

p64 + (0u-1)                p64 + UINT_MAX
p64 + (0ul-1)               p64 + ULONG_MAX     p64 - 1

You can always replace operands of addition, subtraction and multiplication on unsigned types by something congruent modulo the maximum value + 1. For example,

-1 ≡ ffffffff_hex mod 2³²

(ffffffff_hex is 2³²-1 or UINT_MAX), and also

ffffffffffffffff_hex ≡ ffffffff_hex mod 2³²

(for a 32-bit unsigned type you can always truncate to the least-significant 8 hex-digits).

Your examples:

32-bit

unsigned short index = 0;

In index - 1, index is promoted to int. The result has type int and value -1 (which is negative). Same for unsigned char.

64-bit

unsigned char index = 0;
unsigned short index = 0;

Same as for 32-bit. index is promoted to int, index - 1 is negative.

unsigned long index = 0;

The output

index-1 = ffffffff (sizeof 8)

is weird: it’s your only correct use of %lx, but it looks as if you printed it with %x (expecting 4 bytes). On my 64-bit computer (with a 64-bit long), using %lx, I get:

index-1 = ffffffffffffffff (sizeof 8)

ffffffffffffffff_hex is -1 modulo 2⁶⁴.

unsigned index = 0;

An int cannot hold every value an unsigned int can, so in index - 1 nothing is promoted to int: the result has type unsigned int, and the value -1 wraps around to a positive number (UINT_MAX, or ffffffff_hex). For 32-bit addresses, adding this value is the same as subtracting one:

    bfbdd5db            bfbdd5db
+   ffffffff          -        1
=  1bfbdd5da
=   bfbdd5da          = bfbdd5da

(Note the wrap-around/truncation.) For 64-bit addresses, however:

    00007fff b81d6c21
+            ffffffff
=   00008000 b81d6c20

with no wrap-around. This is trying to access an invalid address, so you get a segfault.

Maybe have a look at 2’s complement on Wikipedia.

Under my 64-bit Linux, using a specifier that expects a 32-bit value while passing a 64-bit type (and the other way round) seems to “work”: only the 32 least-significant bits are read. But use the correct ones: %lx expects an unsigned long, plain %x an unsigned int, and %hx an unsigned short (an unsigned short argument is promoted to int when passed to printf, because it is passed as a variable argument and undergoes the default argument promotions). The length modifier for size_t is z, as in %zu:

printf("index-1 = %lx (sizeof %zu)\n", (unsigned long)(index-1), sizeof(index-1));

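(The conversion to unsigned long doesn’t change the value of an unsigned int, unsigned short, or unsigned char expression. Keep in mind, though, that with an unsigned char or unsigned short index, index-1 has type int and value -1, which the conversion turns into ULONG_MAX.)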

sizeof(index-1) could also have been written as sizeof(+index): the only thing that determines the size of the expression is the promotion of index, which unary + also performs (the usual arithmetic conversions against the int 1 never yield a type wider than the promoted index).

@Deduplicator: I hesitated writing that, it may lead to the misinterpretation, that e.g. `1u << 32` would be defined to evaluate to `0` for 32-bit integers, but it is UB. Do you have a suggestion for a clearer wording, without the need to bloat the answer with unrelated topics? — mafso, Aug 10 '14 at 13:23
Thanks your comments @mafso. What I am confused is that `p + (0u -1) == p + UINT_MAX` when index is unsigned int. BUT, `p + (0ul - 1) == p[-1]` when index is unsigned long. I get stuck at here. — rickhau, Aug 10 '14 at 13:46
Conversions are defined to be value-preserving. If that's not possible, for unsigned target types there's wrap-around, for signed target types you get implementation-defined value or signal. — Deduplicator, Aug 10 '14 at 13:50
@mafso: I really appreciate your help in clearing up part of my confusion; it is now clear to me what the value should be on 32-bit and on 64-bit. However, I made some changes based on your good information, and the result is interesting: an unsigned short index works on both 32-bit and 64-bit OS, but unsigned int does not... it is really weird. — rickhau, Aug 10 '14 at 14:40
@rickhau: Does this edited post answer your question? I also don't really see what parts of this post may be superfluous, but it seems unnecessarily long now. Any input on that would be appreciated! — mafso, Aug 13 '14 at 01:58
@mafso: Thanks for your great and detailed explanation. Sorry for the late feedback; I was not aware of the new edit. Your new post is clearer. Thanks for pointing out the wrap-around where I was stuck. This is really a great post! Thanks again, mafso! — rickhau, Aug 26 '14 at 15:39
Sorry, I was using my phone to vote on another question and didn't realize I had clicked unaccept on this one. Weird phone. — rickhau, Oct 30 '14 at 13:25
