I am initializing a symlink in an ext2 inode (school assignment).

I got the idea to do it in hex since the field is defined as uint32_t i_block[EXT2_N_BLOCKS].

As an example:

#include <stdio.h>

int main () {
  // unsigned int is 32 bits on my system
  unsigned int i = 0x68656c6c; // hell
  printf("%.*s\n", 4, &i);
}

I got the output

lleh

Is this because my system is little-endian? Does that mean if I hardcode the opposite order, it would not port to big-endian systems (my eventual goal is hello-world)?

What is the best, most simple way to store a character string into an array of unsigned integers?

user129393192
  • Have you tried checking an existing symbolic link made the "proper" way to see how it store data? Perhaps it's as simple as just treat the array as an array of bytes and use `memcpy` to copy the string into it? – Some programmer dude May 22 '23 at 06:21
  • Mount the filesystem. Use the `ln -s` command to create a symbolic link. Read the inode and look at its data. How are the characters stored in that? – Some programmer dude May 22 '23 at 06:29
  • It is LSB -> MSB. I believe the answer to my question is that the hex representation wouldn’t port, since it represents a value and not the underlying bytes being set. – user129393192 May 22 '23 at 06:46

2 Answers

Is this because my system is little-endian?

Yes.

Does that mean if I hardcode the opposite order, it would not port to big-endian systems

Code relying on the byte order of integers is non-portable indeed.

What is the best, most simple way to store a character string into an array of unsigned integers?

The best way is not to use integers at all but char, which, unlike integers, does not depend on endianness and was actually designed for storing characters.

You could ignore that it is an integer type and just memcpy a string into it:

unsigned int i;
memcpy(&i, "hell", 4);

Or if you prefer: memcpy(&i, "\x68\x65\x6c\x6c", 4);.

Otherwise you'll have to invent some ugly hack like for example:

#define LITTLE_ENDIAN  (*(unsigned char*) &(int){0xAA} == 0xAA)
unsigned int i = LITTLE_ENDIAN ? 0x6c6c6568 : 0x68656c6c;
Lundin
  • Thanks. I ended up with `memcpy(&i, (const unsigned char[4]) {0x68, 0x65, 0x6c, 0x6c }, 4);`. As a follow up, would something like `memcpy(&i, "rld", 3);` also port, assuming `i` was `0` initialized? That is, would `memcpy` only copy into the first 3 bytes, starting from LSB and going to MSB? – user129393192 May 22 '23 at 07:05
  • @user129393192 Yes, except that what counts as the LS byte in the integer depends on endianness. – Lundin May 22 '23 at 07:32
  • @user129393192 "would memcpy only copy into the first 3 bytes, starting from LSB and going to MSB?" --> No. It would copy to the byte at &i and then to the next 2 bytes. The LSB, MSB of `i` is irrelevant. – chux - Reinstate Monica May 22 '23 at 15:58
  • Got it, so it will copy byte by byte from the lowest address, and the representation will be the same on big- and little-endian systems, but the interpretation different? – user129393192 May 22 '23 at 16:18
  • @user129393192 With `(char *) &i`, the interpretation as a _string_ is the same, regardless of `unsigned` endian. `i`, as an `unsigned` differs. Details depend on what your `it` is. Tip: Avoid pronouns. – chux - Reinstate Monica May 22 '23 at 16:22

Strictly speaking, printf("%.*s\n", 4, &i); is undefined behavior (UB), as "%.*s" expects a pointer to a character type and &i is a pointer to an unsigned int.

A better alternative uses a union.

union {
  unsigned u;
  unsigned char uc[sizeof (unsigned)];
} x = { .u = 0x68656c6c};

printf("%.*s\n", (int) sizeof x.uc, x.uc);

Even better, use uint32_t instead of unsigned.


What is the best, most simple way to store a character string into an array of unsigned integers?

Avoid all endian concerns via a union and initialize via the .uc member.

#include <stdio.h>
#define N 42

int main(void) {
  union {
    unsigned u[N];
    unsigned char uc[sizeof (unsigned[N])];
  } x = { .uc = "Hello"};
  printf("<%.*s>\n", (int) sizeof x.uc, x.uc);
}

Output

<Hello>

Note that if the initializer is long enough to fill the whole array, .uc[] is not a string, as it then lacks a terminating null character.

chux - Reinstate Monica
  • Would a simple cast `(unsigned char*) &i` be sufficient for the `printf` function? – user129393192 May 22 '23 at 16:19
  • @user129393192 Yes, for the first code. For the last question, "store a character string into an array of unsigned integers", a `union` is better. – chux - Reinstate Monica May 22 '23 at 16:25
  • @user129393192 Why do you want to initialize with `0x68656c6c` instead of `"hell"`? Certainly the 2nd is more clear. – chux - Reinstate Monica May 22 '23 at 16:27
  • I ended up doing it with `hell` as Lundin suggested – user129393192 May 22 '23 at 16:36
  • @user129393192 Note the goal involved "I am initializing a symlink ...". In C, `memcpy(&i, "hell", 4);` is an _assignment_ (something after object definition). `x = { .uc = "Hello"}` is initializing (value defined at object definition). Your call. – chux - Reinstate Monica May 22 '23 at 16:42
  • What exact difference would it make? I often hear this difference of "assignment" and "initialization". – user129393192 May 22 '23 at 18:17
  • Note that if you use a union, the `.u = 0x68656c6c` constant must still be written according to endianness or the string representation will be backwards. So the union doesn't really solve the problem, since assigning to the `u` member will store that constant according to endianness. The actual problem here lies with the initializer. – Lundin May 23 '23 at 06:43
  • @Lundin Re-worded to emphasize initializing via the `.uc` member. – chux - Reinstate Monica May 23 '23 at 10:17