
I have code that translates an ASCII char array to a hex char array:

void ASCIIFormatCharArray2HexFormatCharArray(char chrASCII[72], char chrHex[144])
{
  int i,j;
  memset(chrHex, 0, 144); 
  for(i=0, j=0; i<strlen(chrASCII); i++, j+=2)
  {
      sprintf((char*)chrHex + j, "%02X", chrASCII[i]);
  }
  chrHex[j] = '\0';
}

When I pass the char 'א' (Alef, the Hebrew equivalent of 'A' in English) to the function, it does this:

chrHex = "FFFF"

I don't understand how 1 char translates to 2 bytes of hex ("FFFF") instead of 1 byte (the way "u" in ASCII is "75" in hex), when it's not even an English letter. I would love an explanation of how the compiler treats 'א' this way.

  • I don't know what exactly is happening here, but Aleph is U+05D0 and I don't expect there to be any encoding where it's correctly represented as FF FF. So my guess is that the input to that method already doesn't represent Aleph in a meaningful way, but the problem is at some earlier point. Your usage of `char` implies that you are using either a fixed-width 8-bit encoding (such as one of the ISO-8859-* family) or UTF-8. In UTF-8, Aleph would be represented by the bytes `D7 90`. In [ISO-8859-8](https://de.wikipedia.org/wiki/ISO_8859-8) it is `E0`. – Joachim Sauer Oct 04 '21 at 12:34
  • There is a similar discussion [here](https://stackoverflow.com/a/32888349/645128). – ryyker Oct 04 '21 at 12:38
  • One char is 1 byte long, so by default you will have a byte representation. Try %X rather than %02X, which gives one byte, and I would add a space between every byte to represent it better. – bgorkhe Oct 04 '21 at 12:41
  • This would have been easy to debug. Viewing the `chrHex` array in each iteration of the loop, with either a debugger or a simple `printf("chrHex = %s.\n", chrHex);` inside the loop, would have revealed what is going on. Submitters should fully examine the execution of code like this before posting a question on Stack Overflow. – Eric Postpischil Oct 04 '21 at 13:15
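As the comments above suggest, the first thing to check is which bytes the compiler actually stored for the literal. A minimal sketch of such a check (the expected output D7 90 assumes a UTF-8 source and execution character set; other encodings give other bytes):

#include <stdio.h>

int main(void)
{
    const char *s = "א";
    /* print each byte of the literal as unsigned, so there is no sign extension */
    for (const char *p = s; *p != '\0'; p++)
        printf("%02X ", (unsigned char)*p);
    printf("\n");   /* with UTF-8 this prints: D7 90 */
    return 0;
}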

1 Answer


When “א” appears in a string literal, your compiler likely represents it with the bytes D7₁₆ and 90₁₆, although other possibilities are allowed by the C standard.

When these bytes are interpreted as a signed char, they have the values −41 and −112. When these are passed as an argument to sprintf, they are automatically promoted to int. In a 32-bit two’s complement int, the bits used to represent −41 and −112 are FFFFFFD7₁₆ and FFFFFF90₁₆.
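A small sketch of that promotion (assuming a 32-bit two’s complement int and a plain char that is signed, as on most mainstream platforms):

#include <stdio.h>

int main(void)
{
    signed char c = (signed char)0xD7;   /* the byte D7 read as signed char: -41 */
    int promoted = c;                    /* what sprintf actually receives */
    printf("%d\n", promoted);            /* prints -41 */
    printf("%X\n", (unsigned)promoted);  /* prints FFFFFFD7: the bit pattern of -41 */
    return 0;
}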

The behavior of asking sprintf to format these with %02X is technically not defined by the C standard, because an unsigned int should be passed for X, rather than an int. However, your C implementation likely formats them as “FFFFFFD7” and “FFFFFF90”.

So the first sprintf puts “FFFFFFD7” in chrHex starting at element 0.

Then the second sprintf puts “FFFFFF90” in chrHex starting at element 2, partially overwriting the first string. Now chrHex contains “FFFFFFFF90”.

Then chrHex[j] = '\0'; puts a null character at element 4, truncating the string to “FFFF”.
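You can watch this happen by printing the buffer after each sprintf, as one of the comments recommends. A sketch that feeds the loop the two UTF-8 bytes directly (it reproduces the same technically undefined %02X formatting the question relies on):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char chrHex[144];
    const char input[] = "\xD7\x90";     /* the two bytes of UTF-8 "א" */
    int i, j;

    memset(chrHex, 0, sizeof chrHex);
    for (i = 0, j = 0; i < (int)strlen(input); i++, j += 2)
    {
        sprintf(chrHex + j, "%02X", input[i]);
        printf("after i=%d: chrHex = %s\n", i, chrHex);  /* FFFFFFD7, then FFFFFFFF90 */
    }
    chrHex[j] = '\0';
    printf("final: chrHex = %s\n", chrHex);              /* FFFF */
    return 0;
}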

To fix this, change the sprintf to expect an unsigned char and pass an unsigned char value (which will be promoted to int, but sprintf expects that for hhX and works with it):

sprintf(chrHex + j, "%02hhX", (unsigned char) chrASCII[i]);
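Put together, a corrected version of the whole function looks like this (a sketch with a small test driver added; the prototype and buffer sizes are kept as in the question):

#include <stdio.h>
#include <string.h>

void ASCIIFormatCharArray2HexFormatCharArray(char chrASCII[72], char chrHex[144])
{
    size_t i, j;
    memset(chrHex, 0, 144);
    for (i = 0, j = 0; i < strlen(chrASCII); i++, j += 2)
    {
        /* %hhX expects an unsigned char (promoted to int), so no sign extension */
        sprintf(chrHex + j, "%02hhX", (unsigned char)chrASCII[i]);
    }
    chrHex[j] = '\0';
}

int main(void)
{
    char in[72] = "א";    /* UTF-8: the bytes D7 90 */
    char hex[144];
    ASCIIFormatCharArray2HexFormatCharArray(in, hex);
    printf("%s\n", hex);  /* prints D790 */
    return 0;
}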
Eric Postpischil