In C on Linux, I don't understand the difference between char* and unsigned char* when reading/writing a binary buffer.

When must I not use char* and need to use unsigned char* instead?

David Ranieri
parser1234
  • the unsigned won't have negative numbers – Cid Jun 18 '20 at 05:21
  • Does this answer your question? [What is the difference between char and unsigned char?](https://stackoverflow.com/questions/22629728/what-is-the-difference-between-char-and-unsigned-char) – Alex Lop. Jun 18 '20 at 05:21
  • It's implementation defined if `char` is signed or unsigned. `unsigned char` is always unsigned. – Some programmer dude Jun 18 '20 at 05:21
  • @Some programmer dude I edited the post; I am talking about reading/writing a binary buffer with C – parser1234 Jun 18 '20 at 05:23
  • For example, if you need to compare the bytes and consider 0xff greater than 0x01, then you should use `unsigned char *`. But if you consider 0xff less than 0x01, you should use `signed char *`. – Joël Hecht Jun 18 '20 at 05:27

1 Answer

First recall C has unsigned char, signed char and char: 3 distinct types. char has the same range as either unsigned char or signed char.

[Edit]

OP added "When I reading/writing binary buffer", so the sections far below (my original post) deal with "what is the difference between char* and unsigned char*" using a sample case without that read/write concern. This section covers the binary I/O case.

Reading/writing binary data via <stdio.h> can be done with any I/O function, although it is more common to use fread()/fwrite().

For byte-oriented data, all I/O functions behave as the standard describes:

The byte input functions read characters from the stream as if by successive calls to the fgetc function. C17dr § 7.21.3 11
The byte output functions write characters to the stream as if by successive calls to the fputc function. § 7.21.3 12

So let us look at those two.

... the fgetc function obtains that character as an unsigned char ... § 7.21.7.1 2
The fputc function writes the character specified by c (converted to an unsigned char) § 7.21.7.3 2

Thus all I/O at the lowest level is best thought of as reading/writing unsigned char.
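
As a minimal sketch of that point, the hypothetical helper below (`roundtrip_byte` is an illustrative name, not part of any library) writes one byte to a temporary file with fputc() and reads it back with fgetc(). It assumes tmpfile() succeeds in the environment where it runs.

```c
#include <stdio.h>

/* Hypothetical helper: writes one byte to a temporary binary file,
   rewinds, and returns what fgetc() reports (or EOF on failure). */
int roundtrip_byte(unsigned char byte) {
    FILE *fp = tmpfile();              /* opened in "wb+" mode */
    if (fp == NULL) {
        return EOF;
    }
    int result = EOF;
    if (fputc(byte, fp) != EOF) {
        rewind(fp);                    /* reposition before switching to input */
        result = fgetc(fp);            /* byte value as unsigned char, widened to int */
    }
    fclose(fp);
    return result;
}
```

Calling `roundtrip_byte(0xFF)` yields 255, which stays distinct from the negative EOF. Storing that result in a signed `char` before testing against EOF would lose the distinction, which is one reason to keep the return value in an `int`.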

Now to directly address

When must I not use char* and need to use unsigned char* instead? (OP)

When writing, OP's code can use char*, unsigned char* or other pointer types, yet the underlying output function accesses the data via unsigned char *. This has no impact on the write itself, except that if char were encoded as ones' complement or sign-magnitude, a trap representation would not be detected.

Likewise with reading, the underlying input function saves data via unsigned char * and no traps occur. A single byte read via int fgetc() would report values in the unsigned char range even if char is signed.

The importance of using unsigned char* vs. char* when reading/writing a binary buffer comes not so much from the I/O call itself (it is all unsigned char * access), but from the setting up of data prior to writing and the interpretation of data after reading - see memcmp() below.
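
One way the interpretation step can go wrong is when bytes are combined into a wider integer. The sketch below uses two hypothetical helpers (`be16_unsigned` and `be16_signed` are illustrative names) to decode the same two-byte big-endian buffer. It uses `signed char` explicitly so the effect does not depend on whether plain `char` is signed, and it assumes a two's complement machine.

```c
#include <stdint.h>

/* Decode a 16-bit big-endian value through an unsigned char pointer. */
uint16_t be16_unsigned(const void *buf) {
    const unsigned char *p = buf;
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Same decoding through a signed char pointer: a byte such as 0xFF
   reads as -1, and its sign extension overwrites the high byte.
   Multiplication is used instead of << to avoid shifting a negative value. */
uint16_t be16_signed(const void *buf) {
    const signed char *p = buf;
    return (uint16_t)((p[0] * 256) | p[1]);
}
```

For the bytes 0x01 0xFF, `be16_unsigned` gives 0x01FF, while `be16_signed` gives 0xFFFF under the stated assumption.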



When must I not use char* and need to use unsigned char* instead?

A good example is with string related code.

Although functions in <string.h> use char* in function parameters, the implementations perform as if char was unsigned char, even when char is signed.

For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value). C17dr § 7.24.1 3

So even if char is signed, functions like int strcmp(const char *a, const char *b) perform as if they were int strcmp(const unsigned char *a, const unsigned char *b).

  1. This makes a difference when strings differ at a char c and a char d with values of different signs.
    E.g. Assume c < 0, d > 0

    // Accessed via char * and char is signed
    c < d is true
    // Accessed via unsigned char *
    c < d is false

This results in a different sign from the strcmp() return and so affects sorting strings.

// Incorrect code when `char` is signed.
int strcmp(const char *a, const char *b) {
  while (*a == *b && *a) { a++; b++; }
  return (*a > *b) - (*a < *b);
}

// Correct code when `char` is signed or unsigned, 2's complement or not
int strcmp(const char *a, const char *b) {
  const unsigned char *ua = (const unsigned char *) a;
  const unsigned char *ub = (const unsigned char *) b;
  while (*ua == *ub && *ua) { ua++; ub++; }
  return (*ua > *ub) - (*ua < *ub);
}

[Edit]

The same applies to binary data that is read and then compared with memcmp().
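
A small sketch of the memcmp() case, using two hypothetical helpers (`cmp_via_memcmp` and `cmp_as_signed` are illustrative names). Both return -1, 0 or 1 for a pair of single bytes; the second one reinterprets the bytes as `signed char`, which assumes two's complement for the values mentioned afterwards.

```c
#include <string.h>

/* Sign of memcmp() on two single bytes; memcmp() compares as unsigned char. */
int cmp_via_memcmp(unsigned char x, unsigned char y) {
    int r = memcmp(&x, &y, 1);
    return (r > 0) - (r < 0);
}

/* The same two bytes compared as signed char values. */
int cmp_as_signed(unsigned char x, unsigned char y) {
    signed char sx, sy;
    memcpy(&sx, &x, 1);   /* copy the object representation unchanged */
    memcpy(&sy, &y, 1);
    return (sx > sy) - (sx < sy);
}
```

For the bytes 0x80 and 0x7F, `cmp_via_memcmp` reports 1 (0x80 is greater), whereas `cmp_as_signed` reports -1 because 0x80 reads as -128. Hand-written comparisons of binary data through a signed char pointer would therefore order such buffers differently from memcmp().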

  2. In old C implementations that did not use 2's complement, there could be 2 zeros: +0 and -0 (or trap).

+0 ends a string when properly viewed as an unsigned char. -0 is not a null character and does not terminate a string, even though as a signed char it has a value of zero.

// Incorrect code when `char` is signed and not 2's complement.
// Conversion to `unsigned char` done too late.
int strcmp(const char *a, const char *b) {
  while ((unsigned char)*a == (unsigned char)*b && (unsigned char)*a) { a++; b++; }
  return ((unsigned char)*a > (unsigned char)*b) - ((unsigned char)*a < (unsigned char)*b);
}
chux - Reinstate Monica
  • Huh? Q: "I didn't understand the difference...", A: "1) This makes a difference when..." There seems to be almost an exact correlation. – David C. Rankin Jun 18 '20 at 07:02