In C on Linux, I don't understand the difference between char* and unsigned char* when reading/writing a binary buffer.
When must I not use char* and need to use unsigned char*?
First, recall that C has unsigned char, signed char and char: 3 distinct types. char has the same range as either unsigned char or signed char, depending on the implementation.
[Edit]
The OP added "when reading/writing a binary buffer", so the sections far below (my original post) deal with "what is the difference between char* and unsigned char*" using a sample case without that read/write concern. This section addresses the read/write concern.
Reading/writing binary data via <stdio.h> can be done with any I/O function, although it is more common to use fread()/fwrite().
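As a minimal sketch of such a binary round trip, the helper below writes a buffer with fwrite() and reads it back with fread(). The name roundtrip_ok and the 256-byte limit are made up for this illustration; tmpfile() is used only so the example needs no file name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write n bytes to a temporary binary file, read them
 * back, and report whether every byte survived unchanged.
 * Returns 1 on success, 0 on any failure. Limited to 256 bytes here. */
int roundtrip_ok(const unsigned char *src, size_t n) {
    unsigned char back[256];
    if (n > sizeof back) return 0;

    FILE *fp = tmpfile();               /* opened as if by "wb+" */
    if (fp == NULL) return 0;

    int ok = fwrite(src, 1, n, fp) == n;
    rewind(fp);                         /* reposition before switching to reading */
    ok = ok && fread(back, 1, n, fp) == n;
    ok = ok && memcmp(src, back, n) == 0;

    fclose(fp);
    return ok;
}
```

Calling it with bytes such as 0x00, 0x80 and 0xFF should succeed: none of them is special to fread()/fwrite(), whatever the signedness of char.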
For byte-oriented data, all I/O functions behave as if:
The byte input functions read characters from the stream as if by successive calls to the fgetc function. C17dr § 7.21.3 11
The byte output functions write characters to the stream as if by successive calls to the fputc function. § 7.21.3 12
So let us look at those two.
... the fgetc function obtains that character as an unsigned char ... § 7.21.7.1 2
The fputc function writes the character specified by c (converted to an unsigned char) § 7.21.7.3 2
Thus all I/O at the lowest level is best thought of as reading/writing unsigned char.
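A small sketch of that behavior, assuming a temporary file is available; put_then_get is a hypothetical name used only here. It writes one byte with fputc() and reports what fgetc() hands back.

```c
#include <stdio.h>

/* Hypothetical helper: write one byte with fputc(), then read it back with
 * fgetc(). Returns the int reported by fgetc(), or EOF on an I/O failure. */
int put_then_get(int value) {
    FILE *fp = tmpfile();
    if (fp == NULL) return EOF;

    int result = EOF;
    if (fputc(value, fp) != EOF) {      /* value is converted to unsigned char */
        rewind(fp);
        result = fgetc(fp);             /* returned as unsigned char, widened to int */
    }
    fclose(fp);
    return result;
}
```

With 8-bit bytes, both put_then_get(0xFF) and put_then_get(-1) come back as 255, which is distinct from EOF. That is why the result of fgetc() should be kept in an int rather than a char.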
Now to directly address:
"When must I not use char* and need to use unsigned char*?" (OP)
With writing, pointers such as char*, unsigned char* or others can be used in the OP's code, yet the underlying output function accesses the data via unsigned char *. This has no impact on the execution of the OP's write, other than that if char were encoded as ones' complement or sign-magnitude, a trap representation would not be detected.
Likewise with reading, the underlying input function saves data via unsigned char * and no traps occur. A single byte read via int fgetc() reports a value in the unsigned char range even if char is signed.
The importance of using unsigned char* vs. char* when reading/writing a binary buffer comes not so much from the I/O call itself (it is all unsigned char * access), but from setting up the data prior to writing and interpreting the data after reading; see memcmp() below.
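To illustrate the "interpretation after reading" point, here is a sketch that decodes a 32-bit big-endian value from a buffer. The function names load_be32 and load_be32_naive are invented for this example and are not from any library.

```c
#include <stdint.h>

/* Decode a 32-bit big-endian value from 4 raw bytes read as unsigned char.
 * Every byte is in the range 0..255, so the shifts combine cleanly. */
uint32_t load_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* The same decode through plain char. When char is signed, a byte such as
 * 0x80 is read as a negative value, and converting it to uint32_t sets all
 * the upper bits, corrupting the result. */
uint32_t load_be32_naive(const char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

For the bytes 00 00 01 80, load_be32() gives 0x00000180. On a typical platform where char is signed and 8 bits wide, load_be32_naive() gives 0xFFFFFF80 instead; where char is unsigned, the two agree.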
"When must I not use char* and need to use unsigned char*?"
A good example is string-related code.
Although functions in <string.h> use char* in their parameters, the implementations perform as if char were unsigned char, even when char is signed.
For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value). C17dr § 7.24.1 3
So even if char is signed, functions like int strcmp(const char *a, const char *b) perform as if declared int strcmp(const unsigned char *a, const unsigned char *b).
This makes a difference when the strings differ at a char c and a char d with values of different signs.
E.g. Assume c < 0, d > 0
// Accessed via char * and char is signed
c < d is true
// Accessed via unsigned char *
c < d is false
This results in a different sign from the strcmp() return and so affects sorting strings.
// Incorrect code when `char` is signed.
int strcmp(const char *a, const char *b) {
while (*a == *b && *a) { a++; b++; }
return (*a > *b) - (*a < *b);
}
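// Correct code when `char` is signed or unsigned, 2's complement or not
int strcmp(const char *a, const char *b) {
  // Access the bytes through unsigned char pointers.
  const unsigned char *ua = (const unsigned char *) a;
  const unsigned char *ub = (const unsigned char *) b;
  while (*ua == *ub && *ua) { ua++; ub++; }
  return (*ua > *ub) - (*ua < *ub);
}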
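A quick way to see the effect is to compare two one-byte strings whose bytes straddle 0x7F/0x80. This is only a sketch; the helpers strcmp_sign and first_char_sign are hypothetical names for this illustration.

```c
#include <string.h>

/* Reduce the library strcmp() result to -1, 0 or 1. */
int strcmp_sign(const char *a, const char *b) {
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}

/* Compare only the first bytes, read as plain char. The result depends on
 * whether char is signed on this implementation. */
int first_char_sign(const char *a, const char *b) {
    return (a[0] > b[0]) - (a[0] < b[0]);
}
```

strcmp_sign("\x80", "\x7f") is 1 everywhere, since the library compares as unsigned char. first_char_sign("\x80", "\x7f") is -1 on a typical signed-char platform, so a sort built on it would order those strings the opposite way.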
[Edit]
The same applies to binary data read and compared with memcmp().
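As a sketch of what that means for binary records, the hand-written loop below reads each byte as unsigned char and should agree in sign with memcmp(). The names record_sign and record_sign_manual are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Reduce the library memcmp() result to -1, 0 or 1. */
int record_sign(const void *a, const void *b, size_t n) {
    int r = memcmp(a, b, n);
    return (r > 0) - (r < 0);
}

/* Hand-written equivalent: the first differing byte decides the order,
 * with each byte read as unsigned char. */
int record_sign_manual(const void *a, const void *b, size_t n) {
    const unsigned char *ua = a;
    const unsigned char *ub = b;
    for (size_t i = 0; i < n; i++) {
        if (ua[i] != ub[i]) {
            return (ua[i] > ub[i]) - (ua[i] < ub[i]);
        }
    }
    return 0;
}
```

For the records {0x01, 0xFF} and {0x01, 0x02}, both functions report the first as greater. A loop that read the bytes through a signed char* would see 0xFF as negative and report the opposite.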
With a signed char that is not 2's complement, +0 ends a string when properly viewed as an unsigned char. -0 is not a null character that terminates a string, even though as a signed char it has a value of zero.
// Incorrect code when `char` is signed and not 2's complement.
// Conversion to `unsigned char` done too late.
int strcmp(const char *a, const char *b) {
while ((unsigned char)*a == (unsigned char)*b && (unsigned char)*a) { a++; b++; }
return ((unsigned char)*a > (unsigned char)*b) - ((unsigned char)*a < (unsigned char)*b);
}