
I'm working with some embedded code, and since I'm writing something new from scratch I'd prefer to stick with the fixed-width types such as uint8_t, int8_t and so on.

However, when porting a function:

void functionName(char *data)

to:

void functionName(int8_t *data)

I get the compiler warning "converts between pointers to integer types with different sign" when passing a string literal to the function (i.e. when calling functionName("put this text in");).

Now, I understand why this happens, and these lines are only debug output, but I wonder what people feel is the most appropriate way of handling this, short of typecasting every string literal. I don't feel that blanket typecasting is any safer in practice than using potentially ambiguous types like char.
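
For reference, here is a minimal sketch of the situation; the stub body and the printf call are only illustrative:

#include <stdint.h>
#include <stdio.h>

/* Ported prototype; the original took char *data. */
void functionName(int8_t *data)
{
    printf("%s\n", (char *)data); /* cast back for the char-based library call */
}

int main(void)
{
    functionName("put this text in");           /* warning: pointer signedness differs */
    functionName((int8_t *)"put this text in"); /* the cast silences the warning */
    return 0;
}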

AndrewN
  • If you want to stop using `char` then why are you writing string literals? I think it is pretty much pointless to attempt to pretend that `char` does not exist. You need to face up to it. – David Heffernan Jun 03 '14 at 10:41
  • What is the intended use of the data? If it is text, use `char`; if numeric data, use `u/int8_t`. – user694733 Jun 03 '14 at 10:46
  • Thanks for the comments. The string literals are debug output like "Reached such and such a point". Those definitions do differ; here they are typedef'd (correctly, in my opinion) as typedef signed char int8_t; and typedef unsigned char uint8_t;. Incidentally, using `signed char` instead of just `char` also produces the same compiler warning. – AndrewN Jun 03 '14 at 10:50
  • @user694733 If character data, use `char`; if numeric data, use `signed char`; and if raw memory (or bit masks, or such), use `unsigned char`. Only very rarely, if ever, would `int8_t` or `uint8_t` be appropriate. (For starters, not all systems support them.) – James Kanze Jun 03 '14 at 10:59
  • @JamesKanze It depends. I might use `u/int_least8_t` or `u/int_fast8_t` instead. – user694733 Jun 03 '14 at 11:03
  • @user694733 Why? What do they buy you over `signed char` and `unsigned char` (except added verbosity and confusion). For numeric values, the default type is `int`. Anytime you use anything else, there should be a good reason. I can think of an obvious good reason for `int_fast64_t`; the value might be large enough that it won't fit in an `int`. But that can't be the case for `int_fast8_t`. – James Kanze Jun 03 '14 at 11:23
  • @JamesKanze Consistency with the other `int_*_t` I use. And `int_fast8_t` might use `short` instead of `int` if the former has a smaller size but the same access time. I don't think it is any more verbose. – user694733 Jun 03 '14 at 11:38
  • @JamesKanze - Your comments are valid for software running on operating systems where each system has some x86-compatible processor. However, consider the portability of code between embedded processors (think of us all moving from 8-bit PICs to 32-bit ARMs over the last five years). In this regime you have defined exactly how many bits you require at every declaration, and you can maintain that by simply using an appropriate stdint.h file. – AndrewN Jun 03 '14 at 12:23

3 Answers


You seem to be doing the wrong thing here.

Characters are not defined by C as being 8-bit integers, so why would you ever choose to use int8_t or uint8_t to represent character data, unless you are working with UTF-8?

A C string literal has the type "array of char" (which decays to a pointer to char), and char is not guaranteed to be 8 bits at all.

It is also implementation-defined whether char is signed or unsigned, so just use const char * for string literals.
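
A sketch of what that might look like for the function in the question; uart_send is a made-up placeholder for whatever byte-oriented output routine the target actually provides:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical low-level sender that deals in raw unsigned bytes. */
static void uart_send(const uint8_t *bytes, size_t len)
{
    (void)bytes;
    (void)len;
    /* push the bytes out over the UART here */
}

/* The text keeps the natural type for string literals... */
void functionName(const char *text)
{
    /* ...and the char-to-byte conversion happens in exactly one place. */
    uart_send((const uint8_t *)text, strlen(text));
}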

unwind
  • I have seen `char` being used for UTF-8 as well. It works with regular C literals as long as the source file is encoded as UTF-8. – user694733 Jun 03 '14 at 11:11
  • On embedded systems where you have to invent the token table yourself (for example when writing a program that types text on a graphical LCD) it often makes sense to use `uint8_t` for strings. This is because `uint8_t` is a sane and well-defined type, which is 100% portable. `char`, however, is a rather insane type: it can be of any size and of any signedness, and is therefore completely non-portable. Where deterministic program behavior is needed, you should use stdint.h, be it for string handling or for integers. – Lundin Jun 03 '14 at 11:23
  • @Lundin Then it (to me) still would make more sense to use `char`, and then investigate how much control the compiler gives you over the target encoding. You're going to depend on the encoding anyway, so it makes sense to make it explicit and controlled. – unwind Jun 03 '14 at 11:24
  • @unwind - Your post is quite right, and it is the reason why this will never work. But Lundin's comments about embedded systems are the reason why it's a pain for us! I was hoping for an elegant solution but there doesn't seem to be one. – AndrewN Jun 03 '14 at 12:38
  • If the intent is to represent a character, then `char` should be used (Unicode and wide characters aside); if the intent is to represent a "*small integer*" that you might perform arithmetic on, for example, or to map to a hardware register width, then a stdint.h type makes sense. The TMS320C55xx, for example, has no 8-bit addressable memory and a char is necessarily 16 bits. There are occasions when this will bite. Most compilers allow you to specify whether char is signed or unsigned with a command line switch - perhaps that is the solution here. – Clifford Jun 03 '14 at 16:26

To answer your addendum (the original question was nicely answered by @unwind): I think it mostly depends on the context. If you are working with text, i.e. string literals, you have to use const char* or char*, because the compiler will convert the characters accordingly. Short of writing your own string implementation, you are probably stuck with whatever the compiler provides to you.

However, the moment you have to interact with someone/something outside of your CPU context, e.g. network, serial, etc., you have to have control over the exact size (which I suppose is where your question stems from). In this case I would suggest writing functions to convert strings, or any data type for that matter, to uint8_t buffers for serialized sending (or receiving).

const char* my_string = "foo bar!";
uint8_t *buffer = string2sendbuffer(my_string);
my_send(buffer, destination);

The string2sendbuffer function would know everything there is to know about putting characters into a buffer. For example, it might know that you have to encode each char into two buffer elements using big-endian byte ordering. This function is most certainly platform-dependent, but it encapsulates all of that platform dependence, so you gain a lot of flexibility. The same goes for every other complex data type. For everything else (where the compiler does not have that strong an opinion) I would advise using the (u)intX_t types provided by stdint.h (which should be portable).
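
A minimal sketch of what string2sendbuffer might look like, assuming a trivial one-byte-per-char wire format and dynamic allocation (which may or may not be acceptable on your target); the real encoding logic would replace the memcpy:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: copies one char per byte. A real implementation would apply
   whatever encoding the wire format needs (e.g. two big-endian buffer
   elements per character). The caller owns the returned buffer. */
uint8_t *string2sendbuffer(const char *text)
{
    size_t len = strlen(text) + 1;   /* include the terminating NUL */
    uint8_t *buffer = malloc(len);
    if (buffer != NULL)
        memcpy(buffer, text, len);
    return buffer;
}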

merlin
  • Yep, you're right. And after reading the posts here and seeing that I am not doing anything wrong as such, I will settle for typecasting to remove the compiler warnings (hey, at least it's only in the debug output on these systems). I think it's quite right to leave the two types as independent. Thanks for all your comments. – AndrewN Jun 03 '14 at 12:20
  • @AndrewN Your compiler probably has a command line switch to determine whether char is signed or unsigned - that may be a cleaner solution. – Clifford Jun 03 '14 at 16:28

It is implementation-defined whether the type char is signed or unsigned. It looks like you are using an environment where it is unsigned.

So you can either use uint8_t or stick with char whenever you are dealing with characters.
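
For what it's worth, here is a quick standalone sketch to check which choice your toolchain makes (CHAR_MIN comes from limits.h):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_MIN is 0 when plain char is unsigned, and negative when it is signed. */
    printf("char is %s on this target\n", (CHAR_MIN == 0) ? "unsigned" : "signed");
    return 0;
}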

Lindydancer
  • Agreed, but the compiler doesn't like this. On an embedded system a "char" doesn't really exist; oddly, it is signed by default here (I would have expected the other way). The UART that this all gets spurted out of is indifferent about chars - it just puts out 8 bits as instructed. So it isn't possible to split the two scenarios, but the type checking is important for the rest of the system. – AndrewN Jun 03 '14 at 11:01