
Is there an easy way to convert between char and unsigned char if you don't know the default setting of the machine your code is running on? (On most architectures, char is signed by default and thus has a range from -128 to +127. On some other architectures, such as ARM, char is unsigned by default and has a range from 0 to 255.) I am looking for a method to select the correct signedness, or to convert between the two transparently, preferably one that doesn't involve too many steps, since I would need to do this for every element in an array.

Using a pre-processor definition would allow this to be set at the start of my code. So would specifying an explicit form of char, such as signed char or unsigned char, since only plain char varies between platforms.

The reason is that there are library functions I would like to use (such as strtol) that take char as an argument but not unsigned char.

I am looking for some advice, or perhaps some pointers in the right direction, on a practical and efficient way to make the code portable, as I intend to run the code on a few machines with different default settings for char.

  • Please post a single question per question; I think in its current form it is impossible to answer this. – Benjamin Bannier Nov 21 '13 at 13:57
  • Welcome to StackOverflow! Please note that a question on StackOverflow must be answerable by concrete references, not opinions - for this reason, questions of the form "should I be doing X" or "what is a good resource for Y" are to be avoided. Also note that one question should cover one concrete problem. – user4815162342 Nov 21 '13 at 13:59
  • Sorry, I didn't want to be labelled as spam if I all of a sudden ask 6 questions. :( Also I'd ask in chat but I don't have the reputation for it. –  Nov 21 '13 at 14:01
  • You do realize both your loops do nothing, right? – SigTerm Nov 21 '13 at 14:01
  • @SigTerm Yes I realise that, I didn't put in the rest of my for loop. –  Nov 21 '13 at 14:03
  • @ReilaLee As for question 2: it's a "foreach" (or "range-based") loop (C++11). I think that compiler highly optimizes it, so there won't be much difference. You can do tests on your own though. – freakish Nov 21 '13 at 14:04
  • @BenjaminBannier I edited my question, please take off the "hold" so it can be answered. Also as I am new I could only ask 1 question every 20 minutes, which is kind of ridiculous as I would need to wait 2 hours to ask my 6 questions. :/ –  Nov 21 '13 at 14:08
  • This should help you http://stackoverflow.com/questions/5040920/converting-from-signed-char-to-unsigned-char-and-back-again – Benjamin Trent Nov 21 '13 at 14:29
  • There are 3 types of `char` (`char`, `unsigned char` and `signed char`) in C++, so this is actually quite a good portability question. @reila-lee start a chat with me and I will post an answer that covers all the questions you have as best I can:) – GMasucci Nov 21 '13 at 14:29
  • Some compilers allow you to specify the underlying type of `char`, whether it be `signed` or `unsigned`. Check your compiler's documentation. – Thomas Matthews Nov 21 '13 at 14:30
  • @GMasucci Thanks for the edit, I know all the concepts but didn't remember much of the terms (stuff like "architecture" and "pre-processor"). It definitely helped clarify it. Unfortunately I don't think I have access to chat yet. :/ –  Nov 21 '13 at 14:33
  • no problem, just use the comments section and I will do my best – GMasucci Nov 21 '13 at 14:34
  • @ThomasMatthews I'm not interested in specifying my own compiler's type, I would like to know if I could specify the char type of the compiler my code will be running on, from my code. Or if this would be a bad idea if it can invoke unexpected results from other library functions I'm using. –  Nov 21 '13 at 14:35
  • @reila-lee you can always use the exact type you want: for example, if you only want `signed char`, make all your code use that, and the compiler's definition of `char` becomes irrelevant as you are always using the type you want. It makes things more long-winded but more portable. Or you can do something like `#ifdef __arm__` `#undef char #define char signed char` (not sure if that works off the top of my head but it's not a million miles off), which redefines the type `char` to what you want – GMasucci Nov 21 '13 at 14:43
  • There is a list of predefined macros at http://sourceforge.net/p/predef/wiki/Architectures/ which may help; that way you can detect which architecture it's being run on and by that determine what `char` defaults to. – GMasucci Nov 21 '13 at 14:44
  • @ReilaLee you can't change the compiler that way, but you can create/use types which are what you want, you might have int8_t and uint8_t on your platform. Otherwise create something like that yourself. – wimh Nov 21 '13 at 14:47
  • @bwtrent I know I could always initiate an "if" statement to check my char array and convert it manually using modular arithmetic, but this might bring up some problems. Firstly if it happens that all values of the array are within the 0-127 range then I have no way of knowing how the machine stores it. Secondly this seems like a lot more work for what should be an easy conversion. I'm hoping for the best case scenario that I would be allowed to just define "char" as "unsigned char" with no consequences. –  Nov 21 '13 at 14:50
  • @GMasucci I was doing just that, I had initiated everything as "unsigned char" in my code but some library functions (like "strtol") force me to use just plain "char", so I was stuck converting back and forth. –  Nov 21 '13 at 14:53
  • On a two's complement machine, it's just a matter of interpretation. And thus, for 7-bit char sets, there's no difference at all. – Sam Nov 21 '13 at 14:53
  • @Wimmel Would it be sensible/feasible to create a type that stores as "unsigned char" but can also be passed as "char"? (Unless I'd be forced to allocate twice the memory or to convert it whenever I need it as "char", this might be useful.) –  Nov 21 '13 at 14:56
  • @Sam I'm going to be using the "unsigned char" in mathematical operations (mod 256) so I'd like to avoid having to figure out if there are going to be errors if the machine only reads up to 127. Anyway thanks a lot to everyone for all the responses, I've got to go home and get some sleep, I'll check back in the morning. –  Nov 21 '13 at 15:02
  • @Reila Lee : I understood. `static_assert(,"Two's complement platform required");` and then just do implicit or explicit (reinterpret) casts. After all, you'd just need to be cautious about function overloads or certain template specializations. http://stackoverflow.com/q/5040920/1175253 – Sam Nov 21 '13 at 15:05
  • @Sam I didn't understand what most of the stuff you said means (I've never used `static_assert` or implicit/explicit casts). I think I have a lot more to learn. –  Nov 22 '13 at 14:03
  • @Reila Lee : Didier Trosset explained it. That static assert is just to make sure that the code won't compile for platforms where reinterpretation of integers (chars) in memory leads to errors. Regarding the casts: it is just about whether you manually convert the pointer type among the three char types or whether it happens automatically, perhaps with a compiler warning. Some undesired things could happen if template classes or functions have specializations or overloads, respectively, which have different behavior for different char types. – Sam Nov 22 '13 at 14:29

2 Answers

2

I don't see any actual issue here.

It's not a matter of the architecture being signed or unsigned by default. It's rather a matter of the compiler, and the default can be switched between the two options as you wish (e.g. with GCC's -fsigned-char and -funsigned-char flags).

Also, there's no need to convert between the types. Both have the same representation in memory, on the same number of bits (usually 8). It's only a matter of how your program and the libraries it uses interpret those bits. If you're going to call strtol, then your data is a character array and you ought to use plain char.

If you ever use char to store not a character (A, b, f ...) but an actual value (-1, 0, 42 ...), then the range matters. In such cases, you have to use signed char or unsigned char. However, in such a case, there's little use for the library functions that want a char *.

For those libraries that do actually want a char * for an actual binary blob, there's no issue. Create your binary buffer with the type you prefer, signed, unsigned, or undecided, and pass it, possibly with a cast. It will run perfectly.

Didier Trosset
  • Exactly correct. The only issue that could occur is if you convert an `unsigned char *` to a `signed char *` or vice-versa, and have values less than 0/greater than 127, AND are on a system not using 2's complement representation. But you shouldn't ever run into this situation. If you are passing a string into a library function like `strtol`, you shouldn't have any characters outside of the ASCII range in the first place! – Taylor Brandstetter Nov 21 '13 at 15:13
  • I am using "unsigned char" to store byte-sized values and perform modular arithmetic with them. As of right now my program accepts user input of a type "char" array of a string of valid hex characters (e.g. AC24F5). I then use "strtol" to read the string 2 characters at a time as a hexadecimal representation of a byte and store the resulting value as type "unsigned char". One reason I would like to standardise the usage of "char" is because I could then directly input an "unsigned char" array into the program of half the size and bypass the process of reading and re-storing. –  Nov 22 '13 at 09:24
1

C++ has three char types; however, only plain char is allowed to vary between compilers/architectures, as the other two are explicitly signed and unsigned, while char is implicit and so is allowed to default to either.

To make your code portable, the most straightforward thing to do is to explicitly use either signed char or unsigned char as you require them. However, for readability you may prefer to redefine char as the type you need, or even make your own definition of a char type (for demonstration purposes I will use RLChar).

1st version - un-define char and redefine

#ifdef __arm__
#undef char
#define char signed char
#endif

2nd version - define your own custom char type to use in your code

#ifndef RLChar
#define RLChar signed char
#endif

(personally I would tend to do the second)

You can also create another macro to allow changes between the two:

#define CLAMP_VALUE_TO_255(v) ((v) > 255 ? 255 : ((v) < 0 ? 0 : (v)))

then you can use (clamping the wider value first, then narrowing - casting to unsigned char before clamping would wrap the value and defeat the clamp):

unsigned char clampedChar = (unsigned char)CLAMP_VALUE_TO_255(pixel);

or use casts such as these (the way to go if all the compilers you will use support them):

signed char myChar = -100;
unsigned char mySecondChar;
mySecondChar = static_cast<unsigned char>(myChar); // uses a static cast 
mySecondChar = reinterpret_cast<unsigned char&>(myChar); // uses a reinterpretation cast

so for your array scenario you could do

unsigned char* RLArray;
RLArray = reinterpret_cast<unsigned char*>(originalSignedCharArray); 

Let me know if you need more info as this is just what I can remember off the top of my head, especially if you need C equivalents or more details. :)

GMasucci
  • I've been defining type "byte" as "unsigned char". I would like to know if redefining "char" as "unsigned char" could cause me trouble later on, such as in library functions? I'm just using standard libraries like "iostream", "cmath", and "cstdlib". –  Nov 22 '13 at 09:28
  • none at all:) effectively `char` behaves as whichever of `signed char` or `unsigned char` you choose/the compiler makers choose - its range and representation match one of the two, though it remains a distinct type for overload resolution. That's why it can vary across platforms:) – GMasucci Nov 22 '13 at 09:42