
I naively added an int to a wchar_t resulting in a Visual Studio 2013 warning.

L'A' + 1 // next letter

warning C4244: 'argument' : conversion from 'int' to 'wchar_t', possible loss of data

So the warning is concerned that a 4-byte int is being implicitly converted to a 2-byte wchar_t. Fair enough.

What is the safe, standard-conforming C++11 way of doing this? I'm wondering about the cross-platform implications, code-point correctness and readability of doing things like: L'A' + (wchar_t)1 or L'A' + \U1 or whatever. What are my coding options?

Edit T+2: I presented this question to a hacker's group. Unsurprisingly, no one got it correct. Everyone agreed this is a great interview question when hiring C/C++ Unicode programmers because it's very terse and deserves a meaty conversation.

BSalita
  • ... by any chance are you then storing the result in a `wchar_t`? Please post http://sscce.org as `int main() { L'A'+1; }` won't reproduce your warning. – Yakk - Adam Nevraumont May 21 '14 at 17:24
  • No chance at all. Temp expression. – BSalita May 21 '14 at 17:25
  • `static_cast<wchar_t>(L'A' + 1)` – Igor Tandetnik May 21 '14 at 17:39
  • @Igor but the OP says they are not converting the result of the expression to a `wchar_t`. If so, why would converting it to a `wchar_t` explicitly help? – Yakk - Adam Nevraumont May 21 '14 at 17:40
  • @Yakk: I'm pretty sure the OP is then passing the result of the expression to a function declared to take `wchar_t`, as in `f(L'A' + 1)`. Hence 'argument' in the warning message. – Igor Tandetnik May 21 '14 at 17:42
  • @IgorTandetnik Hey, I asked him right out if he's storing it as a `wchar_t`. Maybe function arguments aren't "storage" as far as the OP is concerned? – Yakk - Adam Nevraumont May 21 '14 at 18:17
  • @igor is correct. The temp expression is used as an argument to a func similar to his example. Please note, this is not a question of Why -- i stated why. The question is of how to-do or correct practice. I also show two solutions but there may be better solutions. – BSalita May 21 '14 at 18:52
  • Have you tried `L'A' + wchar_t(1)`? – Biffen May 21 '14 at 19:44
  • @Biffen, while L'A' + wchar_t(1) works, it would not work for L'A' + wchar_t(-1) where wchar_t is unsigned and sizeof(int) > sizeof(wchar_t). I didn't know wchar_t(1) was valid syntax -- learned something today, thanks. – BSalita May 21 '14 at 22:41
  • @BSalita It's an oft-forgotten syntax, sort of a constructor. How about `wchar_t('A' - 1)`? – Biffen May 22 '14 at 06:32

2 Answers


When you add two integral values whose types both fit within an int, they are promoted and added as ints.

If one of them needs an unsigned int to be represented, they are instead added as unsigned ints.

If those are not big enough, bigger types can be used. It gets complicated, and it changes by standard revision if I remember rightly (there were some rough spots).

Now, addition with ints is undefined behavior if it overflows. Addition with unsigned ints is guaranteed to wrap modulo some power of two.

When you convert an int or an unsigned int to a signed type, the result is implementation-defined (in C++11) if the value doesn't fit. If it does fit, the value is preserved.

If you convert an int or unsigned int to an unsigned type, the result is the representable value congruent to the source modulo some power of two (fixed for the given unsigned type).
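As a small illustration of that modular rule, here is a sketch that uses `std::uint16_t` as the destination type. The helper name `to_u16` is made up for this example, and it assumes the platform provides `std::uint16_t`.

```cpp
#include <cstdint>

// Hypothetical helper: converting to an unsigned type keeps the value
// modulo 2^N, where N is the bit width of the destination (16 here).
inline std::uint16_t to_u16(long long value) {
    return static_cast<std::uint16_t>(value);
}
```

With this, `to_u16(65541)` gives 5 and `to_u16(-1)` gives 65535, on any conforming implementation.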

Many popular C++ compilers and hardware return the same bit pattern for int as they would for unsigned int interpreted by 2s complement logic, but that is not required by the standard.

So L'A' + 1 involves converting L'A' to an int, adding 1 as an int.
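The promotion can be checked at compile time. This is only a sketch: the function name `sum_is_at_least_int` is invented here, and the exact promoted type (int, unsigned int, or something wider) depends on how the platform defines wchar_t.

```cpp
#include <type_traits>

// The sum is computed in the promoted type, so its type is never wchar_t.
static_assert(!std::is_same<decltype(L'A' + 1), wchar_t>::value,
              "L'A' + 1 does not have type wchar_t");

// Hypothetical helper: the promoted type is at least as wide as int.
constexpr bool sum_is_at_least_int() {
    return sizeof(L'A' + 1) >= sizeof(int);
}
```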

If we add the missing bit:

wchar_t bob = L'A' + 1;

we can see where the warning occurs. The compiler sees someone converting an int to a wchar_t and warns them. (This makes more sense when the values in question are not compile-time constants.)

If we make it explicit:

wchar_t bob = static_cast<wchar_t>(L'A' + 1);

the warning (probably? hopefully?) goes away. So long as the right hand side results in being in the range of valid wchar_t values, you are golden.
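For the function-argument case from the comments, a minimal sketch might look like this. `echo` is a hypothetical stand-in for the function the question calls, and the expected letter assumes an ASCII-compatible wide character set.

```cpp
// Hypothetical function taking a wchar_t, standing in for the real callee.
inline wchar_t echo(wchar_t c) { return c; }

inline wchar_t letter_after_A() {
    // The cast spells out the narrowing from the promoted type back to
    // wchar_t; the value is preserved as long as it fits in wchar_t.
    return echo(static_cast<wchar_t>(L'A' + 1));
}
```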

If instead you are doing:

wchar_t bob = static_cast<wchar_t>(L'A' + x);

where x is an int, if wchar_t is signed you could be in trouble (implementation-defined result if x is large enough!), and if it is unsigned you could still be somewhat surprised.

A nice thing about this static_cast method is that unlike (wchar_t)x or wchar_t(x) casts, it won't compile if you accidentally feed pointers into the cast.

Note that casting x or 1 is relatively pointless, unless it quiets the compiler, because the operands are always promoted (logically) to ints before + operates (or to unsigned ints if wchar_t is unsigned and the same size as an int). With int significantly larger than wchar_t, this is relatively harmless if wchar_t is unsigned, because the back-conversion is guaranteed to do the same thing as adding in wchar_t modulo its power of two; if wchar_t is signed, leaving its range gives an implementation-defined result anyhow.

So, cast the result using static_cast. If that doesn't work, use a bitmask to explicitly clear the bits you don't care about.
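One way the bitmask fallback could look, assuming that only the low 16 bits matter (as with 16-bit code units) and that int is wider than 16 bits. The name `keep_low_16_bits` is invented for this sketch.

```cpp
// Hypothetical helper: keep only the low 16 bits, then narrow to wchar_t.
inline wchar_t keep_low_16_bits(int value) {
    // Go through unsigned so the masking is well-defined for negative values.
    const unsigned masked = static_cast<unsigned>(value) & 0xFFFFu;
    // masked is at most 0xFFFF, which fits in the usual 16-bit unsigned
    // and 32-bit wchar_t representations.
    return static_cast<wchar_t>(masked);
}
```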

Finally, VS2013 uses 2s complement math for int. So static_cast<wchar_t>(L'A' + x) and static_cast<wchar_t>( L'A' + static_cast<wchar_t>(x)) always produce the same values, and would do so if wchar_t was replaced with unsigned short or signed short.
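That claim can be spot-checked with `std::uint16_t` standing in for a 16-bit unsigned wchar_t, which is an assumption about the target; the function names are made up for this sketch. It also assumes `base + x` does not overflow int, since that would be undefined behavior.

```cpp
#include <cstdint>

// Cast only the result of the addition.
inline std::uint16_t cast_result(std::uint16_t base, int x) {
    return static_cast<std::uint16_t>(base + x);
}

// Cast the operand first, then cast the result.
inline std::uint16_t cast_operand_then_result(std::uint16_t base, int x) {
    return static_cast<std::uint16_t>(base + static_cast<std::uint16_t>(x));
}

// True when both spellings give the same 16-bit value.
inline bool agree(std::uint16_t base, int x) {
    return cast_result(base, x) == cast_operand_then_result(base, x);
}
```

For example, `agree(65, -1)`, `agree(65, 70000)` and `agree(65, -65536)` all hold, because both forms reduce the sum modulo 2^16.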

This is a poor answer: it needs curation and culling. But I'm tired, and it might be illuminating.

Yakk - Adam Nevraumont
  • We need more eyes on your answer. I don't think you are correct for Visual Studio for `(L'A' + x)` where `x` is > SHORT_MAX. As you say "it needs curation and culling". – BSalita May 22 '14 at 21:25
  • @bslita `(a+(b%k))%k`=`(a+b)%k`. And vs does sign extension on signed values, but otherwise treats them the same as unsigned, which is math mod `2^(8*sizeof(T))` for type `T`. Can you produce a single counter example? I could be wrong. – Yakk - Adam Nevraumont May 22 '14 at 22:57
  • `static_cast<wchar_t>( L'A' + static_cast<wchar_t>(x))` won't work. Equivalent is `(short)(L'A' + (short)0xffff0000)`. – BSalita May 23 '14 at 07:38

Until I see a more elegant answer, which I hope there is, I'll go with this pattern:

(wchar_t)(L'A' + i)

I like this pattern because i can be negative or positive and it will evaluate as expected. My original notion to use L'A' + (wchar_t)i is flawed if i is negative and wchar_t is unsigned. I'm assuming here that the signedness of wchar_t is implementation-dependent and that it could be signed.
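A sketch of the same pattern wrapped in a helper, written with static_cast in place of the C-style cast. `shift_letter` is a made-up name, and the example values assume an ASCII-compatible wide character set and small offsets.

```cpp
// Hypothetical helper: move a wide character by a signed offset.
inline wchar_t shift_letter(wchar_t base, int offset) {
    // The addition happens in the promoted type, so a negative offset
    // subtracts as expected before the result is narrowed back.
    return static_cast<wchar_t>(base + offset);
}
```

Here `shift_letter(L'B', -1)` gives `L'A'` and `shift_letter(L'A', 2)` gives `L'C'`.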

BSalita
  • Whether `wchar_t` is signed or unsigned is implementation-defined. But even if it is unsigned, `L'A' + (wchar_t)(-1)` is still the wide character before `L'A'`. – aschepler May 21 '14 at 22:35
  • @aschepler, not if `wchar_t(-1)` is unsigned and `sizeof(int) > sizeof(wchar_t)`. Right? Hmmm, I guess I'm not sure if the result of the addition may be an int or must remain wchar_t. – BSalita May 21 '14 at 22:49