We recently discovered a line of code that was doing the equivalent of

```cpp
bool should_escape_control_char(char ch) {
    return (ch < 0x20); // control chars are 0x00 through 0x1F
}
```
This works if plain `char` is unsigned; but if plain `char` is signed, then this filter accidentally catches negative chars as well. (The ultimate effect was that a naïve JSON encoder was encoding `"é"` as `"\u00c3\u00a9"`: the two UTF-8 bytes of `é`, 0xC3 and 0xA9, looked to the encoder like a pair of negative chars, which it then escaped individually.)
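Here's a minimal, self-contained sketch of the failure mode (the string literal and `main` harness are mine, not the original code):

```cpp
#include <cstdio>

bool should_escape_control_char(char ch) {
    return (ch < 0x20); // buggy: result depends on the signedness of char
}

int main() {
    const char *s = "\xC3\xA9"; // the two UTF-8 bytes of "é"
    for (const char *p = s; *p != '\0'; ++p) {
        // Where char is signed, 0xC3 is -61 and 0xA9 is -87, so both
        // bytes compare less than 0x20 and get flagged for escaping.
        std::printf("0x%02X -> %s\n", (unsigned char)*p,
                    should_escape_control_char(*p) ? "escape" : "keep");
    }
}
```

On a signed-`char` platform this prints `escape` for both bytes; recompile with `-funsigned-char` and both print `keep`.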
IMO, the original sin here is that we're comparing a plain `char` expression against an integer, in such a way that the result depends on the signedness of `char`. I wish the compiler had told us:
```
fantasy-warning: this comparison's result may depend on the signedness of plain char
    return (ch < 0x20); // control chars are 0x00 through 0x1F
            ^~~~~~~~~
fantasy-note: cast the operand to silence this diagnostic
    return (ch < 0x20); // control chars are 0x00 through 0x1F
            ~~
            (signed char)(ch)
```
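For completeness, one signedness-independent rewrite (my sketch, not necessarily the fix the codebase adopted) is to launder the byte through `unsigned char` before comparing:

```cpp
bool should_escape_control_char(char ch) {
    // Casting to unsigned char maps every byte into 0x00..0xFF, so the
    // comparison means the same thing whether plain char is signed or
    // unsigned: escape exactly 0x00 through 0x1F, and nothing else.
    return (unsigned char)ch < 0x20;
}
```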
I was surprised to discover that Clang offers no option to warn in this situation, and I don't see any warning option in GCC either.
- Am I just not looking in the right place?
- What tools / linters / static-analyzers exist that do warn in this situation?