I am writing a function that extracts Unicode characters from a string one at a time. The argument is a reference to a pointer to char, which the function advances to the next character before returning a value. Here is the entire function:
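uint16_t get_char_and_inc(const char *&c) {
    // Read bytes as unsigned so values >= 0x80 are not sign-extended
    uint16_t val = static_cast<unsigned char>(*c++);
    if ((val & 0xC0) == 0xC0)           // lead byte of a multi-byte sequence
        while ((*c & 0xC0) == 0x80)     // append each continuation byte
            val = (val << 8) | static_cast<unsigned char>(*c++);
    return val;
}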
As many have pointed out, this UTF-8 decoder is not technically correct: it is limited to 16-bit codes and it does not remove the encoding bits. It is sufficient for my limited graphics library for microcontrollers, though :)
The complexity of this function is irrelevant to the question, so assume it is simply this:
uint16_t get_char_and_inc(const char *&c) {
    return *c++;
}
The problem I am having is that I would like it to work for both char * and const char *, i.e.:
int main() {
    const char *cc = "ab";
    get_char_and_inc(cc);
    printf("%s\n", cc);
    char buf[] = "ab";
    char *c = buf;       // a real pointer variable, since an array name cannot be advanced
    get_char_and_inc(c); // This does not compile
    printf("%s\n", c);
}
Expected output:
b
b
However, the second call gives me the error:
invalid initialization of non-const reference of type 'const char*&' from an rvalue of type 'const char*'
There are several questions on Stack Overflow about this particular error message. Usually they concern passing a const char * as a char *, which is illegal. But in this case, I am going from a char * to a const char *. I feel like this should be legal, as I am simply adding a guarantee not to modify the data in the function.
Reading through other answers, it appears the compiler makes a copy of the pointer, turning it into a temporary rvalue. I understand why this may be necessary in non-trivial conversions, but it seems like it should not be necessary here at all. In fact, if I drop the "&" from the function signature, it compiles just fine, but then of course the pointer is passed by value and the program prints "ab" instead of "b".
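To make my understanding of the temporary concrete, here is a minimal sketch. The names repoint, fixed and demo_temporary are hypothetical and exist only for this illustration. As far as I can tell, the direct binding is refused because a function taking const char *& could store a pointer to constant data into the caller's char *, which would then allow writing to that data without a cast:

```cpp
// Hypothetical constant data that the helper points its argument at.
static const char fixed[] = "read-only";

// Hypothetical helper: legally repoints its argument at constant data.
void repoint(const char *&p) {
    p = fixed;
}

// Returns true if the caller's char * was left untouched.
bool demo_temporary() {
    char buf[] = "ab";
    char *p = buf;

    // repoint(p);       // rejected: if p itself were bound, p would end up
    //                   // pointing at `fixed`, and *p = 'x' would then
    //                   // write to const data.

    const char *tmp = p; // the implicit conversion yields a separate pointer
    repoint(tmp);        // only tmp is updated
    return p == buf && tmp == fixed;
}
```

In this sketch, tmp plays the role of the temporary the compiler would have to create, which is why an update made through the reference could never reach the original pointer.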
Currently, to make this work, I have to have the function twice, one taking const char *&c and another taking char *&c. This seems inefficient to me as the code is exactly the same. Is there any way to avoid the duplication?
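One direction I am considering is a function template, so that the source exists only once. This is a sketch under the assumption that a template is acceptable in my code base; the CharT parameter and the static_assert are my own additions, not something I have settled on:

```cpp
#include <cstdint>
#include <type_traits>

// Sketch: CharT is deduced as either `char` or `const char`, so the
// reference binds to the caller's own pointer in both cases.
template <typename CharT>
std::uint16_t get_char_and_inc(CharT *&c) {
    static_assert(
        std::is_same<typename std::remove_const<CharT>::type, char>::value,
        "get_char_and_inc expects char * or const char *");

    // Read bytes as unsigned so values >= 0x80 are not sign-extended
    std::uint16_t val = static_cast<unsigned char>(*c++);
    if ((val & 0xC0) == 0xC0)
        while ((*c & 0xC0) == 0x80)
            val = (val << 8) | static_cast<unsigned char>(*c++);
    return val;
}
```

With this, both get_char_and_inc(cc) for a const char *cc and get_char_and_inc(c) for a char *c should advance the caller's pointer. The source is no longer duplicated, but the compiler may still emit two instantiations, so this does not necessarily save code space on a microcontroller.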