C++. Bug with std::string::substr?

Question

I've been working with C++ and the standard library, specifically the method string::substr.

I have found some behavior in this method that looks like a bug, and I want to report it.

For a string abcñ, a call to substr(0, 4) returns abc?.
For a string abcç, a call to substr(0, 4) returns abc?.
For a string abcñd, a call to substr(0, 5) returns abcñ.
For a string abcçd, a call to substr(0, 5) returns abcç.

These tests suggest that non-ASCII characters (such as ñ or ç) take up two positions each instead of one. Shouldn't string::substr take this into account, or be able to work with different encodings? The API has no method for working with different encodings.
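To make the observation concrete, here is a minimal sketch that dumps the bytes substr returns. It assumes the strings are encoded as UTF-8 (the question does not say so, but the two-positions-per-character behavior is consistent with it), and `to_hex`, `kAbcN` and `kAbcNd` are hypothetical names introduced only for this illustration.

```cpp
#include <string>

// "abcñ" and "abcñd" written as explicit UTF-8 bytes, so the example does not
// depend on the encoding of the source file. ñ (U+00F1) is the two bytes C3 B1.
const std::string kAbcN  = "abc\xC3\xB1";
const std::string kAbcNd = "abc\xC3\xB1" "d";  // split so "d" is not parsed as a hex digit

// Hypothetical helper: renders each byte of `s` as two hex digits, space-separated.
std::string to_hex(const std::string& s) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char byte : s) {
        if (!out.empty()) out += ' ';
        out += digits[byte >> 4];
        out += digits[byte & 0x0F];
    }
    return out;
}
```

Under that assumption, `kAbcN.size()` is 5 rather than 4, and `to_hex(kAbcN.substr(0, 4))` gives `61 62 63 c3`: the result ends on the first byte of ñ, which on its own is not a complete character and is typically displayed as `?`. `to_hex(kAbcNd.substr(0, 5))` gives `61 62 63 c3 b1`, which is the complete `abcñ`.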

Are you aware that std::string is std::basic_string<char> and std::wstring is std::basic_string<wchar_t>? Also, this link http://en.cppreference.com/w/cpp/string/basic_string makes it clear that substr is a member of the std::basic_string class and not of std::string itself. — Alessandro Teruzzi, Apr 22 '16 at 12:23
Then should I work with wstrings when I need to use these strange characters? — Santiago Gil, Apr 22 '16 at 12:27
std::string is just an array of bytes. It is not an array of Unicode characters. Find some documentation about encodings, especially UTF-8. — ZunTzu, Apr 22 '16 at 12:30
Yes, you should use std::wstring and make sure you tell the compiler that your literal is made of wide chars. — Alessandro Teruzzi, Apr 22 '16 at 12:34
On the contrary, I recommend you use std::string and UTF-8, because UTF-8 is simpler and more efficient than UCS-2 ([read this](http://lucumr.pocoo.org/2014/1/9/ucs-vs-utf8/)). Just be careful NOT to use functions such as substr because those functions are not character-wise, they are byte-wise. You have to use Unicode-aware functions. — ZunTzu, Apr 22 '16 at 13:37
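
As a rough illustration of the byte-wise versus character-wise distinction in the last comment, a substring function that counts UTF-8 code points instead of bytes could be sketched as follows. `utf8_substr`, `skip_code_points` and `is_continuation` are hypothetical helpers, not part of the standard library. The sketch assumes the input is valid UTF-8, and it counts code points only, so a character built from several code points (for example a letter followed by a combining accent) can still be split.

```cpp
#include <cstddef>
#include <string>

// UTF-8 continuation bytes have the bit pattern 10xxxxxx.
inline bool is_continuation(unsigned char byte) {
    return (byte & 0xC0) == 0x80;
}

// Starting at byte offset `i`, steps over up to `n` code points and returns
// the resulting byte offset (at most s.size()).
std::size_t skip_code_points(const std::string& s, std::size_t i, std::size_t n) {
    while (n > 0 && i < s.size()) {
        ++i;  // step over the lead byte
        while (i < s.size() && is_continuation(static_cast<unsigned char>(s[i]))) {
            ++i;  // step over the continuation bytes of the same code point
        }
        --n;
    }
    return i;
}

// Like std::string::substr, but `pos` and `count` are measured in code points.
// Unlike substr, an out-of-range `pos` yields an empty string instead of throwing.
std::string utf8_substr(const std::string& s, std::size_t pos, std::size_t count) {
    const std::size_t begin = skip_code_points(s, 0, pos);
    const std::size_t end = skip_code_points(s, begin, count);
    return s.substr(begin, end - begin);
}
```

With the UTF-8 bytes of `abcñd` as input, `utf8_substr(s, 0, 4)` returns the five bytes of `abcñ`, where plain `substr(0, 4)` would stop in the middle of ñ. For anything beyond a quick experiment, an established Unicode library is likely a safer choice than a hand-written helper like this.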
