
I've been working with C++ and the standard library, especially the method string::substr.

I have found some errors in this method that I want to report.

  1. For a string abcñ, a call to substr(0, 4) returns abc?.
  2. For a string abcç, a call to substr(0, 4) returns abc?.
  3. For a string abcñd, a call to substr(0, 5) returns abcñ.
  4. For a string abcçd, a call to substr(0, 5) returns abcç.

I have noticed from these tests that special characters (such as ñ or ç) take up twice the space. But shouldn't string::substr take this into account, or work with different encodings? The API has no method for working with different encodings.

Santiago Gil
  • Are you aware that std::string is std::basic_string<char> and std::wstring is std::basic_string<wchar_t>? Also, this link http://en.cppreference.com/w/cpp/string/basic_string makes it clear that substr is a member of the std::basic_string class template, not of std::string specifically. – Alessandro Teruzzi Apr 22 '16 at 12:23
  • Then should I work with wstrings when using these special characters? – Santiago Gil Apr 22 '16 at 12:27
  • std::string is just an array of bytes. It is not an array of Unicode characters. Find some documentation about encodings, especially UTF-8. – ZunTzu Apr 22 '16 at 12:30
  • Yes, you should use std::wstring and make sure you tell the compiler that your literal is made of wide chars. – Alessandro Teruzzi Apr 22 '16 at 12:34
  • On the contrary, I recommend you use std::string and UTF-8, because UTF-8 is simpler and more efficient than UCS-2 ([read this](http://lucumr.pocoo.org/2014/1/9/ucs-vs-utf8/)). Just be careful NOT to use functions such as substr, because they are byte-wise, not character-wise. You have to use Unicode-aware functions. – ZunTzu Apr 22 '16 at 13:37
