I currently have a std::string that contains this:

"\xa9 2006 FooWorld"

Basically, it contains the symbol ©. This string is passed to a method of an external API that expects UTF-8. How can I make this string UTF-8 compatible? I read that I could use std::wstring_convert, but I am not sure how to apply it in my case. Any suggestions would be appreciated.

MistyD
  • For that one character it's probably not worth anything complicated. Just hardcode the utf-8 equivalent. http://www.utf8-chartable.de/ – Retired Ninja Apr 05 '18 at 00:24
  • The thing is it could be multiple characters – MistyD Apr 05 '18 at 00:24
  • You should probably have that in the question. :) Personally, I'd use this: http://utfcpp.sourceforge.net/ – Retired Ninja Apr 05 '18 at 00:26
  • `std::string` stores bytes, not characters. So if you do not know the original encoding, there's no way guaranteed to work. If you know the original encoding is utf8, then you do not need anything extra, because, again, `std::string` stores the encoding bytes. – xskxzr Apr 05 '18 at 05:58
  • maybe you want to read [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/) – phuclv Apr 06 '18 at 01:20

2 Answers

That's simple: use a UTF-8 string literal:

u8"\u00A9 2006 FooWorld"

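That will result in a const char[] that is a properly encoded UTF-8 string. (This holds for C++11 through C++17; since C++20 the literal's type is const char8_t[], so it no longer initializes a std::string directly.)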
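
As a rough illustration of what the literal produces, the sketch below wraps it in a hypothetical helper, `copyright_notice` (the name is not from the question). The `reinterpret_cast` is an assumption made only so the same code also compiles under C++20, where u8 literals are `char8_t` arrays; under C++11 to C++17 it is a no-op.

```cpp
#include <string>

// Hypothetical helper: returns the notice as UTF-8 bytes in a std::string.
std::string copyright_notice() {
    // U+00A9 is encoded by the compiler as the two bytes 0xC2 0xA9.
    // The cast keeps this compiling in C++20, where the literal is const char8_t[].
    return std::string(reinterpret_cast<const char*>(u8"\u00A9 2006 FooWorld"));
}
```

The returned string is two bytes longer than the visible character count would suggest only by one: the © takes two bytes, everything else is plain ASCII.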

Remy Lebeau
Nicol Bolas
  • For instance, if I have `std::basic_string str = "\xa9 2006 FooWorld"`, how do I append u8 to it? – MistyD Apr 05 '18 at 01:36
  • @MistyD: You change the code to read: `std::string str = u8"\u00A9 2006 FooWorld"`. If you're not allowed to change the literal itself, then this is a duplicate as previously outlined. – Nicol Bolas Apr 05 '18 at 02:12
  • When using any of the Unicode-aware literal prefixes, you can use the actual Unicode character instead of using its codepoint/codeunits manually, eg: `u8"© 2006 FooWorld"`. Let the compiler do the work for you. – Remy Lebeau Apr 06 '18 at 01:22
In C++11 and later, the best way to get a UTF-8 encoded string literal is to use the u8 prefix:

std::string str = u8"\u00A9 2006 FooWorld";

or:

std::string str = u8"© 2006 FooWorld";

However, you can use std::wstring_convert, too (especially if your input data is not a string literal). Note that std::wstring_convert and <codecvt> have been deprecated since C++17:

#include <codecvt>
#include <locale>
#include <string>

std::wstring wstr = L"© 2006 FooWorld"; // or whatever...

std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> convert;

std::string str = convert.to_bytes(wstr);
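
If the data arrives as a narrow std::string at run time, as in the question, a manual conversion may be simpler. The sketch below assumes the input bytes are ISO-8859-1 (Latin-1), which is only a guess based on `\xa9` standing for ©; the helper name `latin1_to_utf8` is hypothetical. It relies on the fact that Latin-1 byte values equal their Unicode code points, so every byte at or above 0x80 becomes a two-byte UTF-8 sequence.

```cpp
#include <string>

// Hypothetical helper: re-encodes a string ASSUMED to be ISO-8859-1 as UTF-8.
// It gives wrong results if the input is in any other encoding.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte expands to two
    for (unsigned char c : in) {
        if (c < 0x80) {
            // ASCII range is identical in Latin-1 and UTF-8.
            out.push_back(static_cast<char>(c));
        } else {
            // Code points U+0080..U+00FF: 110000xx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

Calling `latin1_to_utf8("\xa9 2006 FooWorld")` would turn the single byte 0xA9 into the pair 0xC2 0xA9 and leave the rest unchanged. For other source encodings (for example Windows-1252, which differs from Latin-1 in the 0x80 to 0x9F range), a lookup table or a library such as the one linked in the comments would be needed instead.
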
Remy Lebeau