
I'm trying to convert a vector<wchar_t> to string (and then print it).

std::string(vector_.begin(), vector_.end());

This works fine except for the characters äöü ÄÖÜ ß, which are printed as:

���
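One likely explanation for the ��� output is narrowing. `std::string(first, last)` converts each `wchar_t` to a single `char`, and on typical platforms that keeps only the low byte. So ä (U+00E4) becomes the lone byte 0xE4, which is not valid UTF-8 on its own and is typically rendered as � by a UTF-8 terminal. A minimal sketch reproducing this (`narrow_copy` is a hypothetical name, not part of the question's code):

```cpp
#include <string>
#include <vector>

// Reproduces the question's conversion: every wchar_t is converted to one
// char. Values above 0x7F do not become UTF-8 sequences; on common
// platforms only the low byte survives (e.g. U+00E4 -> byte 0xE4).
std::string narrow_copy(const std::vector<wchar_t>& buffer) {
    return std::string(buffer.begin(), buffer.end());
}
```

For `{L'ä', L'ß'}` this yields two single bytes (0xE4, 0xDF) instead of the four bytes UTF-8 needs, which is why the terminal cannot decode them.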

I also tried converting to wstring and printing with wcout, but I got the same issue.

Thanks in advance!

Liam K.
  • You need to use wstring instead. Or you need to encode it from unicode to ANSI. – KimKulling Dec 01 '22 at 07:59
  • As I said, I tried wstring; it gives the same output. – Liam K. Dec 01 '22 at 08:02
  • Is your terminal set to the appropriate character encoding? – molbdnilo Dec 01 '22 at 08:05
  • Yes, it has the right encoding. – Liam K. Dec 01 '22 at 08:09
  • @LiamK. What is that encoding? You cannot magically convert one encoding to another (at least not in C++). – john Dec 01 '22 at 08:10
  • VS Code, UTF-8 without BOM. – Liam K. Dec 01 '22 at 08:12
  • @LiamK. Put another way, since you are using `wchar_t`, your source encoding is presumably UTF-16 or UTF-32. What encoding do you expect to have once you have copied the characters to a `std::string`? Whatever that is, you have to do the work to make that conversion happen. – john Dec 01 '22 at 08:12
  • @LiamK. OK, then you need to write a UTF-?? to UTF-8 conversion, or find some third-party code to do that for you. Historically C++ has not been very good with Unicode (since it predates its widespread adoption). Depending on compiler and version, you might be able to use `std::codecvt`, or you might use a third-party library, or you might write the conversion yourself. – john Dec 01 '22 at 08:15
  • I want to print the string using cout. I googled, and it seems like cout uses UTF-8. – Liam K. Dec 01 '22 at 08:15
  • @LiamK. There is absolutely no guarantee that cout uses UTF-8. It might, it might not, it might depend on how your console/terminal is set up – john Dec 01 '22 at 08:16
  • @LiamK. `cout` doesn't use any particular encoding. It is totally unaware of encodings and only outputs the `char`s you give it. It's your responsibility to handle encodings. – molbdnilo Dec 01 '22 at 08:17
  • `cout` just passes the bytes you give it to the terminal, so `cout` doesn't use UTF-8, but your terminal might (unless you're on Windows). – Alan Birtles Dec 01 '22 at 08:17
  • Thanks for all the help, I figured it out. The converted string is UTF-16 and I needed UTF-8. – Liam K. Dec 01 '22 at 08:27
  • @Liam: If you have a minute, could you post your solution as an answer? This might be of interest to others in the future. String encodings are a notorious beast. – nick Dec 01 '22 at 08:28
  • What you should use is `std::wstring` instead, since you deal with `wchar_t` instead of `char`. It may not necessarily solve your display issue, but at least you won't truncate your `wchar_t` values anymore (because `wchar_t` is wider than `char`). `std::wcout` may help as well. – Fareanor Dec 01 '22 at 09:26
  • Yeah I will post, just a sec – Liam K. Dec 04 '22 at 15:35
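john's comment mentions writing the conversion yourself. A minimal sketch of that, assuming `wchar_t` holds UTF-16 code units where it is 16 bits wide (e.g. Windows) and UTF-32 code points where it is 32 bits wide (e.g. Linux, macOS). `wide_to_utf8` is a hypothetical helper name, not a standard library function:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encodes a std::wstring as UTF-8.
// Surrogate pairs (UTF-16 input) are combined into one code point; plain
// code points (UTF-32 input) pass straight through. Unpaired surrogates and
// values above U+10FFFF are replaced with U+FFFD instead of throwing.
std::string wide_to_utf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = static_cast<char32_t>(in[i]);
        // Combine a high surrogate with a following low surrogate.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            char32_t lo = static_cast<char32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;  // replacement character for invalid input
        if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {       // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                         // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With the question's buffer, `std::cout << wide_to_utf8(std::wstring(vector_.begin(), vector_.end()));` should then display äöü ÄÖÜ ß on a terminal that decodes UTF-8. Replacing bad input with U+FFFD is a design choice; throwing, as the converter in the answer does, is equally valid.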

1 Answer


My Solution:

First, I convert my vector<wchar_t> to a std::u16string like this:

std::u16string(buffer_.begin(), buffer_.end());

Then I use this function, which I found somewhere on this site:

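#include <codecvt>    // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>     // std::wstring_convert (deprecated since C++17)
#include <stdexcept>  // std::runtime_error
#include <string>

// Converts a UTF-16 string to UTF-8 (member of the asker's own IO::Interface class).
std::string IO::Interface::UTF16_To_UTF8(std::u16string const& str) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t, 0x10ffff,
        std::codecvt_mode::little_endian>, char16_t> cnv;
    std::string utf8 = cnv.to_bytes(str);  // throws std::range_error on invalid UTF-16
    // Guard against a partial conversion.
    if(cnv.converted() < str.size())
        throw std::runtime_error("incomplete conversion");
    return utf8;
}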

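I did not write the conversion function, but it works as expected. With it, I can successfully convert a vector<wchar_t> to a std::string. Note that the first step only works if every wchar_t value fits in a char16_t: that holds for UTF-16 data (where wchar_t is 16 bits, as on Windows) and for characters like äöü ß, but characters outside the Basic Multilingual Plane would be truncated where wchar_t is 32 bits.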
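The two steps of this answer can be combined into one free function. This is a sketch, not the answer's exact code: `wchar_buffer_to_utf8` is a hypothetical name, and it uses the default `std::codecvt_utf8_utf16<char16_t>` facet rather than the `little_endian` variant above. `std::wstring_convert` is deprecated since C++17, so expect deprecation warnings on newer compilers.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical free-function version of the answer's approach.
std::string wchar_buffer_to_utf8(const std::vector<wchar_t>& buffer) {
    // Step 1: copy each wchar_t into a char16_t. Lossless only when the
    // values are UTF-16 code units or characters in the BMP.
    std::u16string utf16(buffer.begin(), buffer.end());

    // Step 2: UTF-16 -> UTF-8 with the (deprecated) standard converter.
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> cnv;
    std::string utf8 = cnv.to_bytes(utf16);  // throws std::range_error on invalid input
    if (cnv.converted() < utf16.size())
        throw std::runtime_error("incomplete conversion");
    return utf8;
}
```

Printing the result with `std::cout` then hands UTF-8 bytes to the terminal, which displays them correctly only if it decodes UTF-8.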

Liam K.