C++ standard library character conversion from UTF-8 to CP1251 on FreeBSD

Question

I am trying to convert a string from UTF-8 to CP1251 on FreeBSD (on Windows it works fine). Conversion from UTF-8 to the German ISO8859-1 also works on FreeBSD, but conversion to any Russian encoding does not. Here is the code:

#include <iostream>
#include <locale>
#include <string>
#include <codecvt>
#include <vector>
int main(int argc, char* argv[]){
  // std::string g = "A-Za-zÄÖÜßäöüß";
  std::string g = "А-Яа-я";
  // const char* enc = "de_DE.ISO8859-1";
  const char* enc = "ru_RU.CP1251";
  std::locale loc(enc);
  std::cout << loc.name() << std::boolalpha << " has facet "
    << std::has_facet<std::ctype<wchar_t>>(loc) << '\n';
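  // Decode the UTF-8 bytes into wide characters (std::wstring_convert and
  // std::codecvt_utf8 are deprecated since C++17 but still available).
  std::wstring_convert<std::codecvt_utf8<wchar_t>> wconv;
  std::wstring wstr = wconv.from_bytes(g);
  // Narrow each wide character with the locale's ctype facet;
  // characters it cannot map are replaced by the default '#'.
  std::vector<char> buf(wstr.size());
  std::use_facet<std::ctype<wchar_t>>(loc).narrow(
    wstr.data(), wstr.data() + wstr.size(), '#', buf.data());
  for (auto s : buf) { std::cout << s << '\n'; }
  return 0;
}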

On my machine the locale is created successfully, but every Russian character becomes '#' (the default character passed to narrow()).

Any thoughts will be appreciated.
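To see exactly which characters a locale's ctype<wchar_t> facet refuses to narrow, rather than only seeing '#' in the output, one option is to call narrow() twice with two different default characters: a character that comes back as the default both times has no single-byte mapping in that locale. The sketch below does that and also checks whether a locale name can be constructed at all, since std::locale throws std::runtime_error for names that are not installed. The helper names unmappable_code_points and locale_available are hypothetical, not part of any library.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns the wide characters that the locale's
// ctype<wchar_t> facet cannot narrow to a single char.
std::vector<wchar_t> unmappable_code_points(const std::wstring& text,
                                            const std::locale& loc) {
    const auto& ct = std::use_facet<std::ctype<wchar_t>>(loc);
    std::vector<wchar_t> bad;
    for (wchar_t c : text) {
        // narrow() returns the supplied default only when it has no mapping,
        // so falling back for both '#' and '$' means "unmappable".
        if (ct.narrow(c, '#') == '#' && ct.narrow(c, '$') == '$') {
            bad.push_back(c);
        }
    }
    return bad;
}

// Hypothetical helper: true if a std::locale with this name can be created
// (std::locale throws std::runtime_error for unknown or uninstalled names).
bool locale_available(const std::string& name) {
    try {
        std::locale loc(name);
        (void)loc;
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

On the machine from the question, one would expect locale_available("ru_RU.CP1251") to return true and unmappable_code_points(wstr, loc) to list the Cyrillic letters, while the German test string comes back empty; that expectation follows from the reported '#' output and has not been verified here.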

I just tested on a FreeBSD machine myself and got the same result. — Ted Lyngmo, Oct 24 '20 at 12:15
The standard library encoding conversion support is still pretty patchy, you're probably better off using a dedicated library like [iconv](https://www.gnu.org/software/libiconv/) or [icu](http://site.icu-project.org/home) — Alan Birtles, Oct 24 '20 at 12:35
Is it possible to set the character encoding of the terminal (or whatever FreeBSD uses for output) to Cyrillic? That usually helps on Ubuntu, for example. — SChepurin, Oct 24 '20 at 12:42
@Alan Birtles As far as I can see, in my case the problem is specific to the Russian encoding, because German works fine. — Stepan Pavlov, Oct 24 '20 at 12:42
@SChepurin Already done; Russian characters are displayed perfectly. — Stepan Pavlov, Oct 24 '20 at 12:43
Yep, the BSD standard library possibly just doesn't support conversion to CP1251; a dedicated library will give you more consistent results. It might also be that you need to install the CP1251 locale on BSD. — Alan Birtles, Oct 24 '20 at 12:44
It works on Linux with `std::setlocale(LC_ALL, "ru_RU.CP1251");`. Otherwise there is a runtime error, even for German. — SChepurin, Oct 24 '20 at 13:08
Checked once more: in an online compiler it outputs the German or Cyrillic string with or without `std::setlocale()`. But your code produces a runtime error here: `std::locale* loc = new std::locale("...")` — SChepurin, Oct 24 '20 at 13:18
Strange. Compiled with clang, it works for German but not for Russian. — SChepurin, Oct 24 '20 at 13:33
I installed the gcc 9.3 compiler and it behaves the same: German works, Russian does not. — Stepan Pavlov, Oct 25 '20 at 08:58
What do you see (in both cases) if you examine your compiled executable? For instance, if you run 'strings' on the binary, and if you run it under gdb / lldb, what do you see for 'g'? — Paul Floyd, Oct 27 '20 at 08:40
@Paul Floyd I see its UTF-8 representation, \xd0\x90-\xd0\xaf\xd0\xb0-\xd1\x87. As a matter of fact, I've already replaced the procedure with Boost's `from_utf`, which works correctly. — Stepan Pavlov, Oct 27 '20 at 09:19
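Paul Floyd's check can also be done from inside the program: dumping the bytes of g as \xNN escapes shows whether the string literal really was stored as UTF-8. A small sketch follows; the function name hex_escape is hypothetical, and printable ASCII is left as-is so the output matches the style of the byte dump quoted above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: renders every byte of s as a lowercase \xNN escape,
// except printable ASCII (0x20..0x7E), which is copied unchanged.
std::string hex_escape(const std::string& s) {
    std::string out;
    for (unsigned char byte : s) {
        if (byte >= 0x20 && byte < 0x7F) {
            out += static_cast<char>(byte);
        } else {
            char buf[5];  // "\xNN" plus the terminating NUL
            std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned>(byte));
            out += buf;
        }
    }
    return out;
}
```

Assuming the source file is saved as UTF-8, hex_escape("А-Яа-я") returns \xd0\x90-\xd0\xaf\xd0\xb0-\xd1\x8f.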
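Following Alan Birtles's suggestion, the conversion can bypass std::ctype entirely and use iconv. This is a minimal sketch with the POSIX iconv_open/iconv/iconv_close interface. It assumes the iconv implementation in use (FreeBSD's base iconv, GNU libiconv or glibc) recognizes the name "CP1251"; with GNU libiconv you may need to link with -liconv, and some older iconv headers declare the input-buffer argument as const char**. The function name utf8_to_cp1251 is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a UTF-8 string to CP1251 with POSIX iconv.
// Throws std::runtime_error if iconv_open or iconv reports an error (e.g.
// EILSEQ on invalid input). How characters with no CP1251 equivalent are
// handled varies between iconv implementations.
std::string utf8_to_cp1251(const std::string& utf8) {
    iconv_t cd = iconv_open("CP1251", "UTF-8");  // (to, from)
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open: UTF-8 -> CP1251 not supported");
    }
    std::vector<char> in(utf8.begin(), utf8.end());
    // CP1251 is single-byte, so the output is never longer than the input.
    std::vector<char> out(in.size() + 1);
    char* inptr = in.data();
    std::size_t inleft = in.size();
    char* outptr = out.data();
    std::size_t outleft = out.size();
    std::size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    int err = errno;  // save errno before iconv_close can change it
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("iconv failed, errno " + std::to_string(err));
    }
    return std::string(out.data(), outptr);  // bytes actually written
}
```

For the question's input "А-Яа-я" this should yield the bytes 0xC0 '-' 0xDF 0xE0 '-' 0xFF, independent of which locales are installed.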
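If pulling in Boost or iconv is not wanted and the text is known to contain only ASCII and Russian letters, a hand-written mapping is another option. This sketch decodes UTF-8 manually and maps only ASCII, U+0410 to U+044F (which CP1251 places contiguously at 0xC0 to 0xFF), and Ё/ё (0xA8/0xB8); every other code point becomes '?'. It is deliberately partial, not a full CP1251 table, and the name utf8_to_cp1251_basic is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: UTF-8 -> CP1251 for ASCII and basic Russian letters only.
// Anything else (including malformed UTF-8) is replaced by '?'.
std::string utf8_to_cp1251_basic(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        if (b0 < 0x80) {  // 1-byte sequence: ASCII passes through
            out += static_cast<char>(b0);
            i += 1;
            continue;
        }
        if ((b0 & 0xE0) == 0xC0 && i + 1 < utf8.size()) {  // 2-byte sequence
            unsigned char b1 = static_cast<unsigned char>(utf8[i + 1]);
            if ((b1 & 0xC0) == 0x80) {
                unsigned cp = ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
                if (cp >= 0x0410 && cp <= 0x044F) {
                    out += static_cast<char>(0xC0 + (cp - 0x0410));  // А..я
                } else if (cp == 0x0401) {
                    out += static_cast<char>(0xA8);  // Ё
                } else if (cp == 0x0451) {
                    out += static_cast<char>(0xB8);  // ё
                } else {
                    out += '?';
                }
                i += 2;
                continue;
            }
        }
        // Unsupported (3-/4-byte) or malformed sequence: emit '?' and skip
        // the lead byte plus any continuation bytes that follow it.
        out += '?';
        i += 1;
        while (i < utf8.size() &&
               (static_cast<unsigned char>(utf8[i]) & 0xC0) == 0x80) {
            i += 1;
        }
    }
    return out;
}
```

The trade-off is coverage: a full converter such as iconv or Boost.Locale also handles punctuation like « » and №, which this sketch turns into '?'.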
