
I am trying to convert a string from UTF-8 to CP1251 on FreeBSD (on Windows it works fine). Converting from UTF-8 to the German ISO8859-1 encoding also works on FreeBSD, but converting to any Russian encoding does not. Here is the code:

#include <iostream>
#include <locale>
#include <string>
#include <codecvt>
#include <vector>
int main(int argc, char* argv[]){
  // std::string g = "A-Za-zÄÖÜßäöüß";
  std::string g = "А-Яа-я";
  // const char* enc = "de_DE.ISO8859-1";
  const char* enc = "ru_RU.CP1251";
  std::locale loc(enc);
  std::cout << loc.name() << std::boolalpha << " has facet "
    << std::has_facet<std::ctype<wchar_t>>(loc) << '\n';
  std::wstring_convert<std::codecvt_utf8<wchar_t>> wconv;
  std::wstring wstr = wconv.from_bytes(g);
  std::vector<char> buf(wstr.size());
  std::use_facet<std::ctype<wchar_t>>(loc).narrow(wstr.data(), wstr.data()
    + wstr.size(), '#', buf.data());
  for(auto s: buf) { std::cout << s << '\n'; }
  return 0;
}

On my machine the locale is created successfully, but every Russian character is replaced by # (the default character passed to `narrow`).
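For reference, the expected bytes can be worked out without any locale, because CP1251 stores the basic Cyrillic letters contiguously. The sketch below is not part of the program above; `cp1251_reference` is a hypothetical helper name, and it only covers ASCII, А..я, and Ё/ё, falling back to `#` for everything else.

```cpp
#include <string>

// Hypothetical reference mapping (an assumption, not the original code):
// CP1251 places U+0410..U+044F (А..я) at bytes 0xC0..0xFF,
// Ё (U+0401) at 0xA8 and ё (U+0451) at 0xB8.
// ASCII passes through unchanged; any other code point becomes `fallback`.
std::string cp1251_reference(const std::wstring& in, char fallback = '#') {
    std::string out;
    out.reserve(in.size());
    for (wchar_t wc : in) {
        const unsigned long cp = static_cast<unsigned long>(wc);
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp >= 0x0410 && cp <= 0x044F) {
            out.push_back(static_cast<char>(cp - 0x0410 + 0xC0));
        } else if (cp == 0x0401) {
            out.push_back(static_cast<char>(0xA8));
        } else if (cp == 0x0451) {
            out.push_back(static_cast<char>(0xB8));
        } else {
            out.push_back(fallback);
        }
    }
    return out;
}
```

Comparing the output of `narrow` against `cp1251_reference(wstr)` should show whether the locale's `ctype<wchar_t>` facet maps any Cyrillic code point at all, or rejects the whole range.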

Any thoughts will be appreciated.

Stepan Pavlov
  • I just tested on a FreeBSD machine myself and got the same result. – Ted Lyngmo Oct 24 '20 at 12:15
  • The standard library's encoding conversion support is still pretty patchy; you're probably better off using a dedicated library like [iconv](https://www.gnu.org/software/libiconv/) or [icu](http://site.icu-project.org/home). – Alan Birtles Oct 24 '20 at 12:35
  • Is it possible to set the character encoding of the terminal (or whatever is used for output on FreeBSD) to Cyrillic? That usually helps on Ubuntu, for example. – SChepurin Oct 24 '20 at 12:42
  • @Alan Birtles In my case, as far as I can see, the problem is specific to Russian encodings, because German works fine. – Stepan Pavlov Oct 24 '20 at 12:42
  • @SChepurin Already done, Russian symbols are beautifully displayed. – Stepan Pavlov Oct 24 '20 at 12:43
  • Yep, the BSD standard library possibly just doesn't support conversion to CP1251; a dedicated library will give you more consistent results. It might also be that you need to install the CP1251 locale on BSD. – Alan Birtles Oct 24 '20 at 12:44
  • It works on Linux with `std::setlocale(LC_ALL, "ru_RU.CP1251");`. Otherwise I get a runtime error, even for German. – SChepurin Oct 24 '20 at 13:08
  • @SChepurin On FreeBSD it doesn't. – Stepan Pavlov Oct 24 '20 at 13:11
  • Checked once more: in an online compiler it outputs the German or Cyrillic string with or without `std::setlocale()`. But your code produces a runtime error here: `std::locale* loc = new std::locale("...")` – SChepurin Oct 24 '20 at 13:18
  • @SChepurin clang version 8.0.1 doesn't produce any errors. – Stepan Pavlov Oct 24 '20 at 13:25
  • Strange. Compiled with clang, it works for German but not for Russian. – SChepurin Oct 24 '20 at 13:33
  • I installed the gcc 9.3 compiler and it behaves the same: German works, but Russian does not. – Stepan Pavlov Oct 25 '20 at 08:58
  • What do you see (in both cases) if you examine your compiled executable? For instance, if you run 'strings' on the binary, and if you run it under gdb / lldb, what do you see for 'g'? – Paul Floyd Oct 27 '20 at 08:40
  • @Paul Floyd I see its UTF-8 byte representation: \xd0\x90-\xd0\xaf\xd0\xb0-\xd1\x87. As a matter of fact, I've already replaced the procedure with Boost's `from_utf`, which works correctly. – Stepan Pavlov Oct 27 '20 at 09:19

0 Answers