
I want to store UTF-8 text in my std::string objects, so I used the boost::locale conversion routines.

In my first test everything works as expected:

#include <boost/locale.hpp>

std::string utf8_string = boost::locale::conv::to_utf<char>("Grüssen", "ISO-8859-15");
std::string normal_string = boost::locale::conv::from_utf(utf8_string, "ISO-8859-15");

The expected result is:

utf8_string = "Grüssen"
normal_string = "Grüssen"

To avoid passing "ISO-8859-15" as a string, I tried to use std::locale instead.

// Create system default locale
boost::locale::generator gen;
std::locale loc=gen("ISO8859-15"); 
std::locale::global(loc); 
// This is needed to prevent C library to
// convert strings to narrow 
// instead of C++ on some platforms
std::ios_base::sync_with_stdio(false);

std::string utf8_string = boost::locale::conv::to_utf<char>("Grüssen", std::locale());
std::string normal_string = boost::locale::conv::from_utf(utf8_string, std::locale());

But the result is not as expected:

utf8_string = "Gr|ssen"
normal_string = "Gr|ssen"

What's wrong with my use of std::locale and the generator? (Compiler: VC2015, character set: multibyte)

mtb
Reine Elemente
  • How do you inspect the results? It's weird to "expect" `utf8_string = "Grüssen"` since essentially you "expect" wrong decoding there. Also, what is the source file encoding? If it's anything else than latin1, it's wrong. – sehe Aug 22 '16 at 09:28
  • I inspected it with the VC2015 debugger, and I used the Win32 function TextOutA to print normal_string, which was converted back from UTF-8. Notepad++ tells me the file encoding is ANSI. Seeing the UTF-8 string as "Grüssen" is not weird, because "Grüsse" is how the UTF-8-encoded "Grüsse" looks when you render it with something expecting ISO-8859-1. So what is wrong with the std::locale use here, and why does the second version work? – Reine Elemente Aug 22 '16 at 17:08

1 Answer


boost::locale::generator wants a locale id, not merely an encoding (the same encoding may be used by multiple locales). The scheme it uses is language_country.encoding, so you'll need de_DE.ISO-8859-15.

Also, you're playing with fire by putting non-ASCII characters within your source code. Be careful.

Your comment about sync_with_stdio() is also weird. It just makes sure buffers are flushed.

isanae
  • `sync_with_stdio()` makes sure buffers are /not/ unnecessarily flushed between C/C++ IO **and** removes requirements for thread synchronization. http://en.cppreference.com/w/cpp/io/ios_base/sync_with_stdio – sehe Aug 22 '16 at 21:33