
I'm trying to process a UTF-16 string (stored in a buffer buf) with the help of std::basic_string and std::basic_istringstream. This code throws a std::bad_cast exception. Is there a problem with my code, or can gcc's STL simply not handle unsigned short (16-bit) characters?

const unsigned short * buf;
// ... filling buf
std::basic_string<unsigned short> w(buf);
std::basic_istringstream<unsigned short> iss(w);

unsigned int result;
try { iss >> result; }
catch (std::exception& e)
{
   const char * c = e.what();
}

The same code with std::wstring and std::wistringstream works correctly.

Violet Giraffe

1 Answer


Instantiating IOStreams for character types other than char and wchar_t is rather non-trivial. The streams need a number of std::locale facets to be present; without them they won't function properly. For the attempted operation you'd need, at least:

  • std::ctype<cT>
  • std::numpunct<cT>
  • std::num_get<cT>

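where cT is the stream's character type. The last of these should just require instantiation, but the others need to be implemented. When a required facet is missing from the stream's locale, std::use_facet throws std::bad_cast, which is the exception you are seeing. Of course, you also need to make sure a std::locale containing these facets is installed for the stream, either by setting it up as the global locale or by using stream.imbue().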

Personally, I think this is overall the wrong approach, though: the characters should be converted into an internal representation when entering the system and converted to an external representation when leaving the system (that's the purpose of the std::codecvt<...> facet). It seems, however, that this is a lost fight and people feel they want to mess with encodings internally.
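To illustrate the "convert on entry" idea, here is a minimal sketch that decodes the UTF-16 buffer into a std::wstring and then parses it with std::wistringstream, which already has all the facets it needs. The helper names utf16_to_wstring and parse_unsigned are made up for this example, and the sketch assumes the buffer is null-terminated and that wchar_t is 4 bytes (as on the asker's platform); unpaired surrogates are copied through unchanged rather than reported as errors.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: decodes a null-terminated UTF-16 buffer into a std::wstring.
// Assumes wchar_t can hold any code point (4 bytes); if wchar_t is only
// 2 bytes, the code units are copied unchanged.
std::wstring utf16_to_wstring(const unsigned short* buf)
{
    std::wstring out;
    for (std::size_t i = 0; buf[i] != 0; ++i) {
        unsigned long cp = buf[i];
        // A high surrogate followed by a low surrogate encodes one code point.
        if (sizeof(wchar_t) > 2 && cp >= 0xD800 && cp <= 0xDBFF) {
            // buf[i] is non-zero, so buf[i + 1] is at worst the terminator.
            unsigned long lo = buf[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i; // consume the low surrogate as well
            }
        }
        out.push_back(static_cast<wchar_t>(cp));
    }
    return out;
}

// Hypothetical helper: the extraction from the question, done on a wide stream.
// Returns 0 if the text does not start with a number.
unsigned int parse_unsigned(const unsigned short* buf)
{
    std::wistringstream iss(utf16_to_wstring(buf));
    unsigned int result = 0;
    iss >> result;
    return result;
}
```

With this in place, the original `iss >> result` becomes `parse_unsigned(buf)`, and no custom facets have to be written for unsigned short.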

Dietmar Kühl
  • Thanks, this explains. What if I have a valid wide string that just happens to be stored as an array of `unsigned short`? Can I somehow use `std::wstring` for processing it without manually converting array of shorts to array of `wchar_t`? The problem is simply that `wchar_t` is 4 bytes on my platform. – Violet Giraffe Oct 07 '13 at 13:19
  • Assuming your source data is UTF-16 encoded, I'd guess the approach would be to convert it into the internal `wchar_t` encoding. – Dietmar Kühl Oct 07 '13 at 13:25