2

Problem Description

I'm using Expat with a custom C++ wrapper that I have already tested in other projects. I'm running into a problem: the original data (a C string) is not converted to a std::string correctly. This concerns me because I did not change the wrapper's source.

After this conversion, the string seems to contain a null character (\0) after each character:

onCharacterData( std::string( pszData, nLength ) ) // --> std::string( const char* pszData, size_t nLength )

How can I fix this?

Own expat wrapper

// Wrapper defines the class Expat and implements for example:
void XMLCALL Expat::CharacterDataHandler( void *pUserData, const XML_Char *pszData,
                                          int nLength )
{
  Expat* pThis = static_cast<Expat*>( pUserData );

  // XML_Char is char, so this call is e.g.: std::string("hello", 5)
  pThis->onCharacterData( std::string( pszData, nLength ) );
}

Custom parser

// Parser is defined as: class Parser : Expat
void Parser::onCharacterData(const std::string& data )
{
  // data is no longer char*, but a std::string.
  // It seems to contain \0 after each character which is wrong!

  // [...]
}

Character data within the expat wrapper (char*)


Character data within the parser (std::string)


skaffman
Smamatti

3 Answers

5

Your pszData appears to be in some implementation-specific Unicode-derived format, where each "character" takes up two chars.

This means the source data is broken; it should have been a wchar_t buffer, perhaps.

Lightness Races in Orbit
  • This would explain the result in the screenshot, but I'm afraid this is not the case here. The active definition is `typedef char XML_Char;`, so std::string( char* ) should work. – Smamatti Jul 21 '11 at 13:32
  • @Smamatti: That says nothing about how the buffer was populated. Someone has taken a `char` buffer, and filled it with data that should not be in a `char` buffer. You've said elsewhere that UTF-8 encoding is in effect; I think that's your answer. The fact is, the `std::string` contains the same representation as your source data, so there is no error here. – Lightness Races in Orbit Jul 21 '11 at 14:24
  • I was using _libexpatwMT.dll_ instead of _libexpatMT.dll_. Your initial idea of UTF-16 encoded strings was correct. Changing the headers/macros did not change anything, of course. Thanks for pointing me in the right direction. – Smamatti Jul 22 '11 at 13:41
2

It looks like Expat is using wide chars and/or UTF-16. Try using std::wstring on the way back.

EDIT: I found in the docs that it uses wchar_t if the XML_UNICODE or XML_UNICODE_WCHAR_T macro is defined.
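If the rest of the parser should keep working with std::string, an alternative to std::wstring is to convert the UTF-16 data to UTF-8 at the handler boundary. Below is a rough sketch, assuming the buffer really holds UTF-16 code units. `utf16ToUtf8` is a made-up helper, not an Expat function, and the element type is char16_t purely for illustration. Unpaired surrogates are replaced with U+FFFD, which is one possible policy among several.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Expat): converts UTF-16 code units to UTF-8.
std::string utf16ToUtf8(const char16_t* data, std::size_t length) {
    std::string out;
    for (std::size_t i = 0; i < length; ++i) {
        std::uint32_t cp = data[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate if present.
            if (i + 1 < length && data[i + 1] >= 0xDC00 && data[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (data[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;  // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // stray low surrogate
        }

        // Encode the code point as 1 to 4 UTF-8 bytes.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With something like this, the character data handler could pass the converted string to onCharacterData and the parser class would not need to change. Whether that is preferable to switching the whole wrapper to a wide string type depends on what the rest of the code expects.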

Maciej Piechotka
  • It is true that these macros determine the typedef of `XML_Char`, but unfortunately the macros are not set, and Visual Studio confirms that `XML_Char` is of type `char` and UTF-8 encoded. – Smamatti Jul 21 '11 at 13:29
0

As others have pointed out, it appears pszData is a multibyte character string. You should try using std::basic_string<XML_Char> in place of std::string or std::wstring. Use a typedef if that seems too verbose.

Of course, if XML_Char is neither char nor wchar_t, you might have to provide a template specialization for std::char_traits.
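A minimal sketch of the typedef approach follows. The `XML_Char` typedef here is a simplified stand-in so the snippet compiles on its own; in real code it comes from the Expat headers. `XmlString` and `makeXmlString` are hypothetical names.

```cpp
#include <string>

// Simplified stand-in for the typedef provided by the Expat headers.
// In real code, include <expat.h> instead of declaring this yourself.
#ifdef XML_UNICODE_WCHAR_T
typedef wchar_t XML_Char;
#else
typedef char XML_Char;
#endif

// String type that always matches the character type of the build.
typedef std::basic_string<XML_Char> XmlString;  // hypothetical name

// Builds a string from the (pointer, length) pair a character data
// handler receives; nLength counts XML_Char units, not bytes.
XmlString makeXmlString(const XML_Char* pszData, int nLength) {
    return XmlString(pszData, static_cast<XmlString::size_type>(nLength));
}
```

Note that this only helps when the headers used at compile time match the library that is actually linked or loaded. If they disagree, the typedef says one thing and the data says another.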

EDIT:
Some googling revealed that XML_Char data is UTF-8 by default; the library can be made to use UTF-16 if you define XML_UNICODE or XML_UNICODE_WCHAR_T.

Praetorian