
I've integrated Hunspell in an unmanaged C++ app on Windows 7 using Visual Studio 2010.

I've got spell checking and suggestions working for English, but now I'm trying to get things working for Spanish and hitting some snags. Whenever I get suggestions for Spanish, the suggestions containing accented characters do not convert properly to std::wstring objects.

Here is an example of a suggestion that comes back from the Hunspell->suggest method:

[Screenshot: Hunspell->suggest(...) result]

Here is the code I'm using to convert that std::string to a std::wstring:

std::wstring StringToWString(const std::string& str)
{
    std::wstring convertedString;
    int requiredSize = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, 0, 0);
    if(requiredSize > 0)
    {
        std::vector<wchar_t> buffer(requiredSize);
        MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, &buffer[0], requiredSize);
        convertedString.assign(buffer.begin(), buffer.end() - 1);
    }

    return convertedString;
}

After running the string through that function I get this, with a garbled character at the end:

[Screenshot: After conversion to wstring]

Can anyone help me figure out what could be going on with the conversion here? My guess is that it's related to the negative char returned from Hunspell, but I don't know how to convert that into something the std::wstring conversion code can handle.

Jacob

2 Answers


It looks like the output of Hunspell is not UTF-8 but single-byte text in code page 852. Use 852 instead of CP_UTF8: http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx

Alternatively, configure Hunspell to return UTF-8.
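
As far as I know, Hunspell returns words in whatever encoding the dictionary declares on the SET line of its .aff file, and the Hunspell class exposes that name through get_dic_encoding(). Assuming that holds for your version, the code page could be chosen from the dictionary rather than hard-coded. The sketch below only shows the name-to-code-page lookup. CodePageForEncoding is a hypothetical helper, not part of Hunspell, and the table covers just a few common names.

```cpp
#include <string>

// Hypothetical helper (not part of Hunspell): translate an encoding name,
// as written on the SET line of a .aff file, into a Windows code page
// identifier. Returns -1 for names this sketch does not cover.
int CodePageForEncoding(const std::string& name)
{
    if (name == "UTF-8")            return 65001; // same value as CP_UTF8
    if (name == "ISO8859-1")        return 28591;
    if (name == "ISO8859-2")        return 28592;
    if (name == "ISO8859-15")       return 28605;
    if (name == "KOI8-R")           return 20866;
    if (name == "KOI8-U")           return 21866;
    if (name == "microsoft-cp1251") return 1251;
    // -1 rather than 0, because 0 is CP_ACP and would silently select
    // the system ANSI code page if passed on unchecked.
    return -1;
}
```

A non-negative result could then be passed as the first argument of MultiByteToWideChar in place of CP_UTF8.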

Markus Schumann

It turned out that the output of Hunspell was single-byte text in code page 28591 (ISO 8859-1 Latin 1; Western European (ISO)), which I found by looking at the Hunspell default settings for the Unix command-line utility.

Changing CP_UTF8 to 28591 worked for me.

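// Code page changed from CP_UTF8 to 28591 (ISO 8859-1)
#include <windows.h>
#include <string>
#include <vector>

std::wstring StringToWString(const std::string& str)
{
    std::wstring convertedString;
    // First call: ask for the required size in wide characters, including
    // the terminating null (because the length argument is -1).
    int requiredSize = MultiByteToWideChar(28591, 0, str.c_str(), -1, 0, 0);
    if(requiredSize > 0)
    {
        std::vector<wchar_t> buffer(requiredSize);
        // Second call: perform the conversion into the buffer.
        MultiByteToWideChar(28591, 0, str.c_str(), -1, &buffer[0], requiredSize);
        // Drop the terminating null when copying into the result.
        convertedString.assign(buffer.begin(), buffer.end() - 1);
    }

    return convertedString;
}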
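
On the "negative char" guess from the question: ISO 8859-1 assigns each byte value the Unicode code point with the same number, so for this one encoding the conversion can also be sketched without the Windows API. The only catch is that char may be signed, so bytes above 0x7F have to go through unsigned char before widening. Latin1ToWString is a hypothetical name used for illustration, and the shortcut does not apply to other code pages such as 852.

```cpp
#include <string>

// Hypothetical helper: widen an ISO 8859-1 (Latin-1) string.
// Only valid for ISO 8859-1, where byte N maps to code point U+00NN.
std::wstring Latin1ToWString(const std::string& str)
{
    std::wstring result;
    result.reserve(str.size());
    for (std::string::size_type i = 0; i < str.size(); ++i)
    {
        // Cast through unsigned char first so that bytes above 0x7F
        // (negative when char is signed) become 0x80..0xFF instead of
        // being sign-extended.
        result.push_back(static_cast<wchar_t>(static_cast<unsigned char>(str[i])));
    }
    return result;
}
```

For example, the Latin-1 bytes for "niño" (with 0xF1 for the ñ) come out as a four-character wide string whose third character is U+00F1.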

Here is a list of code pages from MSDN that helped me find the correct code page integer.

Jacob