First time posting here, so apologies in advance if my Title / formatting / tags are not how they are supposed to be.

I am trying to write a function in a C++ Windows console application that removes diacritics from a std::wstring of user input. To do so, I'm using code based on this question, and I convert my wstring to a UTF-8 string as follows:

std::string test= wstring_to_utf8 (input);

std::string wstring_to_utf8 (const std::wstring& str){
 std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
 return myconv.to_bytes(str);
}

std::string output= desaxUTF8(test);

with desaxUTF8(...) being:

#include <unicode/utypes.h>
#include <unicode/unistr.h>
#include <unicode/ustream.h>
#include <unicode/translit.h>
#include <unicode/stringpiece.h>

std::string desaxUTF8(const std::string& str) {

StringPiece s(str);
UnicodeString source = UnicodeString::fromUTF8(s);
//...
return result;
}

Here is where I run into a problem. The StringPiece s does not receive the value of the string str, but is instead set to an incorrect value.

But if I replace StringPiece s(str); with a hard-coded value, say StringPiece s("abcš");, it works perfectly fine.

In the VS2015 debugger, StringPiece s shows an incorrect value, 0x0028cdc0 "H\t„", for the user input abcš, whereas the hard-coded abcš shows the correct 0x00b483d4 "abcš".
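A debugger's string preview is not always conclusive here, since it typically renders a char buffer in the system code page rather than as UTF-8. One way to check what the std::string really holds is to print its bytes in hex. The helper below is only a sketch, and the name toHex is hypothetical; for valid UTF-8, abcš should come out as 61 62 63 c5 a1.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical diagnostic helper (not part of ICU or the original code):
// renders every byte of a std::string as two lowercase hex digits,
// separated by single spaces.
std::string toHex(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Cast through unsigned char so bytes >= 0x80 are not sign-extended.
        std::snprintf(buf, sizeof buf, "%02x",
                      static_cast<unsigned char>(bytes[i]));
        if (i != 0) {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```

Calling toHex(test) right after the conversion step would show whether the bytes handed to desaxUTF8 are the expected UTF-8 sequence.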

What am I doing wrong, and what is the best way to fix this? I have already tried the recommended solutions from this thread.

I've spent the last two days trying to find a solution to no avail, so any help would be greatly appreciated.

Thank you in advance.

Post answer EDIT: For anyone who is interested, here is the working code, with massive thanks to Steven R. Loomis for making it happen:

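std::wstring Menu::removeDiacritis(const std::wstring &input) {
    // Read-only alias of the UTF-16 input; no copy is made here.
    UnicodeString source(FALSE, input.data(), input.length());
    UErrorCode status = U_ZERO_ERROR;
    // Decompose, remove combining marks, then recompose.
    Transliterator *accentsConverter = Transliterator::createInstance(
        "NFD; [:M:] Remove; NFC", UTRANS_FORWARD, status);
    if (U_FAILURE(status)) {
        return input; // no transliterator available: return the input unchanged
    }
    accentsConverter->transliterate(source);
    delete accentsConverter; // createInstance() hands ownership to the caller

    std::wstring output(source.getBuffer(), source.length());
    return output;
}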
Peter
  • What are you trying to achieve with StringPiece directly in the mix? UnicodeString u = UnicodeString::fromUTF8(str) should work just fine assuming str is std::string containing valid UTF-8. – NuSkooler Jan 12 '16 at 17:36
  • I tried what you recommended, it yields the same incorrect behavior. Although, UnicodeString u = UnicodeString::fromUTF8("abcš") does work, so it seems that StringPiece really is unnecessary. It doesn't solve my problem, however, as it still doesn't use the correct string str value in the UnicodeString. – Peter Jan 12 '16 at 18:59
  • I think at this point we know the data coming from wstring_to_utf8() must be bad. What do you have in your std::wstring input? codecvt_utf8 is for UTF-8 to/from UTF-32. Since you're on Windows, I'm guessing your std::wstring has UTF-16 data in it, in which case you'll want codecvt_utf8_utf16. – NuSkooler Jan 12 '16 at 20:24
  • Even if I skip the whole wstring part and just assign a value to a normal std::string, it will still not transfer that value forward to either fromUTF8() or StringPiece. For instance, `std::string test("abc"); UnicodeString source = UnicodeString::fromUTF8(test);` doesn't work either – Peter Jan 12 '16 at 20:40
  • I replied separately. On Windows `wchar_t` should be 16-bit UTF-16 code units. – Steven R. Loomis Jan 12 '16 at 22:58
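For reference, the codecvt_utf8_utf16 suggestion from the comments could look roughly like the sketch below. It assumes every wchar_t in the input holds one UTF-16 code unit (the usual case on Windows), and the function name wstring_utf16_to_utf8 is made up for illustration. Note that std::wstring_convert and <codecvt> are deprecated since C++17, so newer compilers may warn about them.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Sketch only: converts a std::wstring whose elements are UTF-16 code units
// into a UTF-8 encoded std::string. Invalid input makes to_bytes() throw
// std::range_error.
std::string wstring_utf16_to_utf8(const std::wstring& str) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> conv;
    return conv.to_bytes(str);
}
```

This is a drop-in alternative to the wstring_to_utf8 function in the question; only the codecvt facet differs.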

1 Answer

@NuSkooler (hi!) is spot on of course. In any event, try this for converting between UnicodeString and std::wstring iff std::wstring is actually UTF-16 (not tested):

std::wstring doSomething(const std::wstring &input) {

// sizeof cannot be evaluated in a preprocessor #if, so check at compile time instead
static_assert(sizeof(wchar_t) == sizeof(UChar),
              "no idea what (typically underspecified) wchar_t actually is.");

// source is a read-only alias to the input data
const UnicodeString source(FALSE, input.data(), input.length());

// DO SOMETHING with the data
UnicodeString target = SOME_ACTUAL_FUNCTION(source); // <<<< Put your actual code here

// construct an output wstring
std::wstring output(target.getBuffer(), target.length());

// return it
return output;
}
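If the code also has to build where wchar_t is 32 bits wide, one option is to convert to UTF-16 explicitly instead of aliasing the buffer. The following is a rough sketch using only the standard library; wideToUtf16 is a hypothetical name, and the function does not validate its input (lone surrogates and values above U+10FFFF are passed through or split without checks).

```cpp
#include <string>

// Hypothetical helper: returns the contents of `input` as UTF-16 code units,
// whether wchar_t is 16 bits (Windows) or 32 bits (most Unix-like systems).
std::u16string wideToUtf16(const std::wstring& input) {
    std::u16string out;
    out.reserve(input.size());
    for (wchar_t wc : input) {
        const unsigned long cp = static_cast<unsigned long>(wc);
        if (sizeof(wchar_t) == 2 || cp < 0x10000UL) {
            // Already a UTF-16 code unit, or a BMP code point: copy through.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary code point in a 32-bit wchar_t: emit a surrogate pair.
            const unsigned long v = cp - 0x10000UL;
            out.push_back(static_cast<char16_t>(0xD800UL + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00UL + (v & 0x3FFUL)));
        }
    }
    return out;
}
```

The trade-off is an extra copy of the string, in exchange for not depending on the width of wchar_t.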
Steven R. Loomis
  • Thank you very much! After some minor adjustments, this worked for me, with the added bonus of being in wstring! – Peter Jan 12 '16 at 23:44