
I am currently working on an application in C++, that ties into Lua, that ties into Flash (in that order). My goal at the moment is getting wchar_ts from C++ into Flash, via Lua. I would love any insights as to how I can accomplish this!

If any other information is required, please ask and I'll do my best to provide it


What I have tried

It's my understanding that Lua is not a fan of Unicode, but it should still be able to receive the string of bytes from my C++ application. I imagine there must be a way to then pass those bytes over to Flash to then render out my intended Unicode. So what I've done so far:

C++:

//an example wchar_t*
const wchar_t *text = L"Test!";

//this function pushes a char* to my Lua code
lua.PushString((char*)text); //directly casting text to a char*... D:

Lua:

theString = FunctionThatGetsWCharFromCpp();
flash.ShowString(theString);

Flash:

function ShowString(theString:String)
{
    myTextField.text = theString;
}

Now the outcome here is that myTextField only shows "T". This made sense to me. The cast from wchar_t to char would end up padding out the chars with some zeros, especially since "T" doesn't really utilize both bytes of a wchar_t. A quick look at the documentation yields:

lua_pushstring

The string cannot contain embedded zeros; it is assumed to end at the first zero.

So I ran a little test:

C++:

//prefixing with a Japanese character 
//which will use both bytes of the wchar_t
const wchar_t *text = L"たTest!";

The Flash textbox now reads: "_0T", 3 characters. Makes total sense, the 2 bytes of the Japanese character + T, then termination.
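The truncation described above can be reproduced without Lua or Flash. The following is a minimal sketch, assuming a little-endian machine and using `char16_t` to stand in for the 2-byte `wchar_t` found on Windows (on Linux `wchar_t` is usually 4 bytes, so the result would differ). `bytesUntilFirstZero` is a hypothetical helper name, not part of any library.

```cpp
#include <string>

// Hypothetical helper: views UTF-16 text as raw bytes and copies them up to
// the first zero byte, which is what a NUL-terminated string API would see.
// Assumes a little-endian machine (low byte of each 16-bit unit comes first).
std::string bytesUntilFirstZero(const char16_t* text) {
    return std::string(reinterpret_cast<const char*>(text));
}
```

On such a machine, `bytesUntilFirstZero(u"Test!")` yields `"T"` because 'T' is stored as the bytes 54 00, and `bytesUntilFirstZero(u"\u305FTest!")` yields `"_0T"` because U+305F is stored as 5F 30, which are the ASCII codes for '_' and '0'.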

I understand what is going on, but I am still unsure how to tackle this problem, or even what to search for. Is there a specific Lua function I can use to pass a wad of bytes from C++ to Lua? (I've read that lua_pushlstring is often used for this, but in my tests it also terminated at the first zero.) Is there a Flash datatype that will accept these bytes, which I would then convert into a readable, multibyte string? Or is this just not possible?

Note:
I'm not too familiar with Unicode, code pages and the like, so I'm not sure whether I'll also need to specify the correct encoding in Flash to get the correct output. I'm happy to cross that bridge when I get there, but if anyone has any insight here too, that would be great!

Jace
  • What Lua are you using? alchemy? – Steven Apr 04 '13 at 02:36
  • Do try lua_pushlstring. – lhf Apr 04 '13 at 02:37
  • And have you tried using an FFI? Or LuaJIT+FFI. They will provide a much easier interface than regular Lua stuff. – Steven Apr 04 '13 at 02:38
  • @Steven No I'm not using Alchemy and I haven't tried FFI/LuaJit. This project is already very VERY far down the line, are these things that I could still potentially incorporate without doing a huge rewrite? (I'm still fairly new to Lua, and this project for that matter) – Jace Apr 04 '13 at 02:50
  • @lhf I have tried lua_pushlstring, but it also terminates at the first embedded zero. – Jace Apr 04 '13 at 02:51
  • lua_pushlstring should work. In your test string, there are 12 bytes; Make sure that's what you're passing to lua_pushlstring. (If you also need to pass the NUL char to Flash, there are 14 bytes.) – Tom Blodget Apr 04 '13 at 03:34
  • "Lua is not a fan of Unicode" -- You have to keep in mind the definition of the Lua string data type: It's a counted sequence of bytes. What the few Lua functions that deal with characters expect for character set and encoding depends on how the Lua interpreter was compiled. – Tom Blodget Apr 04 '13 at 03:41
  • @TomBlodget `lua_pushlstring` didn't work. It returned just the first 3 characters like `lua_pushstring` did... However, Adam's answer seems to be the way to go. But thank you all for your time! – Jace Apr 04 '13 at 04:27
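The byte counts mentioned in the comments (12 bytes, or 14 with the terminator, for a 2-byte `wchar_t`) come from multiplying the number of wide characters by `sizeof(wchar_t)`. Here is a minimal sketch of that calculation; `ByteSpan` and `wideBytes` are hypothetical names introduced for illustration, and whether the raw bytes survive the hop from Lua into Flash depends on the bridge between them.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical pair of pointer and explicit byte count.
struct ByteSpan {
    const char* data;
    std::size_t size;
};

// Views a wide string as raw bytes with an explicit length, so that embedded
// zero bytes are counted rather than treated as the end of the string.
ByteSpan wideBytes(const wchar_t* text, bool includeTerminator = false) {
    std::size_t units = std::wcslen(text) + (includeTerminator ? 1 : 0);
    return { reinterpret_cast<const char*>(text), units * sizeof(wchar_t) };
}

// With the Lua C API, the span would be handed over with its length, e.g.:
//   lua_pushlstring(L, span.data, span.size);
```

The key point is that the length must be a byte count computed from the wide string, not the result of `strlen` on the cast pointer, which stops at the first zero byte.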

1 Answer


I don't know if this will work, but I'd recommend trying to use UTF-8. A string encoded in UTF-8 doesn't have any embedded zeros in it, so Lua should be able to handle it, and Flash ought to be able to handle it too, depending on how exactly the languages interface.

Here's one way to convert a wide-character string to UTF-8 using setlocale(3) and wcstombs(3):

// Error checking omitted for expository purposes

// Call this once at program startup.  If you'd rather not change the locale,
// you can instead write your own conversion routine (but beware of UTF-16
// surrogate pairs if you do)
setlocale(LC_ALL, "en_US.UTF-8");

// Do this for each string you want to convert
const wchar_t *wideString = L"たTest!";
size_t len = wcslen(wideString);
size_t maxUtf8len = 4 * len + 1;  // Each wchar_t encodes to a max of 4 bytes
char *utf8String = new char[maxUtf8len];
wcstombs(utf8String, wideString, maxUtf8len);
...
// Do stuff with utf8string
...
delete [] utf8String;
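If changing the locale is not an option, the hand-written conversion routine mentioned in the comments above might look roughly like the following. This is only a sketch: `appendUtf8` and `wideToUtf8` are hypothetical names, and replacing unpaired surrogates with U+FFFD is an assumption about the desired error handling. It is written to cope with both 16-bit `wchar_t` (UTF-16, with surrogate pairs) and 32-bit `wchar_t`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends one Unicode code point to `out` as 1 to 4 UTF-8 bytes.
static void appendUtf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts a wide string to UTF-8 without touching the
// locale. Unpaired surrogates and out-of-range values become U+FFFD.
std::string wideToUtf8(const std::wstring& wide) {
    std::string out;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(wide[i]);
        // A high surrogate followed by a low surrogate encodes one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < wide.size()) {
            std::uint32_t low = static_cast<std::uint32_t>(wide[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;  // replacement character
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

Because every byte of a multi-byte UTF-8 sequence has its high bit set, the result contains no zero bytes unless the input itself contains U+0000, so it can be passed through NUL-terminated string APIs.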

If you're on Windows, you can instead use the WideCharToMultiByte function with the CP_UTF8 code page to do the conversion, since I don't believe that the Visual Studio C runtime supports UTF-8 locales:

// Error checking omitted for expository purposes
const wchar_t *wideString = L"たTest!";
size_t len = wcslen(wideString);
size_t maxUtf8len = 4 * len + 1;  // Each wchar_t encodes to a max of 4 bytes
char *utf8String = new char[maxUtf8len];
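// The length parameters are ints; len + 1 includes the terminating NUL
WideCharToMultiByte(CP_UTF8, 0, wideString, static_cast<int>(len + 1), utf8String, static_cast<int>(maxUtf8len), NULL, NULL);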
...
// Do stuff with utf8string
...
delete [] utf8String;
Adam Rosenfield
  • Hi! Thanks for the answer. It seems as though this passed garbage data to Flash. Debugging in VS shows that `utf8String` is remaining unchanged by the `wcstombs` function. `wcstombs` also is returning an (unsigned) -1 - meaning: "a wide character was encountered which could not be converted". I've copy and pasted your code exactly. Any other ideas? – Jace Apr 04 '13 at 03:00
  • @Jace: Is the `setlocale` call succeeding? I suspect that the Visual Studio C runtime might not support UTF-8 locales. Given that you're on Windows, you can instead use the [`WideCharToMultiByte`](http://msdn.microsoft.com/en-us/library/dd374130.aspx) function with the `CP_UTF8` code page. – Adam Rosenfield Apr 04 '13 at 03:46
  • Spot on! `setlocale` was failing, so I did as you suggested and converted my string using `WideCharToMultiByte` with `CP_UTF8`, and the rest just took care of itself. I'll need to run a few test cases to ensure the encoding will be fine, but this was already all I was looking to achieve with my question. Thanks! – Jace Apr 04 '13 at 04:23
  • Oh, also, could you please update your answer with some of the details you mentioned regarding `setlocale` and `WideCharToMultiByte` for completeness/future readers? – Jace Apr 04 '13 at 04:24
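The failure diagnosed in these comments (`setlocale` failing, then `wcstombs` returning -1) is easier to spot when both return values are checked. A minimal sketch follows, assuming the `en_US.UTF-8` locale name from the answer; `wideToUtf8ViaLocale` is a hypothetical helper name, and the call changes the process-wide locale.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: returns true and fills `out` on success. Returns false
// if the UTF-8 locale is unavailable or a character cannot be converted.
bool wideToUtf8ViaLocale(const wchar_t* wide, std::string& out) {
    // setlocale returns a null pointer when the requested locale is rejected.
    if (std::setlocale(LC_ALL, "en_US.UTF-8") == nullptr) {
        return false;
    }
    // MB_CUR_MAX is the longest multibyte character in the current locale.
    std::vector<char> buffer(std::wcslen(wide) * MB_CUR_MAX + 1);
    std::size_t written = std::wcstombs(buffer.data(), wide, buffer.size());
    // wcstombs reports an unconvertible character as (size_t)-1.
    if (written == static_cast<std::size_t>(-1)) {
        return false;
    }
    out.assign(buffer.data(), written);
    return true;
}
```

A caller that gets `false` back can then fall back to a platform API such as `WideCharToMultiByte`, rather than passing an unmodified buffer on to Lua.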