
I ran into some trouble while creating a C extension for Ruby, and it got me thinking: how does Ruby (1.9.1) handle strings (and all the encoding stuff) internally?

If I have a string like "o" and pass it to a C function (as a VALUE), I can deal with it pretty easily using the RSTRING_PTR() and RSTRING_LEN() macros. However, if I make the string "ö" (a German umlaut character), RSTRING_LEN() gives me 2.

I'm a bit stumped by the contents of RSTRING_PTR() in that case: the two bytes are 0xA4 and 0xC3. What encoding is this? I tried calling "ö".force_encoding( ... ) with different encodings before passing the string to the C function, but that does not affect the contents of RSTRING_PTR() at all.

What I need is a way to get the string as a WCHAR* encoded in UTF-16 (for "ö", that would be 0x00F6) in my C function, but that's hard to do if you don't know what encoding you're coming from.

Thanks in advance for any help.

DeX3
  • `force_encoding` isn't supposed to change the contents of the string, it just changes how the string is read. – Cubic Sep 17 '12 at 12:38
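To illustrate the comment above, here is a minimal sketch (the variable names are just for illustration) showing that force_encoding only relabels a string while its bytes stay the same. It assumes the string starts out as UTF-8, which is why the literal is written with a \u escape.

```ruby
# encoding: utf-8

# "\u00F6" is "ö"; the \u escape makes the literal UTF-8 regardless of the source file.
s = "\u00F6"
original_bytes = s.bytes.to_a              # the bytes RSTRING_PTR() would point at

# force_encoding changes only the encoding label, not the underlying bytes.
relabeled = s.dup.force_encoding("ISO-8859-1")
relabeled_bytes = relabeled.bytes.to_a

puts original_bytes.map { |b| "0x%02X" % b }.join(" ")
puts relabeled_bytes.map { |b| "0x%02X" % b }.join(" ")
puts relabeled.encoding
```

Both byte dumps are identical, which matches the observation that RSTRING_PTR() did not change after force_encoding.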

1 Answer


String internals in Ruby 1.9 depend on the __ENCODING__ constant (the encoding of the source file) and the Encoding.default_internal setting.

In your case it looks like UTF-8 (default), but ö is actually 0xC3 0xB6 in UTF-8, whereas 0xC3 0xA4 is ä.
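A quick way to check this from Ruby, as a rough sketch: decode the two byte pairs as UTF-8, then transcode to UTF-16 with String#encode before handing the string to the C function. Choosing UTF-16LE here is an assumption (it is what WCHAR normally means on Windows), and the variable names are made up for the example.

```ruby
# encoding: utf-8

# Build strings from raw bytes and label them as UTF-8.
a_umlaut = [0xC3, 0xA4].pack("C*").force_encoding("UTF-8")   # the bytes from the question
o_umlaut = [0xC3, 0xB6].pack("C*").force_encoding("UTF-8")   # the bytes for "ö"

puts a_umlaut == "\u00E4"   # U+00E4 is "ä"
puts o_umlaut == "\u00F6"   # U+00F6 is "ö"

# encode (unlike force_encoding) really converts the bytes.
utf16le_bytes = o_umlaut.encode("UTF-16LE").bytes.to_a
puts utf16le_bytes.map { |b| "0x%02X" % b }.join(" ")
```

On a little-endian machine, the two UTF-16LE bytes read as one 16-bit unit give 0x00F6, the value the question is after. Note that the transcoded string is not guaranteed to end in a 16-bit null terminator, so the C side should rely on RSTRING_LEN() rather than scanning for one.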

zed_0xff
  • oh yeah, you're right I mixed up my testcases. Thx for the help, conversion works now =) – DeX3 Jun 27 '12 at 12:16