
I am using https://github.com/KxSystems/kdb/blob/master/c/c.js to connect an HTML5 WebSocket to a kdb+ backend. I am having an issue when sending Chinese characters from the backend: the HTML charset is set to UTF-8, but the page still displays garbled text (mojibake) rather than the correct characters. I get the same result when I print the text in the browser console. Does c.js support UTF-8? How can I correctly display Unicode characters sent by kdb+ in the browser?

Thomas Smyth - Treliant
Rongshu

1 Answer


As of 2016.03.18, c.js should support (de)serialization of UTF8. The version here has the functions to do so.
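To make "(de)serialization of UTF-8" concrete: serializing turns a JavaScript string into UTF-8 bytes, and deserializing turns those bytes back into a string. The sketch below uses the standard TextEncoder/TextDecoder APIs, not c.js's own internal routines (which are not reproduced here); the helper names toUtf8Bytes and fromUtf8Bytes are illustrative only.

```javascript
// Illustrative sketch of UTF-8 (de)serialization with standard APIs.
// This is NOT c.js's implementation; helper names are hypothetical.
const encoder = new TextEncoder();          // always encodes to UTF-8
const decoder = new TextDecoder("utf-8");

// Serialize: JS string (UTF-16 code units) -> UTF-8 bytes (Uint8Array).
function toUtf8Bytes(str) {
  return encoder.encode(str);
}

// Deserialize: UTF-8 bytes -> JS string.
function fromUtf8Bytes(buf) {
  return decoder.decode(buf);
}

const text = "你好";
const bytes = toUtf8Bytes(text);
console.log(text.length);                    // 2 UTF-16 code units
console.log(bytes.length);                   // 6 bytes
console.log(Array.from(bytes).join(","));    // 228,189,160,229,165,189
console.log(fromUtf8Bytes(bytes) === text);  // true
```

Note the length difference: "你好" is 2 characters in JavaScript but 6 bytes in UTF-8, so a kdb+ char vector holding it as UTF-8 would presumably have count 6.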

More information on Unicode charsets in kdb+ can be found here (http://code.kx.com/wiki/Cookbook/Unicode).

Thomas Smyth - Treliant
Paul Kerrigan
  • I am using the latest version of c.js (kx.com/q/c/c.js), but it looks like it does not (de)serialize UTF-8 correctly. For example, I tested sending Chinese characters from the web browser to the kdb+ backend as below. In the HTML JS: var query = {func:"test",arg1:"你好"}; ws.send(serialize(query)); In kdb+: q)test:{show x} shows "\344\275\240\345\245\275" for the received argument, whereas q)`char$"你好" returns "\304\343\272\303". As you can see, "\344\275\240\345\245\275" is different from "\304\343\272\303". It looks like c.js (de)serializes UTF-8 differently from kdb+. – Rongshu Apr 24 '17 at 02:15
  • Try printing (with -1@) the bytestream that you got from the browser - that should return the correct characters. There's more information here on how q treats unicode (http://code.kx.com/wiki/Cookbook/Unicode) which I'll edit into the original answer now, hope it helps. – Paul Kerrigan Apr 24 '17 at 08:15
  • Thanks for the link. q)test:{show -1 x;} q)test["\344\275\240\345\245\275"] prints 浣犲ソ, but 浣犲ソ is incorrect. The bytestream should be "\304\343\272\303", which corresponds to 你好. I don't know why "你好" sent from the browser is serialized to "\344\275\240\345\245\275" rather than "\304\343\272\303". @paul – Rongshu Apr 24 '17 at 10:12
  • That's very strange - what happens if you serialize some ASCII text using c.js on the same HTML page and send that to the backend? Does that also become gibberish? – Paul Kerrigan Apr 25 '17 at 12:44
  • I found the issue behind this. It was due to a charset mismatch between kdb+ and HTML: kdb+ uses GBK, while c.js only supports UTF-8 or UTF-16. Is there a way to convert text from GBK to UTF-8 in kdb+? – Rongshu Apr 26 '17 at 02:41
  • I'm not aware of a kdb tool for it - since you're already using js, this (http://stackoverflow.com/questions/17211780/how-do-i-convert-gbk-to-utf8-with-pure-javascript) might do the job for you. – Paul Kerrigan Apr 26 '17 at 08:49
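The byte sequences in this thread can be reproduced outside kdb+. The sketch below is an illustration, not code from c.js or the linked answer: it shows that the browser sent the correct UTF-8 bytes for "你好", that a console decoding those bytes as GBK prints 浣犲ソ, and that GBK bytes can be converted to UTF-8 with the standard TextDecoder. It assumes a runtime whose TextDecoder supports the "gbk" label (defined by the WHATWG Encoding Standard, so modern browsers do; Node.js needs a full-ICU build, which is the default). toQOctal is a hypothetical helper that mimics, in simplified form, how the q console prints non-ASCII chars.

```javascript
// Hypothetical helper: print bytes roughly the way the q console shows a
// char vector, i.e. non-ASCII bytes as \ooo octal escapes. Simplified: q
// also escapes quotes, backslashes and control chars, which this ignores.
function toQOctal(bytes) {
  return Array.from(bytes, (b) =>
    b < 0x80 ? String.fromCharCode(b) : "\\" + b.toString(8).padStart(3, "0")
  ).join("");
}

// Assumes TextDecoder supports the "gbk" label (WHATWG Encoding Standard;
// Node.js needs a full-ICU build, which is the default).
const gbk = new TextDecoder("gbk");

// 1. What the browser sent: the UTF-8 bytes of "你好".
const utf8Bytes = new TextEncoder().encode("你好");
console.log(toQOctal(utf8Bytes)); // \344\275\240\345\245\275, as received by kdb+

// 2. The same bytes read by a GBK console (e.g. after -1 writes them out).
console.log(gbk.decode(utf8Bytes)); // 浣犲ソ

// 3. GBK -> UTF-8: decode the GBK bytes of "你好" (what `char$"你好" gave in
//    a GBK session) into a JS string, then re-encode that string as UTF-8.
const gbkBytes = Uint8Array.of(0o304, 0o343, 0o272, 0o303);
const text = gbk.decode(gbkBytes);
console.log(text); // 你好
const reencoded = new TextEncoder().encode(text);
console.log(toQOctal(reencoded)); // \344\275\240\345\245\275
```

Step 3 is the standard-API counterpart of the pure-JavaScript converter linked above. It only helps if the raw GBK bytes reach JavaScript intact: if kdb+ sends them as a char vector that c.js decodes as UTF-8, they are presumably already mangled by that point, so they would need to arrive as raw bytes (how to arrange that with c.js is not covered here).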