I have a Ruby program running on Windows that uses Open3 to call a shell command which is known to output UTF-16:
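require 'open3'
require 'json'

attrs = {}
attrs[:stdout], attrs[:stderr], status = Open3.capture3(command)
unless attrs[:stderr].nil?
  begin
    # Relabel the raw bytes as UTF-16LE, then transcode them to UTF-8.
    # encode! raises an EncodingError subclass if the bytes are not valid UTF-16LE.
    attrs[:stderr].force_encoding(Encoding::UTF_16LE).encode!(Encoding::UTF_8)
  rescue EncodingError
    # Fall back to storing the raw bytes as a JSON array of integers.
    attrs[:stderr] = attrs[:stderr].bytes.to_json.encode!(Encoding::UTF_8)
  end
end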
If the conversion from UTF-16LE fails and raises an exception (strictly speaking, force_encoding only relabels the bytes; it is encode! that raises), I save the raw bytes as a JSON array string encoded as UTF-8.
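It may help to see why the rescue fired. force_encoding never raises; it only changes the label on the bytes. encode is where the bytes are actually validated, and a UTF-16 string must consist of whole 2-byte units. The byte array below has an odd length, which is enough to make the conversion fail. An even-length ASCII message would not have raised at all; it would have been silently turned into unrelated characters. A minimal sketch, using a made-up sample string ("Oops\n") in place of the real stderr output:

```ruby
# Hypothetical 5-byte ASCII sample standing in for the real stderr bytes.
raw = "Oops\n".b

# force_encoding only relabels the bytes; it never raises.
relabelled = raw.dup.force_encoding(Encoding::UTF_16LE)
puts relabelled.valid_encoding? # false: 5 bytes is not a whole number of 2-byte units

# encode is where the bytes are actually checked, so this is the call that raises.
raised_class = nil
begin
  relabelled.encode(Encoding::UTF_8)
rescue Encoding::InvalidByteSequenceError => e
  raised_class = e.class
  puts "encode raised #{e.class}"
end

# With an even byte count there is no error: each pair of ASCII bytes is
# silently read as one unrelated code point.
mojibake = "Oops".b.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
p mojibake.codepoints.map { |c| c.to_s(16) } # ["6f4f", "7370"]
```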
Well, the exception was raised, and the rescue clause captured the output as an array of bytes. It looks like this:
[10,84,104,105,115,32,97,112,112,108,105,99,97,116,105,111,110,32,104,97,115,32,114,101,113,117,101,115,116,101,100,32,116,104,101,32,82,117,110,116,105,109,101,32,116,111,32,116,101,114,109,105,110,97,116,101,32,105,116,32,105,110,32,97,110,32,117,110,117,115,117,97,108,32,119,97,121,46,10,80,108,101,97,115,101,32,99,111,110,116,97,99,116,32,116,104,101,32,97,112,112,108,105,99,97,116,105,111,110,39,115,32,115,117,112,112,111,114,116,32,116,101,97,109,32,102,111,114,32,109,111,114,101,32,105,110,102,111,114,109,97,116,105,111,110,46,10]
How can I convert it back to text in some format? For example, if I do:
irb> "dog".bytes
=> [100, 111, 103]
irb> "कुत्रा".bytes
=> [224, 164, 149, 224, 165, 129, 224, 164, 164, 224, 165, 141, 224, 164, 176, 224, 164, 190]
Is there a way to programmatically convert [100, 111, 103] back to "dog", or [224, 164, 149, 224, 165, 129, 224, 164, 164, 224, 165, 141, 224, 164, 176, 224, 164, 190] back to "कुत्रा"? And is there a way to figure out what my output array of bytes means?
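One possible approach, sketched under the assumption that the bytes really are UTF-8: Array#pack with the "C*" directive turns each integer (0 to 255) into one byte of a binary string, and force_encoding then labels those bytes as UTF-8. The helper name bytes_to_utf8 is hypothetical.

```ruby
# Hypothetical helper: pack each integer as one unsigned byte ("C*"), which
# yields a binary (ASCII-8BIT) string, then relabel those bytes as UTF-8.
def bytes_to_utf8(bytes)
  bytes.pack("C*").force_encoding(Encoding::UTF_8)
end

dog = bytes_to_utf8([100, 111, 103])
kutra = bytes_to_utf8([224, 164, 149, 224, 165, 129, 224, 164, 164,
                       224, 165, 141, 224, 164, 176, 224, 164, 190])

puts dog                   # dog
puts kutra                 # कुत्रा
puts kutra.valid_encoding? # true
```

Note that pack("U*") is not a substitute here: it treats each integer as a Unicode code point rather than a byte, so [224].pack("U*") gives "à" instead of the raw byte 0xE0.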
------------------------- UPDATE ---------------------------
I dug around for a while, because Ruby strings have no "decode" method. However, I did the following with the array, which I held in the variable message:
irb> message.map{|c| c.chr}.join("")
=> "\nThis application has requested the Runtime to terminate it in an unusual way.\nPlease contact the application's support team for more information.\n"
So my immediate problem is solved: the error message was never UTF-16LE, just plain ASCII text.
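As for figuring out what an unknown byte array means in general: an encoding cannot be reliably detected from bytes alone, but valid_encoding? can at least rule candidates out. A rough sketch; the candidate list and the helper name candidate_encodings are assumptions chosen for this example:

```ruby
# Encodings to try; this list is an assumption, extend it as needed.
CANDIDATES = [Encoding::US_ASCII, Encoding::UTF_8, Encoding::UTF_16LE].freeze

# Returns the candidate encodings in which the given bytes form a valid string.
# Validity only rules encodings out; it cannot prove which one was intended.
def candidate_encodings(bytes)
  raw = bytes.pack("C*")
  CANDIDATES.select { |enc| raw.dup.force_encoding(enc).valid_encoding? }
end

p candidate_encodings("\nRuntime error\n".bytes)
p candidate_encodings("कुत्रा".bytes)
```

The 15-byte ASCII sample passes US-ASCII and UTF-8 but fails UTF-16LE because of its odd length. The Devanagari bytes pass UTF-8 and, as it happens, UTF-16LE as well, which is why a conversion that succeeds is not proof that the encoding guess was right.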
However, when I applied the same map/chr/join to "कुत्रा", I got this:
irb> "कुत्रा".bytes.map{|c| c.chr}.join("")
=> "\xE0\xA4\x95\xE0\xA5\x81\xE0\xA4\xA4\xE0\xA5\x8D\xE0\xA4\xB0\xE0\xA4\xBE"
How do I convert this strange-looking string or byte sequence into the more meaningful "कुत्रा"?
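That string already holds exactly the UTF-8 bytes of the word. Integer#chr with no argument returns a one-byte binary (ASCII-8BIT) string for values 128 to 255, so the joined result is labelled binary and irb prints it with \x escapes. Relabelling it as UTF-8 should be enough; a minimal sketch:

```ruby
bytes = "कुत्रा".bytes

# Integer#chr with no argument gives a one-byte ASCII-8BIT (binary) string
# for 128..255, so the joined string is binary and irb shows \x escapes.
binary = bytes.map { |c| c.chr }.join("")
puts binary.encoding

# The bytes are already correct UTF-8; only the encoding label is wrong.
# dup keeps the original binary string unchanged for comparison.
fixed = binary.dup.force_encoding(Encoding::UTF_8)
puts fixed                 # कुत्रा
puts fixed.valid_encoding? # true
```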