Questions tagged [utf-16]

UTF-16 is a variable-width character encoding that represents each Unicode code point as one or two 16-bit code units, that is, as a byte sequence of either two or four bytes.

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
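
As a rough illustration of that algorithm, here is a minimal Python sketch. The function name `encode_utf16_code_units` is hypothetical (it is not part of any library), and the sketch returns 16-bit code units as integers rather than bytes, so byte order is left out of the picture. Code points below U+10000 map to a single code unit; anything above is offset by 0x10000 and split into a high and a low surrogate.

```python
from typing import List


def encode_utf16_code_units(code_point: int) -> List[int]:
    """Return the UTF-16 code units (16-bit integers) for one code point.

    Hypothetical helper for illustration; follows the scheme in RFC 2781.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")

    # Code points in the Basic Multilingual Plane fit in one code unit.
    if code_point < 0x10000:
        return [code_point]

    # Otherwise subtract 0x10000, leaving a 20-bit value, and split it:
    # the top 10 bits go into the high surrogate, the low 10 bits into
    # the low surrogate.
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)
    low = 0xDC00 | (offset & 0x3FF)
    return [high, low]


# Usage: one BMP letter, one BMP symbol, one supplementary-plane emoji.
for ch in "a\u2660\U0001F600":
    units = encode_utf16_code_units(ord(ch))
    print(f"U+{ord(ch):04X} ->", " ".join(f"{u:04X}" for u in units))
```

The first two characters come out as a single code unit each (two bytes), while U+1F600 becomes the surrogate pair D83D DE00 (four bytes).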

There are three flavors of UTF-16: little-endian (UTF-16LE), big-endian (UTF-16BE), and with a BOM (byte order mark).
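
To make the difference between the flavors concrete, the following sketch uses Python's built-in codecs `utf-16-le`, `utf-16-be` and `utf-16`. The sample text `"Hi"` is just an assumption for illustration. Note that in Python the plain `utf-16` codec writes a BOM followed by the platform's native byte order, so the exact bytes of `with_bom` can differ between machines.

```python
import codecs

text = "Hi"

little = text.encode("utf-16-le")   # no BOM, least significant byte first
big = text.encode("utf-16-be")      # no BOM, most significant byte first
with_bom = text.encode("utf-16")    # BOM first, then native byte order

print(little)    # b'H\x00i\x00'
print(big)       # b'\x00H\x00i'
print(with_bom)  # starts with b'\xff\xfe' (LE) or b'\xfe\xff' (BE)

# When decoding with the generic "utf-16" codec, the BOM selects the
# byte order and is removed from the resulting string.
print(with_bom.decode("utf-16") == text)
print((codecs.BOM_UTF16_BE + big).decode("utf-16") == text)
```

Decoding with the explicit `utf-16-le` or `utf-16-be` codecs does not strip a BOM, so a leading U+FEFF stays in the decoded string in that case.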

1193 questions
0 votes, 0 answers

c++ Windows UTF16 L String constant

In C++ there is the following statement wstring tester = L"Работа Центра"; we set the console output int res = _setmode( _fileno(stdout), _O_U16TEXT); then we try wcout << "tester " << tester << endl << flush; and we get on the console... tester…
ort11
0 votes, 1 answer

PHP UTF-16 to ASCII conversion

Consider the following string. It's encoded in UTF-16-LE and saved into a PHP variable. I failed to get either mbstring or iconv to replace the ' with a single quote. What would be a good way to sanitize it? String: Carl Sagan's Cosmic Connection
gnosio
0 votes, 1 answer

_wfopen not yielding correct filenames with 16-bit wchar_t string

When I code exactly like this: setlocale(LC_ALL,""); wchar_t myString2[] = { 0x0061, 0x2660, 0x2663, 0x2665, 0x2666, 0x0000 }; fd = _wfopen(myString2, L"w"); or fd = _wfopen(myString2, L"w, ccs=UTF-16"); the result is not what I…
Rob
0 votes, 1 answer

Matching lines of text in a UTF16-LE file

I'm parsing a file that's written in UTF16-LE for fairly simple matches and none of them seem to trigger. For example, I have the following code. with open(filepath) as f: for line in f: if 'TEST_CASE' in line: print(line) Is…
angusiguess
0 votes, 1 answer

C++ Using std::string functions for MBCS and std::wstring functions for UTF-16

Has anyone dealt with using std::string functions for MBCS? For example in C I could do this: p = _mbsrchr(path, '\\'); but in C++ I'm doing this: found = path.find_last_of('\\'); If the trail byte is a slash then would find_last_of stop at the…
loop
0 votes, 2 answers

VB 6.0 -> Delphi XE2 Conversion

Public Function UTF8FromUTF16(ByRef abytUTF16() As Byte) As Byte() Dim lngByteNum As Long Dim abytUTF8() As Byte Dim lngCharCount As Long On Error GoTo ConversionErr lngCharCount = (UBound(abytUTF16) + 1) \ 2 …
user1390537
0 votes, 1 answer

Send UTF-16 encoded data with PHP curl

I'm building a PHP client for a web service that requires posted data to be encoded as UTF-16. How do I configure curl to encode my data in UTF-16 and also to decode the answer from UTF-16? Some sample code: $s =…
DukeOf1Cat
0 votes, 1 answer

unicode string in file contain different

My system is Fedora. For some reason, the last field of one record is a Unicode string (copied with memcpy from a guest machine in QEMU). The Unicode string is a Windows regedit key…
jiamo
-1 votes, 1 answer

mb_convert_encoding() with UTF-16 input in PHP > 8.1

I'm updating a PHP app which imports CSV files encoded in UTF-16 (from Google Keyword Planner) and converts the values to UTF-8. Until PHP 8 it worked as expected, but from PHP 8.1 a ? is added to the values after the conversion from UTF-16…
Daniel
-1 votes, 1 answer

Should I delete blank values in utf-16 encoding?

When I read all the bytes from a string using Encoding.Unicode, it gives me blank (0) values. When I run this code: byte[] value = Encoding.Unicode.GetBytes("Hi"); it gives me the output 72 0 105 0. I know this is because UTF-16 stores 2 bytes and…
-1 votes, 1 answer

Reading a Binary File with UTF16 format

In C, I am trying to read a binary file in UTF-16 format. I tried this: binaryFile = fopen("data.dat", "rb, ccs=UTF16LE"); and it did not work. I need to do this without using the UTF16 reading library specifically, and I can't think of…
bebibabi
-1 votes, 2 answers

codeUnits property vs utf8.encode function in Dart

I have this little code: void main(List args) { const data = 'amigo+/=:chesu'; var encoded = base64Encode(utf8.encode(data)); var encoded2 = base64Encode(data.codeUnits); var decoded = utf8.decode(base64Decode(encoded)); var…
-1 votes, 1 answer

Understanding binary String delimiters

I am confused about the difference between encodings that are represented with \x, like \x68\x65\x6c\x6c\x6f vs. ones using \u, such as \u0068\u0065\u006c\u006c\u006f. I've been playing around with…
Eric Grossman
-1 votes, 1 answer

How can I remove the BOM from a UTF-16 LE file in C++?

I have a UTF-16 LE file with a BOM at the beginning. How do I remove the BOM using C++? I've seen many Python examples. Ultimately I would like the file to be UTF-8.
JeffR
-1 votes, 1 answer

I want to generate the unicode UTF-16 for a text file in python

Like in the picture, I have normal text in a file that I want to write in Unicode like that. I have this code, but it doesn't do the job: it just writes the text as it is, while I really need the UTF-16 encoding to be displayed with…
Moun