Questions tagged [utf]

Unicode Transformation Format (8/16/32/...) used for encoding Unicode code points

Unicode defines abstract code points and their interactions. It also defines multiple encodings for storing and exchanging those code points. All of them can express every valid Unicode code point, though they differ in size, compatibility, efficiency, and in how much invalid data they can represent.

utf-8 (sometimes written simply as UTF) is an ascii superset and can encode every valid code point. Its bit pattern can also represent the invalid sequences of the other encodings (such as orphaned surrogates), although strict decoders reject them. Absent a compelling compatibility constraint, this encoding is preferred.
punycode is used only for internationalized domain names (historical contenders were utf-5 and utf-6).
GB18030 is the official Chinese encoding.
UTF-EBCDIC was meant to fill the role of utf-8 on EBCDIC systems but never caught on.
utf-7 was designed for systems that are not 8-bit clean, such as old email, but never gained much popularity even there.
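
As a rough illustration of the size and compatibility differences described above, here is a minimal Python sketch. The sample string and the helper names `encoded_sizes` and `round_trips` are made up for this example; the codec names are those shipped with CPython's standard library.

```python
# Compare how several of the encodings listed above store the same text.
SAMPLE = "naïve 中文"  # hypothetical sample: Latin letters, one accent, two CJK characters

def encoded_sizes(text, codec_names=("utf-8", "utf-7", "gb18030", "punycode")):
    """Return {codec_name: number_of_bytes} for `text` under each codec."""
    return {name: len(text.encode(name)) for name in codec_names}

def round_trips(text, codec_name):
    """True if encoding and then decoding gives back the original text."""
    return text.encode(codec_name).decode(codec_name) == text

sizes = encoded_sizes(SAMPLE)
for name, size in sizes.items():
    print(f"{name:10} {size} bytes")

# ASCII superset: pure ASCII text is byte-for-byte identical in utf-8.
print("plain text".encode("utf-8") == "plain text".encode("ascii"))
```

The exact byte counts depend on the text chosen, so treat the printed numbers as one data point rather than a general ranking.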

The following encodings come in three variants: big-endian, little-endian, and either byte order announced by a BOM.

utf-16 (utf-16le): Early adopters who embraced ucs2, back when 64k code points were thought to be enough, moved to this encoding. Apart from orphaned surrogates, it cannot represent bad utf-8 or utf-32 sequences. It is also rarely more space-efficient than utf-8, and it is not fixed-width (not even utf-32 really is).
utf-32 (identical to ucs4, aka modern ucs): This is the encoding with one code unit per code point. Because combining code points negate this only, and questionable, benefit, and because of its huge storage demand, it is seldom used even for internal representation.
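
The points about surrogate pairs, combining code points, and the BOM variants can be checked with a short Python sketch. The helper names `code_units` and `has_utf16_bom` and the two sample strings are assumptions made for this illustration only.

```python
import codecs

# Bytes per code unit for the BOM-less codecs compared here.
UNIT_BYTES = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}

def code_units(text, codec_name):
    """Number of code units `text` occupies in the given BOM-less codec."""
    return len(text.encode(codec_name)) // UNIT_BYTES[codec_name]

G_CLEF = "\U0001D11E"           # one code point outside the 64k range
DECOMPOSED_E_ACUTE = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

for name in UNIT_BYTES:
    print(name, code_units(G_CLEF, name), code_units(DECOMPOSED_E_ACUTE, name))

def has_utf16_bom(data):
    """True if `data` starts with either utf-16 byte order mark."""
    return data[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

print(has_utf16_bom("A".encode("utf-16")))     # generic name: BOM is prepended
print(has_utf16_bom("A".encode("utf-16-le")))  # explicit byte order: no BOM
```

The clef needs a surrogate pair (two code units) in utf-16, and the decomposed "é" needs two code units even in utf-32, which is why neither encoding gives one unit per visible character.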

Resources

Wikipedia on Unicode

857 questions

-1

votes

1 answer

words and sentences disorganization in last version of Google Chrome

I have menus in my HTML code. I had no problems with Chrome or other web browsers until the latest Google Chrome update to 38.0.2125, which causes disorganization in my menus and other parts (utf-8 encoding). This is an example of this problem…

google-chrome utf

asked Nov 16 '14 at 14:40

Hamid Tourani

-1

votes

1 answer

How to fix jquery ajax response with � (quotation mark block)

When I try to get HTML back from an ajax request I get multiple �. Why is this, and how can I correct it? $.ajax ({ type: 'POST', url: Generic.ajaxSluice, dataType: "html", data: { param:…

javascript ajax encoding utf-8 utf

asked Jul 19 '14 at 14:17

Kristoffer Frisell Jarnevid

-1

votes

1 answer

Handling UTF filenames in Windows

Given the following files: E:/Media/Foo/info.nfo E:/Media/Bar/FXG™.nfo I can "find" them with the following: BASE = r'E:/Media/' for dirpath, _, files in os.walk(BASE): for f in fnmatch.filter(files, '*.nfo'): nfopath =…

python python-2.7 filenames windows-7-x64 utf

asked Jul 15 '14 at 19:32

jedwards

29,432
3
65
92

-1

votes

2 answers

character encoding php - javascript

I have a PHP file that manages an entity in my database. What I want to do is retrieve a set of strings from the database and return it via json_encode to a JavaScript function. The problem is that when the PHP script retrieves the values the…

javascript php json utf

asked Jun 04 '14 at 09:33

Albert Prats

-1

votes

1 answer

java.io.UTFDataFormatException: String is too long

I am getting the exception in the title while sending an image to a Java server. Here's the code: ByteArrayOutputStream stream = new ByteArrayOutputStream(); img.compress(Bitmap.CompressFormat.PNG, 100, stream); byte[]…

java client-server utf

asked Dec 09 '12 at 11:04

Saaram

-2

votes

2 answers

How to write "Keycap Digit One"=1️⃣ from a utf code on console?

How do I represent “Keycap Digit One”=1️⃣ in a string? How can I output 1️⃣ to 9️⃣ on the console using escape codes, the same way I can output 🔟 on the console by using console.log('\u{1F51F}');? I would also like to be able to output 1️⃣ to 9️⃣ in…

javascript unicode utf

asked May 19 '20 at 03:10

reteid2222

-2

votes

1 answer

16-bit encoding that has all bits mapped to some value

UTF-32 has its last bits zeroed. As I understand it UTF-16 doesn't use all its bits either. Is there a 16-bit encoding that has all bit combinations mapped to some value, preferably a subset of UTF, like ASCII for 7-bit?

unicode encoding utf-16 utf 16-bit

asked Nov 06 '18 at 16:08

J Alan

-2

votes

3 answers

Working with UTF-8 strings and characters in C++

I'm working on a project which works on utf-8 strings character by character, however I was unable to find a way to work on UTF-8 strings on that manner in C++. What I need is: The strings need to be UTF-8, since the strings won't be limited to…

c++ text utf

asked Oct 21 '18 at 18:58

bayindirh

-2

votes

1 answer

Gaps in cmd to c++

How can I get a path containing spaces " " from CMD? Here is the code I tried without success: if (argv[3] == NULL) { cout << "" << endl; } else if (strcmp(argv[3], "/d") == 0) { const size_t cSize = strlen(argv[4]) + 1; wchar_t* wc =…

c++ utf

asked Aug 07 '17 at 09:36

Mike Litoris

-2

votes

2 answers

Trouble on Unicode encoded data in Python

Hello StackOverflow community. I am a fairly new user of Python, so sorry in advance for the silliness of this question! I have tried to fix it for hours but still have not figured it out. I am trying to import a large dataset of text to…

python csv unicode encoding utf

asked Mar 21 '16 at 12:03

Nahid O.

-2

votes

1 answer

UTF-8 without signature vs UTF-8 with signature

Visual Studio 2005 is generating my text file as UTF-8 with signature. I need it without signature.

c# utf-8 utf

asked Nov 18 '14 at 09:58

soni s raj

-2

votes

1 answer

.net string type - is it utf16 by default?

I coded up this little test case to try and understand base64 encodings, but I ran into this problem. see below, why are "stringUtf16" and the "stringDefault" from Encoding.Default not equal? one has a length of 4, the other a length of 3... but…

.net string encoding base64 utf

asked Oct 03 '14 at 19:48

Raymond

3,382
5
43
67

-3

votes

1 answer

How would I convert this .NET string to UTF-8

var stringu = @"\u003cbr /\u003e\u003cbr /\u003eHello world"; Background here - I'm using HttpClient to request data, and am getting back a JSON string in UTF-8 (Content-Type: application/json; charset=utf-8 is the header on the response). To…

c# .net utf

asked Jul 23 '13 at 05:58

user466512

-4

votes

1 answer

Problem with decoding utf8 characters - šđžčć

I have a word which contains some of these characters - šđžčć. When I take the first letter out of that word, I get a byte, and when I convert that byte into a string I get an incorrectly decoded string. Can someone help me figure out how to decode…

go utf

asked Jun 25 '18 at 16:43

Alen

1,750
7
31
62

-4

votes

1 answer

Strange text in files

I have some dump file which consist of string like UserComment SeqOne ABCDE I am not able to understand what , , , and mean in this string. Is it in UTF or some other…

encoding utf

asked May 31 '16 at 10:33

aga
