Questions tagged [utf]

Unicode Transformation Format (8/16/32/...) used for encoding Unicode code points

Unicode defines abstract CodePoints and their interactions. It also defines multiple encodings for storage and exchange of those CodePoints. All of them can express every valid Unicode CodePoint, though they differ in size, compatibility, efficiency, and how much invalid data they can represent.

  • UTF-8 (people sometimes write just "UTF" for this encoding) is an ASCII superset and can encode everything the other encodings can, including their invalid sequences (such as orphaned surrogates) when the strict validity rules are relaxed. If there is no compelling compatibility constraint, this encoding is preferred.
  • Punycode is used only for internationalized domain names (historical contenders were UTF-5 and UTF-6).
  • GB18030 is the official Chinese encoding.
  • UTF-EBCDIC was meant to fill the role of UTF-8 on EBCDIC systems but never caught on.
  • UTF-7 was designed for systems that are not 8-bit clean, such as old email, but never gained much popularity even there.
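As a rough illustration of the list above, here is a minimal Python sketch. Python and its standard codec names (`utf-8`, `utf-7`, `gb18030`, `punycode`, `idna`) are an assumption of this example, not something the tag wiki prescribes, and the sample strings are made up.

```python
# Sample data (hypothetical, chosen only for illustration).
text = "Grüße, 世界"
ascii_text = "plain ASCII"

# UTF-8 is an ASCII superset: pure ASCII text encodes to identical bytes.
same_as_ascii = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Each of these encodings can round-trip any valid Unicode text.
encodings = ["utf-8", "utf-7", "gb18030"]
round_trips = {name: text.encode(name).decode(name) for name in encodings}

# UTF-7 output stays within 7-bit bytes, which is why it targeted old email.
utf7_is_7bit = all(b < 0x80 for b in text.encode("utf-7"))

# Punycode is the encoding behind internationalized domain names;
# the "idna" codec adds the "xn--" prefix used in DNS labels.
label = "bücher"
puny = label.encode("punycode")
idna_label = label.encode("idna")

for name in encodings:
    print(name, len(text.encode(name)), "bytes")
print(puny, idna_label)
```

The byte counts printed at the end show the size differences the wiki mentions; which encoding is smallest depends on the text.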

The following encodings each come in three variants: big-endian, little-endian, and either byte order announced by a BOM.

  • UTF-16 (UCS-2): Early adopters who embraced UCS-2 when people thought 64k CodePoints would be enough moved to this encoding. Apart from orphaned surrogates, bad UTF-8 or UTF-32 sequences cannot be encoded as UTF-16. It is also rarely more space-efficient than UTF-8, and it is not fixed-width (not even UTF-32 really is).
  • UTF-32 (identical to UCS-4): This is the encoding with one CodeUnit per CodePoint. Because combining CodePoints negate this only, and questionable, benefit, and because of its huge storage demand, it is seldom used even for internal representation.
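The width and byte-order points above can be checked with a short Python sketch. The helper `code_units` and the sample characters are hypothetical names and data introduced for this example; the codec names are those of Python's standard library.

```python
import codecs

def code_units(text, encoding, unit_size):
    """Count code units by dividing the encoded length by the unit size in bytes."""
    return len(text.encode(encoding)) // unit_size

# (UTF-8, UTF-16, UTF-32) code units per sample character.
samples = ["A", "é", "€", "😀"]
table = {
    ch: (
        code_units(ch, "utf-8", 1),
        code_units(ch, "utf-16-le", 2),
        code_units(ch, "utf-32-le", 4),
    )
    for ch in samples
}

# The three variants: explicit big-endian, explicit little-endian,
# and the plain name, which writes a BOM in front of the data.
big = "A".encode("utf-16-be")
little = "A".encode("utf-16-le")
with_bom = "A".encode("utf-16")

# One user-perceived character can still be two CodePoints, even in UTF-32.
combining = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT
combining_units = code_units(combining, "utf-32-le", 4)

# An orphaned surrogate is rejected by default but can be forced through.
try:
    "\ud800".encode("utf-16-le")
    strict_rejects = False
except UnicodeEncodeError:
    strict_rejects = True
forced = "\ud800".encode("utf-16-le", "surrogatepass")

for ch, counts in table.items():
    print(ch, counts)
```

Only the last sample needs two UTF-16 code units (a surrogate pair), while UTF-8 ranges from one to four bytes across the four samples.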

857 questions
-1
votes
1 answer

words and sentences disorganization in last version of Google Chrome

I have menus in my HTML code. I had no problems with Chrome or other web browsers until the last Chrome update to 38.0.2125, which causes disorganization in my menus and other parts (UTF-8 encoding). This is an example of this problem…
-1
votes
1 answer

How to fix jquery ajax response with � (quotation mark block)

When I try to get HTML back from an ajax request I get multiple �. Why is this and how can I correct it? $.ajax ({ type: 'POST', url: Generic.ajaxSluice, dataType: "html", data: { param:…
-1
votes
1 answer

Handling UTF filenames in Windows

Given the following files: E:/Media/Foo/info.nfo E:/Media/Bar/FXGâ¢.nfo I can "find" them with the following: BASE = r'E:/Media/' for dirpath, _, files in os.walk(BASE): for f in fnmatch.filter(files, '*.nfo'): nfopath =…
jedwards
  • 29,432
  • 3
  • 65
  • 92
-1
votes
2 answers

character encoding php - javascript

I have a PHP file that manages an entity in my database. What I want to do is retrieve a set of strings from the database and return it via json_encode to a JavaScript function. The problem is that when the PHP script retrieves the values the…
Albert Prats
  • 786
  • 4
  • 15
  • 32
-1
votes
1 answer

java.io.utfdataformatexception: String is too long

I am getting the exception in the title while sending an image to a Java server. Here's the code: ByteArrayOutputStream stream = new ByteArrayOutputStream(); img.compress(Bitmap.CompressFormat.PNG, 100, stream); byte[]…
Saaram
  • 337
  • 3
  • 7
  • 29
-2
votes
2 answers

How to write "Keycap Digit One"=1️⃣ from a utf code on console?

How do I represent “Keycap Digit One”=1️⃣ in a string? How can I output 1️⃣ to [9] on the console using escape codes, the same way I can output 🔟 on the console by using console.log('\u{1F51F}');? I would also like to be able to output 1️⃣ to [9] in…
-2
votes
1 answer

16-bit encoding that has all bits mapped to some value

UTF-32 has its last bits zeroed. As I understand it UTF-16 doesn't use all its bits either. Is there a 16-bit encoding that has all bit combinations mapped to some value, preferably a subset of UTF, like ASCII for 7-bit?
J Alan
  • 77
  • 1
  • 11
-2
votes
3 answers

Working with UTF-8 strings and characters in C++

I'm working on a project which processes UTF-8 strings character by character; however, I was unable to find a way to work on UTF-8 strings in that manner in C++. What I need is: The strings need to be UTF-8, since the strings won't be limited to…
bayindirh
  • 425
  • 6
  • 21
-2
votes
1 answer

Gaps in cmd to c++

How can I get the path from CMD containing gaps " "? Here is the code I tried without success: if (argv[3] == NULL) { cout << "" << endl; } else if (strcmp(argv[3], "/d") == 0) { const size_t cSize = strlen(argv[4]) + 1; wchar_t* wc =…
Mike Litoris
  • 27
  • 1
  • 6
-2
votes
2 answers

Trouble on Unicode encoded data in Python

Hello StackOverflow community. I am a fairly new user of Python, so sorry in advance for the silliness of this question! I have tried to fix it for hours but still have not figured it out. I am trying to import a large dataset of text to…
Nahid O.
  • 171
  • 1
  • 3
  • 14
-2
votes
1 answer

UTF-8 without signature vs UTF-8 with signature

Using Visual Studio 2005, text files are generated as UTF-8 with signature. I need them without signature.
-2
votes
1 answer

.net string type - is it utf16 by default?

I coded up this little test case to try and understand base64 encodings, but I ran into this problem. See below: why are "stringUtf16" and the "stringDefault" from Encoding.Default not equal? One has a length of 4, the other a length of 3... but…
Raymond
  • 3,382
  • 5
  • 43
  • 67
-3
votes
1 answer

How would I convert this .NET string to UTF-8

var stringu = @"\u003cbr /\u003e\u003cbr /\u003eHello world"; Background here - I'm using HttpClient to request data, and am getting back a JSON string in UTF-8 (Content-Type: application/json; charset=utf-8 is the header on the response). To…
user466512
  • 127
  • 2
  • 9
-4
votes
1 answer

Problem with decoding utf8 characters - šđžčć

I have a word which contains some of these characters - šđžčć. When I take the first letter out of that word, I get a byte, and when I convert that byte into a string I get an incorrectly decoded string. Can someone help me figure out how to decode…
Alen
  • 1,750
  • 7
  • 31
  • 62
-4
votes
1 answer

Strange text in files

I have some dump file which consist of string like UserComment SeqOne ABCDE I am not able to understand what , , , and mean in this string. Is it in UTF or some other…
aga
  • 359
  • 2
  • 4
  • 15