Questions tagged [utf]

Unicode Transformation Format (8/16/32/...) used for encoding Unicode code points

Unicode defines abstract code points and their interactions. It also defines multiple encodings for storing and exchanging those code points. All of them can express every valid Unicode code point, but they differ in size, compatibility, efficiency, and ability to represent invalid data.

UTF-8 (people sometimes write just "UTF" for this encoding) is an ASCII superset. Its byte scheme can also mechanically carry values that are invalid in the other encodings, such as lone surrogates, although such sequences are ill-formed in standard UTF-8. If there is no compelling compatibility constraint, this encoding is preferred.
Punycode is used only for internationalized domain names (historical contenders were UTF-5 and UTF-6).
GB18030 is the official Chinese encoding.
UTF-EBCDIC was meant to fill the role of UTF-8 on EBCDIC systems but never caught on.
UTF-7 was designed for systems that are not 8-bit clean, such as old email, but never gained much popularity even there.
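
A minimal Python sketch of the points above, using only standard-library codecs. The sample strings (including the `bücher.example` domain) are illustrative assumptions, not part of the tag description, and Python's `idna` codec is used here as one convenient way to see Punycode applied to a domain name.

```python
# Illustrative sample values; none of them come from the tag description.
ascii_text = "plain ASCII"
domain = "bücher.example"

# UTF-8 is an ASCII superset: pure ASCII text encodes to identical bytes.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# A lone surrogate is not a valid code point, so strict UTF-8 rejects it.
try:
    "\ud800".encode("utf-8")
    strict_rejects_surrogate = False
except UnicodeEncodeError:
    strict_rejects_surrogate = True

# The UTF-8 byte scheme can still carry it when the check is relaxed.
lone_surrogate_bytes = "\ud800".encode("utf-8", errors="surrogatepass")

# Punycode is applied per label of an internationalized domain name.
idna_name = domain.encode("idna")

# UTF-7 keeps every byte below 0x80 for channels that are not 8-bit clean.
utf7_bytes = "é".encode("utf-7")

# GB18030 round-trips Unicode text like the UTF encodings do.
gb_bytes = "中".encode("gb18030")

print(same_bytes, strict_rejects_surrogate, lone_surrogate_bytes)
print(idna_name, utf7_bytes, gb_bytes)
```

The `surrogatepass` error handler is Python-specific; other languages expose relaxed UTF-8 handling differently, if at all.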

The following encodings each come in three variants: big-endian, little-endian, and either byte order signaled by a leading BOM.

UTF-16 (utf-16le): Early adopters who embraced UCS-2, back when 64K code points seemed enough, moved to this encoding. Apart from orphaned surrogates, it cannot represent invalid UTF-8 or UTF-32 sequences. It is also rarely more space-efficient than UTF-8, and it is not fixed-width (strictly speaking, not even UTF-32 is).
UTF-32 (identical to UCS-4, the modern UCS): This is the encoding with one code unit per code point. Because combining code points negate this already questionable benefit, and because of its huge storage demand, it is seldom used even for internal representation.
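
The size and width remarks above can be checked with a short Python sketch. The sample string is an arbitrary assumption chosen to mix a 1-byte, a 3-byte, and a 4-byte UTF-8 character; results for other text will differ.

```python
import codecs

# Illustrative sample: U+0061, U+20AC, U+1F600.
sample = "a€😀"

# Encoded size in bytes for the explicit-endianness variants (no BOM).
codec_names = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")
sizes = {name: len(sample.encode(name)) for name in codec_names}

# The plain "utf-16" codec picks the platform byte order and prepends a BOM.
with_bom = sample.encode("utf-16")
has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# Code points above U+FFFF need a surrogate pair, so UTF-16 is not fixed-width.
surrogate_pair_hex = "😀".encode("utf-16-be").hex()

# One visible character built from a base letter plus a combining accent
# still takes two UTF-32 code units.
combined = "e\u0301"
utf32_units = len(combined.encode("utf-32-le")) // 4

print(sizes)
print(has_bom, len(with_bom), surrogate_pair_hex, utf32_units)
```

For this particular sample, UTF-8 and UTF-16 happen to need the same number of bytes; text dominated by ASCII favors UTF-8, while text dominated by code points in the U+0800 to U+FFFF range favors UTF-16.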

Resources

Wikipedia on Unicode

857 questions

votes

1 answer

Char to UTF code in vbscript

I'd like to create a .properties file to be used in a Java program from a VBScript. I'm going to use some strings in languages that use characters outside the ASCII map. So, I need to replace these characters with their UTF codes. This would be \u0061…

vbscript utf

asked Feb 10 '10 at 23:28

Carlos Blanco

8,592
17
71
101

votes

1 answer

Git cant diff or merge .cs file in utf-16 encoding

A friend and I were working on the same .cs file at the same time, and when there's a merge conflict, git points out that there's a conflict but the file isn't loaded with the usual "HEAD" ">>>" stuff because the .cs files were binary files. So we added…

c# git utf

asked Aug 07 '13 at 19:24

user1879789

votes

4 answers

Spanish characters in Android Studio

I've got a problem with Android Studio. I'm trying to develop an application, but characters like "¿" or "ñ" and "á,é,ó,í,ú" don't appear correctly when I run the application. I've tried to solve the problem by changing the encoding to UTF-8 but…

encoding android-studio iso utf

asked Jul 05 '13 at 13:35

Dv Apps

votes

2 answers

Why is sys.getdefaultencoding() different from sys.stdout.encoding and how does this break Unicode strings?

I spent a few angry hours looking for the problem with Unicode strings that was broken down to something that Python (2.7) hides from me and I still don't understand. First, I tried to use u".." strings consistently in my code, but that resulted in…

python stdout utf sys

asked Mar 20 '13 at 17:29

Aleksandar Savkov

2,894
3
24
30

votes

1 answer

MSBuild.exe output encoding

I use MSBuild.exe to build a solution on a machine set to the Russian language. But in the TeamCity build log, all Russian characters are in the wrong encoding. How do I set up MSBuild.exe to produce proper output (UTF-8, for example)?

c# msbuild utf

asked Feb 10 '12 at 06:30

Dmitriy Kudinov

1,051
5
23
31

votes

2 answers

Reading UTF-8 with BOM in ruby 2.5.0

Is there a way to read files encoded in UTF-8 with a BOM (byte order mark) on Ruby v2.5.0? On Ruby 2.3.1 this used to work: csv = CSV.open(file_path, encoding: 'bom|utf-8') However, on 2.5.0 the following error occurs: ArgumentError: unknown…

ruby csv encoding utf-8 utf

asked Feb 19 '18 at 20:34

romeu.hcf

votes

5 answers

UTF usage in C++ code

What is the difference between UTF and UCS? What are the best ways to represent non-European character sets (using UTF) in C++ strings? I would like to know your recommendations for: Internal representation inside the code For string manipulation…

c++ unicode locale utf ucs

asked Oct 14 '08 at 05:36

Martin York

257,169
86
333
562

votes

4 answers

Do I need supplementary plane?

I think the question is pretty simple, do I need all the rest of the stuff in Unicode after the basic plane? What kind of stuff is included and is that really needed? (and for what purposes?) Thanks.

unicode utf astral-plane supplementary

asked Jun 21 '09 at 11:06

Tower

98,741
129
357
507

votes

2 answers

How to convert circled numbers to numbers ? (① to 1)

I would like to convert numbers from a string I receive after an OCR recognition over Japanese text. For example, when I extract a date: ③① 年 ⑫ 月 ①③ 日 I would like to convert it to: 31 年 12 月 13 日 What would be the best way to achieve it ?

text encoding ocr utf cjk

asked Feb 21 '19 at 03:17

Jonathan Muller

7,348
2
23
31

votes

3 answers

Persist UTF-8 as Default Encoding

I tried to persist UTF-8 as the default encoding in Python. I tried: >>> import sys >>> sys.getdefaultencoding() 'ascii' And I also tried: >>> import sys >>> reload(sys) >>> sys.setdefaultencoding('UTF8') >>>…

python utf-8 utf

asked Apr 14 '16 at 12:42

DenCowboy

13,884
38
114
210

votes

4 answers

PHP MySQL database strange characters

I'm trying to output product information stored in a MySQL database, but it's writing out some strange characters, like a diamond with a question mark inside of it. I think it may be an encoding/UTF8 issue, but I've specified the encoding I…

php mysql utf

asked Dec 15 '09 at 21:31

user231733

votes

1 answer

how strings are stored by python in computers?

I believe most of you who are familiar with Python have read Dive Into Python 3. In chapter 4.3, it says this: In Python 3, all strings are sequences of Unicode characters. There is no such thing as a Python string encoded in UTF-8, or a Python…

python string encoding utf

asked Mar 15 '12 at 08:03

endless

votes

1 answer

Response.WriteFile() Strange characters issue

Hello in my aspx page using MVC 3, I have the following code: <%Response.WriteFile("/Content/Bing.htm"); %> Which is an include file that contains BING search box code. At the top of the containing DIV, a strange character is appearing: ï»¿ I…

c# asp.net-mvc-3 encoding utf

asked Apr 01 '11 at 14:28

Cyberdrew

1,832
1
19
39

votes

3 answers

idn_to_ascii() in 5.2.17

There's a very handy function idn_to_ascii() in PHP 5.3, but I'm running 5.2.17 and I can't change that. How do I encode Unicode domain names to ascii then?

php dns utf idn

asked Mar 23 '11 at 12:52

donk

1,540
4
23
46

votes

2 answers

Syllabification of Devanagari

I am trying to syllabify devanagari words धर्मक्षेत्रे -> धर् मक् षेत् रे dharmakeshetre -> dhar mak shet re wd.split('्') I get the result as : ['धर', 'मक', 'षेत', 'रे'] Which is partially correct I try another word कुरुक्षेत्र -> कु रुक् षेत्…

python string python-3.x utf devanagari

asked Oct 29 '17 at 13:21

Echchama Nayak
