Questions tagged [utf]

Unicode Transformation Format (8/16/32/...) used for encoding Unicode code points

Unicode defines abstract code points and their interactions. It also defines multiple encodings for storing and exchanging those code points. All of them can express every valid Unicode code point, but they differ in size, compatibility, efficiency, and how much invalid data they can represent.

utf-8 (people sometimes write just UTF for this encoding) can encode all valid and invalid sequences of the other encodings when its rules are relaxed (strict UTF-8 rejects lone surrogates and values above U+10FFFF), and it is an ascii superset. Unless there is a compelling compatibility constraint, this encoding is preferred.
punycode is used only for internationalized domain names (historical contenders were utf-5 and utf-6).
GB18030 is the official Chinese encoding.
UTF-EBCDIC was meant to fill the role of utf-8 on EBCDIC systems but never caught on.
utf-7 was designed for systems that are not 8-bit clean, such as old email, but never gained much popularity even there.
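
As a rough illustration of the encodings listed above, the following Python sketch encodes one string with the standard-library codecs `utf-8`, `punycode`, `idna`, `utf-7` and `gb18030`. The helper name `encode_all` and the sample word are illustrative choices, not part of the tag description. UTF-EBCDIC is left out because Python's standard library ships no codec for it.

```python
def encode_all(text):
    """Encode text with several Unicode-capable codecs from the standard library."""
    codecs_to_try = ("utf-8", "punycode", "idna", "utf-7", "gb18030")
    return {name: text.encode(name) for name in codecs_to_try}


sample = "münchen"  # hypothetical sample with one non-ASCII letter
encoded = encode_all(sample)

# Show how many bytes each encoding needs and what they look like.
for name, data in encoded.items():
    print(f"{name:9} {len(data):2} bytes  {data!r}")

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
print("plain".encode("ascii") == "plain".encode("utf-8"))
```

Every codec here round-trips the sample, so decoding each result with the same codec name gives back the original string.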

The following encodings have 3 variants: big-endian, little-endian and any-endian with BOM.

utf-16 (utf-16le): Early adopters who embraced ucs2, back when 65,536 code points were thought to be enough, moved to this encoding. Apart from orphaned surrogates, it cannot represent invalid utf-8 or utf-32 sequences. It is also rarely more space-efficient than utf-8, and it is not fixed-width (not even utf-32 really is).
utf-32 (identical to ucs4, aka modern UCS): This is the encoding with one code unit per code point. Because combining code points negate this already questionable benefit, and because of its huge storage demand, it is seldom used even for internal representation.
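
To make the size and byte-order remarks above concrete, here is a small Python sketch using only the standard library. The helper `encoded_sizes` and the three-character sample are made up for illustration.

```python
import codecs


def encoded_sizes(text):
    """Return the encoded length in bytes for each fixed-endian encoding."""
    return {name: len(text.encode(name))
            for name in ("utf-8", "utf-16-le", "utf-32-le")}


sample = "a€😀"  # U+0061, U+20AC, U+1F600
sizes = encoded_sizes(sample)
print(sizes)

# U+1F600 lies outside the BMP, so UTF-16 needs a surrogate pair
# (two 16-bit code units) for this single code point.
print(len("😀".encode("utf-16-le")) // 2)

# The explicit -le / -be codecs write no BOM; the plain "utf-16" codec
# prepends one in the platform's native byte order.
with_bom = "a".encode("utf-16")
print(with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

# When decoding with the plain codec, the BOM selects the byte order.
big_endian = codecs.BOM_UTF16_BE + "a".encode("utf-16-be")
print(big_endian.decode("utf-16"))
```

For this sample, UTF-8 and UTF-16 both take 8 bytes, while UTF-32 takes 12.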

Resources

Wikipedia on Unicode

857 questions

votes

1 answer

UTF-8 string to ordinal value: Java equivalent for Python output

I have the feeling this is most likely a duplicate, but I'm unable to find it. NOTE: My Python knowledge is very limited, so I'm not 100% sure how strings, bytes, and encodings are done in Python. My knowledge about encodings in general is also not…

java python utf-8 byte utf

asked Feb 04 '19 at 15:18

Kevin Cruijssen

votes

1 answer

Extra character appearing in email subject Â£ in front of pound symbol

I am using a class system php file to send an HTML email from mysql database, an extra character appears in front of the £pound symbol in the subject title, but the main content of the email is fine. I have tried using a UTF charset for the…

php utf

asked Jan 13 '19 at 18:56

Daniel

votes

1 answer

List of BOM characters

Is there a list of possible BOM characters that are used? So far I have encountered: \x00\x00\xfe\xff UTF-32, big-endian \xff\xfe\x00\x00 UTF-32, little-endian \xfe\xff UTF-16, big-endian \xff\xfe UTF-16,…

csv hex utf byte-order-mark

asked Dec 19 '18 at 19:20

user10332687

votes

0 answers

How to truly count UTF-8 characters, and emoji's and special characters with different character lengths?

I just want to ask a really confusing question and get a really basic answer to how it all works, basically my problem is when I count character lengths in JavaScript and PHP for symbols and emoji's like ‍❤️‍‍ it comes up 11 characters instead of…

javascript php unicode utf

asked Nov 12 '18 at 21:43

Lol Boi

votes

0 answers

Using UTF-8 in SQL Server 2016

We are installing a new application, the pre-requisite of which says that your database must be configured to use the UTF-8 character set. We are currently using SQL Server 2016, enterprise edition. Our database team mentioned to us that SQL Server…

sql-server sql-server-2016 utf

asked Oct 05 '18 at 11:40

Newbie

votes

1 answer

AWS RDS Oracle Standard edition seems to ignore NLS_LENGTH_SEMANTICS

Given the following table: SQL> DESC MM02.MMRZET01; Name Null? Type ----------------------------------------- -------- ---------------------------- LPT_ID NUMBER(19) COU_ISO_ID …

oracle amazon-web-services amazon-rds utf

asked Jul 19 '18 at 16:02

favoretti

votes

1 answer

Scanner.nextInt() NoSuchElementException

I got this code (sorry for the German inside): public void eingabewerte(){ int steuerInt; steuerInt=-1; Scanner myScanner = new Scanner(System.in); System.out.println("Bitte geben Sie die maximal Augenzahl des Wuerfels an…

java unicode java.util.scanner utf nosuchelementexception

asked Apr 19 '18 at 15:52

Pa.Don

votes

2 answers

boost locale incomplete type boundary_indexing

I am first converting an utf-8 string to utf-32 and then I want unique words to be mapped with their positions. I started with boost locale. #include #include #include #include #include…

c++ boost locale utf utf-32

asked Feb 14 '18 at 14:47

Neel Basu

votes

1 answer

Saving Keras Model: UTF - 8 Error

I've built a convolutional neural network in keras that looks like this: model = Sequential() model.add(Convolution2D(nb_filters, nb_conv, nb_conv, border_mode='valid', …

utf-8 neural-network hdf5 convolution utf

asked Jan 13 '18 at 04:55

Palash Shah

votes

0 answers

mbstring functions - php 7.0 - conversion to utf-8

I am using php 7.0. To make the site fully utf-8 compatible, there are many steps we have to take as explained here. I have doubt about mbstring encoding. The following is the ideal mbstring settings, as I understand, to be placed at beginning of…

php utf

asked Jan 02 '18 at 13:29

Kiran

votes

1 answer

Difference between combining acute accent and combining acute tone mark and how to normalize

So I have one application (let's call it the client) which uses strings with Diacritic/Accents. This application needs to make a request to another application (let's call it web service) using these strings with a diacritic. This other application…

unicode encoding diacritics non-ascii-characters utf

asked Sep 24 '17 at 17:46

dade

votes

0 answers

mysql data corrupted after changing encoding

I accidentally changed the encoding of a field from UTF-8 to macroman, after switching back to UTF-8, all Chinese characters were scrambled up, is there any chance I can reverse the process? or the change is permanent ?

mysql encoding utf

asked Sep 15 '17 at 10:18

user3500286

votes

1 answer

Printing Chinese Characters in C++

I've been trying to print Chinese Characters in C++. I've already searched around in the Internet, some said that you have to use wcout, others have suggested other methods. I've also stumbled on this post, where someone uses a piece of…

c++ utf

asked Sep 12 '17 at 18:38

El3ctroGh0st

votes

1 answer

String returns only numbers after separatedBy

I'm trying to separate a string like the following: let path = "/Users/user/Downloads/history.csv" do { let contents = try NSString(contentsOfFile: path, encoding: String.Encoding.utf8.rawValue ) let rows =…

ios swift string encoding utf

asked Aug 13 '17 at 18:31

Josch Hazard

votes

1 answer

Encoding issue: how to let console print "ć" instead of "c"?

I am working with data from all possible European languages. R does not recognize special characters correctly, e.g. "ć" instead of "c". > "ć" [1] "c" I have come across this various times and found workarounds (read.csv, and other functions have…

r encoding character-encoding character utf

asked Jul 31 '17 at 12:49

Doctor G
