Questions tagged [utf]

Unicode Transformation Format (8/16/32/...) used for encoding Unicode code points

unicode defines abstract CodePoints and their interactions. It also defines multiple encodings for storage and exchange of those CodePoints. All of them can express all valid Unicode CodePoints, though they have different size, compatibility, expressiveness for invalid data and efficiency characteristics.

utf-8 (people sometimes only write UTF for this encoding), can encode all valid and invalid sequences in the other encodings, as well as being an ascii superset. If there is no compelling compatibility constraint, this encoding is preferred.
punycode Used only for international domain names. (historical contenders were utf-5 and utf-6)
GB18030 is the official chinese encoding.
UTF-EBCDIC should fill the role of utf-8 for Ebcdic system but never caught on.
utf-7 This encoding was designed for systems which are not 8bit-clear like old email, but never gained much popularity even there.

The following encodings have 3 variants: big-endian, little-endian and any-endian with BOM.

utf-16 (utf-16le) Early adopters who embraced ucs2 when people thought 64k are enough moved to this encoding. Beside orphaned surrogates, one cannot encode bad utf-8 or utf-32 sequences as utf-16. Also, it is rarely more space-efficient than utf-8, nor is it fixed width (not even utf-32 really is).
utf-32 (identical to ucs4 aka modern ucs) This is the 1 CodeUnit per CodePoint encoding. Due to combining CodePoints negating this only questionable benefit, and huge storage demand, it is seldom used even for internal representation.

Resources

Wikipedia on Unicode

857 questions

votes

2 answers

How to combine two code points to get one?

I know that unicode code point for Á is U+00C1. I read on internet and many forums and articles that I can also make an Á by combining characters ´ (unicode: U+00B4) and A (unicode: U+0041). My question is simple. How to do it? I tried something…

python go unicode utf-8 utf

asked Nov 20 '22 at 03:20

Filip

votes

3 answers

Why does my html email appear differently when sent by AWS SES?

I've been working on some custom html email templates. I'm having some trouble with my emails appearing differently when they are sent by different email services. I'm using AWS SES to send these emails to clients. I've been using Postdrop to send…

html amazon-web-services email encoding utf

asked Nov 09 '22 at 00:33

Sam Sabin

votes

0 answers

What type should I be using to store and print these unicode characters? (C++)

I'm writing a CLI version of the game Snake in C++, and I want to make the snake look cool using different unicode characters to represent different states of the snake. I am using the following characters : ╔, ╗, ╚, ╝, ═, ║, ⩓, ⩔, ⪡, ⪢. I…

c++ windows unicode command-line-interface utf

asked Jun 24 '21 at 01:14

Harry

votes

1 answer

UTF8 to UTF16 conversion using std::filesystem::path

Starting from C++11 one can convert UTF8 to UTF16 wchar_t (at least on Windows, where wchar_t is 16 bit wide) using std::codecvt_utf8_utf16: std::wstring utf8ToWide( const char* utf8 ) { std::wstring_convert>…

c++ c++17 utf std-filesystem

asked Jun 03 '21 at 16:58

Fedor

17,146
13
40
131

votes

2 answers

How to represent this utf-8 encoded string in Rust?

On this RFC: https://www.rfc-editor.org/rfc/rfc7616#page-19 at page 19, there's this example of a text encoded in UTF-8: J U+00E4 s U+00F8 n D o e 4A C3A4 73 C3B8 6E 20 44 6F 65 How do I represent it in a Rust String? I tried…

rust utf-8 utf

asked Apr 18 '21 at 03:17

Gatonito

1,662
5
26
55

votes

2 answers

Why is '\u{1D11E}'.charAt(0) not equal to '\u{1D11E}'?

When I'm trying to evaluate this expression in console I have false as result, why? console.log('\u{1D11E}'.charAt(0) === '\u{1D11E}')

javascript unicode utf

asked Jul 22 '20 at 07:04

Json Prime

votes

2 answers

TypeError: Unicode-objects must be encoded before hashing in Hashlib Function

I have checked out all of the other solutions to the same problem on stackoverflow and also tried them, but nothing seemed to work. I am simply posting links here instead of the code as the code is huge and it would be less interactive. Link to the…

python python-3.x utf hashlib audio-fingerprinting

asked Apr 18 '20 at 21:59

ATUL MALAKAR

votes

1 answer

PHP DOMDocument Japanese Character encoding issue

I have a file called: ニューヨーク・ヤンキース-チケット-200x225.jpg I am able to successfully do this with my PHP code: if (file_exists(ABSPATH . 'ニューヨーク・ヤンキース-チケット-200x225.jpg')) { echo 'yes'; } However, when I parse my content using DOMDocument, that…

php encoding utf-8 utf

asked Aug 30 '19 at 15:00

James Cartwright

votes

1 answer

Xcode keeps guessing and interpreting with wrong encoding

I am using Xcode 4.0.2, the latest release of Xcode. All my projects or standelone source codes are in UTF-8 encoding. But when I open some source file (C/C++/Objective C), all text is interpreted in Mac OS Roman encoding and I don't know why. I've…

xcode encoding utf-8 utf mac-roman

asked Apr 23 '11 at 19:29

maskov1

votes

1 answer

Python: bz2 and lzma in mode 'wt' don't write the BOM (while gzip does). Why?

The following code writes a compressed text file using gzip, bz2 and lzma, then reads and prints its binary content. import bz2 import gzip import lzma import os def test(encoding): print(encoding) for module in [gzip, bz2, lzma]: …

python compression utf byte-order-mark

asked Mar 14 '19 at 20:27

janluke

1,567
1
15
19

votes

1 answer

How to create UTF characters dynamically with JavaScript

I'm trying to use a variable and \u to create an UTF character with Node.js. var code = '0045'; console.log('\u0045', '\u' + code); But the output becomes E u0045 How do I make it E E How do I create the character and store it in a variable?

javascript utf-8 node.js character utf

asked Mar 30 '11 at 20:53

tirithen

3,219
11
41
65

votes

1 answer

What is the list of python settings that affect encoding, decoding, and printing?

When I run into unicode printing problems, I want to know what I should check. In my particular case, I'm using an installed module that is printing unicode encoded characters using the wrong codec. There are several disparate places that affect…

python unicode localization python-unicode utf

asked Feb 11 '19 at 06:44

JamesThomasMoon

6,169
7
37
63

votes

1 answer

Unable to display a degree char (°) in a SVG

I'm trying to display a degree character (°) in a SVG manipulated by D3.js. So far I've tried different charsets but I'm always getting a � instead and the following character codes simply display themselves as regular text: ° or °. I'm…

html d3.js svg utf

asked Aug 30 '18 at 10:00

VincentDM

votes

1 answer

How to know number of bytes of a Binary File?

How to count number of bytes of this binary file (t.dat) without running this code (as a theoretical question) ? Assuming that you run the following program on Windows using the default ASCII encoding. public class Bin { public static void…

java binaryfiles utf dataoutputstream

asked May 05 '18 at 00:43

dave

votes

5 answers

Invalid URI with Chinese characters (Java)

Having trouble setting up a URL connection with Chinese characters in the URL. It works with Latin characters: String xstr = "维也纳恩斯特哈佩尔球场" ; URI uri = new URI("http","ajax.googleapis.com","/ajax/services/language/detect","v=1.0&q="+xstr,null); …

java url uri multibyte utf

asked Jan 28 '11 at 17:37

Joe Knapp

Prev 1 2 3

…

57 58 Next