Questions tagged [utf-8]

UTF-8 is a character encoding that describes each Unicode code point using a byte sequence of one to four bytes. It is backwards-compatible with ASCII while still supporting representation of all Unicode code points.

UTF-8 is the most widely used character encoding and is recommended for use on the Internet. It is the standard character encoding on Linux and other recent Unix-like operating systems.

The algorithm for encoding code points in UTF-8 is described in RFC 3629.
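
As a rough illustration of the bit layout that RFC 3629 describes, here is a minimal Python sketch that encodes a single code point by hand. The function name `encode_code_point` is made up for this example; in real code you would simply call `str.encode("utf-8")`, which the sketch is compared against below.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hypothetical helper, for illustration)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# Compare against Python's built-in encoder for 1-, 2-, 3- and 4-byte cases.
for ch in ("A", "é", "€", "😀"):
    assert encode_code_point(ord(ch)) == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encode_code_point(ord(ch)).hex(' ')}")
```

Code points below U+0080 come out as a single byte equal to their ASCII value, which is where the backwards compatibility comes from.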

Related tags

The character-encoding tag discusses the general concept of character-set encodings
The unicode character set can be represented in a variety of encodings, one of which is UTF-8
The ascii character set and encoding it generalizes
Other UTFs: utf-16 utf-32, rarely used: utf-7 utf-1 utf-18 utf-36 utf8mb4

22178 questions

5 answers

Arabic Character Encoding Issue: UTF-8 versus Windows-1256

Quick Background: I inherited a large SQL dump file containing a combination of English and Arabic text and (I think) it was originally exported using 'latin1'. I changed all occurrences of 'latin1' to 'utf8' prior to importing the file. The…

php database utf-8 character-encoding

asked Dec 29 '11 at 22:14

ThisLanham

1 answer

UTF-8 encoding a servlet form submission with Tomcat

I'm attempting to post a simple form that includes unicode characters to a servlet action. On Jetty, everything works without a snag. On a Tomcat server, utf-8 characters get mangled. The simplest case I've got: Form:

1 answer

How fix double encoding in PostgreSQL?

I have a table in PostgreSQL with words, but some words have invalid UTF-8 chars like 0xe7e36f and 0xefbfbd. How can I identify all the chars inside words that are invalid and replace them with some symbol like ?? EDIT: My database is in UTF-8, but I…

sql postgresql encoding utf-8

asked Nov 18 '11 at 16:51

Renato Dinhani

1 answer

HTML validation error: Non-space characters found before DOCTYPE

I have a blog (WordPress-based) and am trying to validate one of my pages with the W3C validator. The first error is: Line 1, Column 1: Non-space characters found without seeing a doctype first. Expected <!DOCTYPE html>. Also,…

html wordpress utf-8 w3c-validation byte-order-mark

asked Nov 08 '11 at 14:29

Smarty

1 answer

node.js and utf-8 in POST data

I am having problems decoding UTF-8 strings in POST data when using the Node.JS web server. See this complete testcase: require("http").createServer(function(request, response) { if (request.method != "POST") { response.writeHead(200,…

post node.js utf-8

asked Oct 18 '11 at 12:08

Udo G

2 answers

How to use UTF-8 in PDFKit in Rails?

I'm using PDFKit in my Rails app to generate PDFs. The problem is that some of my content contains non-ASCII characters. How do I force it to use UTF-8?

ruby-on-rails ruby-on-rails-3 utf-8 wkhtmltopdf pdfkit

asked Sep 09 '11 at 20:40

tybro0103

1 answer

Multer corrupts UTF8 filename when uploading files

What is the proper way to POST a file with a UTF-8 filename to Multer using the axios http client? Chrome seems to be sending a correctly encoded payload for the multipart/form-data body

utf-8 nestjs multer

asked Jul 08 '22 at 09:50

Marc Kornberger

1 answer

Getting SQLPlus to spool out Unicode characters, are being output as?

I am attempting to get Oracle sqlplus (10.2) to spool out Unicode data on a Linux machine. I have found several discussions of this issue, but no clear answers, other than to check locale settings and set NLS_LANG to AL32UTF8. All locale info is set…

oracle utf-8 oracle10g sqlplus

asked Aug 24 '11 at 16:46

Todd Allen

4 answers

c++, cout and UTF-8

Hopefully a simple question: cout seems to die when handling strings that end with a multibyte UTF-8 char, am I doing something wrong? This is with GCC (Mingw) on Win7 x64. **Edit Sorry if I wasn't clear enough, I'm not concerned about the missing…

c++ utf-8 cout

asked Aug 05 '11 at 09:03

user657267

1 answer

Running a rails migration overwrites my charset. Any ideas why?

I have everything set to utf8mb4 in my DB : mysql> show variables like "%character%";show variables like "%collation%"; +--------------------------+----------------------------+ | Variable_name | Value …

mysql ruby-on-rails utf-8 collation utf8mb4

asked Feb 26 '21 at 14:42

Trip

5 answers

Java UTF-8 differences

The JavaDoc says "The null byte '\u0000' is encoded in 2-byte format rather than 1-byte, so that the encoded strings never have embedded nulls." But what does this even mean? What's an embedded null in this context? I am trying to convert from a…

java utf-8

asked Jun 22 '11 at 12:24

Prof. Falken

8 answers

UTF-8 does not print characters to the console

I have the following code public class MainDefault { public static void main (String[] args) { System.out.println("²³"); System.out.println(Arrays.toString("²³".getBytes())); } } But can't seem to…

java encoding utf-8 compilation character-encoding

asked Sep 02 '20 at 19:05

Yassin Hajaj

5 answers

should I eliminate TCHAR from Windows code?

I am revising some very old (10 years) C code. The code compiles on Unix/Mac with GCC and cross-compiles for Windows with MinGW. Currently there are TCHAR strings throughout. I'd like to get rid of the TCHAR and use a C++ string instead. Is it still…

c winapi unicode utf-8 tchar

asked Jun 11 '11 at 11:16

vy32

2 answers

MongoDB SpiderMonkey doesn't understand UTF-8

If I add non-ASCII characters to a MongoDB database, then all db.find() calls fail with "non ascii character detected". It's a problem with SpiderMonkey; I have to rebuild it with UTF-8 support. I've tried to do it like…

mongodb utf-8 v8 spidermonkey

asked Jun 07 '11 at 16:00

luchaninov

2 answers

Does C++0x support std::wstring conversion to/from UTF-8 byte sequence?

I saw that C++0x will add support for UTF-8, UTF-16 and UTF-32 literals. But what about conversions between the three representations ? I plan to use std::wstring everywhere in my code. But I also need to manipulate UTF-8 encoded data when dealing…

c++ c++11 unicode utf-8 wstring

asked Mar 07 '09 at 10:25

chmike
