Questions tagged [utf-8]

UTF-8 is a character encoding that describes each Unicode code point using a byte sequence of one to four bytes. It is backwards-compatible with ASCII while still supporting representation of all Unicode code points.

UTF-8 is a character encoding that can describe the set of Unicode code points in byte sequences of one to four bytes.

UTF-8 is the most widely used character encoding, and is recommended for use on the Internet. It is the standard character encoding on Linux and other recent Unix-like operating systems. It was designed to be backwards-compatible with ASCII while still supporting representation of all Unicode code points.

The algorithm for encoding code points in UTF-8 is described in RFC 3629.
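
As a rough illustration of the one-to-four-byte layout, here is a minimal Python sketch that encodes a single code point by hand and compares the result with Python's built-in encoder. The function name `utf8_encode` is made up for this example, and the bit patterns follow the layout given in RFC 3629; treat it as a sketch rather than a reference implementation.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hypothetical helper)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# One sample character for each encoded length (1, 2, 3 and 4 bytes).
for ch in "A\u00e9\u20ac\U0001F600":
    encoded = utf8_encode(ord(ch))
    # Cross-check against the built-in encoder.
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

The first branch is what makes UTF-8 backwards-compatible with ASCII: code points below 0x80 are stored as the same single byte. In real code, `str.encode("utf-8")` and `bytes.decode("utf-8")` are the calls to use.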

22178 questions
8 votes, 5 answers

Arabic Character Encoding Issue: UTF-8 versus Windows-1256

Quick Background: I inherited a large SQL dump file containing a combination of English and Arabic text and (I think) it was originally exported using 'latin1'. I changed all occurrences of 'latin1' to 'utf8' prior to importing the file. The the…
ThisLanham
8 votes, 1 answer

UTF-8 encoding a servlet form submission with Tomcat

I'm attempting to post a simple form that includes unicode characters to a servlet action. On Jetty, everything works without a snag. On a Tomcat server, utf-8 characters get mangled. The simplest case I've got: Form:
Parker
8 votes, 1 answer

How to fix double encoding in PostgreSQL?

I have a table in PostgreSQL with words, but some words have invalid UTF-8 chars like 0xe7e36f and 0xefbfbd. How can I identify all the invalid chars inside words and replace them with some symbol like ?? EDIT: My database is in UTF-8, but I…
Renato Dinhani
8 votes, 1 answer

HTML validation error: Non-space characters found before DOCTYPE

I have a WordPress-based blog and am trying to validate one of its pages with the W3C validator. The first error is: Line 1, Column 1: Non-space characters found without seeing a doctype first. Expected <!DOCTYPE html>. Also,…
Smarty
8 votes, 1 answer

node.js and utf-8 in POST data

I am having problems decoding UTF-8 strings in POST data when using the Node.JS web server. See this complete testcase: require("http").createServer(function(request, response) { if (request.method != "POST") { response.writeHead(200,…
Udo G
8 votes, 2 answers

How to use UTF-8 in PDFKit in Rails?

I'm using PDFKit in my Rails app to generate PDFs. The problem is that some of my content contains non-ASCII characters. How do I force it to use UTF-8?
tybro0103
8 votes, 1 answer

Multer corrupts UTF8 filename when uploading files

What is the proper way to POST a file with a UTF-8 filename to Multer using the axios http client? Chrome seems to be sending a correctly encoded payload for the multipart/form-data body
Marc Kornberger
8 votes, 1 answer

Getting SQLPlus to spool out Unicode characters, are being output as?

I am attempting to get Oracle sqlplus (10.2) to spool out Unicode data on a Linux machine. I have found several discussions of this issue, but no clear answers, other than to check locale settings and set NLS_LANG to AL32UTF8. All locale info is set…
Todd Allen
8 votes, 4 answers

c++, cout and UTF-8

Hopefully a simple question: cout seems to die when handling strings that end with a multibyte UTF-8 char. Am I doing something wrong? This is with GCC (MinGW) on Win7 x64. Edit: Sorry if I wasn't clear enough, I'm not concerned about the missing…
user657267
8 votes, 1 answer

Running a rails migration overwrites my charset. Any ideas why?

I have everything set to utf8mb4 in my DB : mysql> show variables like "%character%";show variables like "%collation%"; +--------------------------+----------------------------+ | Variable_name | Value …
Trip
8 votes, 5 answers

Java UTF-8 differences

The JavaDoc says "The null byte '\u0000' is encoded in 2-byte format rather than 1-byte, so that the encoded strings never have embedded nulls." But what does this even mean? What's an embedded null in this context? I am trying to convert from a…
Prof. Falken
8 votes, 8 answers

UTF-8 does not print characters to the console

I have the following code public class MainDefault { public static void main (String[] args) { System.out.println("²³"); System.out.println(Arrays.toString("²³".getBytes())); } } But can't seem to…
Yassin Hajaj
8 votes, 5 answers

should I eliminate TCHAR from Windows code?

I am revising some very old (10 years) C code. The code compiles on Unix/Mac with GCC and cross-compiles for Windows with MinGW. Currently there are TCHAR strings throughout. I'd like to get rid of the TCHAR and use a C++ string instead. Is it still…
vy32
8 votes, 2 answers

MongoDB SpiderMonkey doesn't understand UTF-8

If I add non-ASCII characters to a MongoDB database, then every db.find() call fails with "non ascii character detected". It's a SpiderMonkey problem; I have to rebuild it with UTF-8 support. I've tried to do it like…
luchaninov
8 votes, 2 answers

Does C++0x support std::wstring conversion to/from UTF-8 byte sequence?

I saw that C++0x will add support for UTF-8, UTF-16 and UTF-32 literals. But what about conversions between the three representations? I plan to use std::wstring everywhere in my code. But I also need to manipulate UTF-8 encoded data when dealing…
chmike