Questions tagged [utf-8]

UTF-8 is a character encoding that describes each Unicode code point using a byte sequence of one to four bytes. It is backwards-compatible with ASCII while still supporting representation of all Unicode code points.

UTF-8 is a character encoding that can describe the set of Unicode code points in byte sequences of one to four bytes.

UTF-8 is the most widely used character encoding, and is recommended for use on the Internet. It is the standard character encoding on Linux and other recent Unix-like operating systems. It was designed to be backwards-compatible with ASCII while still supporting representation of all Unicode code points.

The algorithm for encoding code points in UTF-8 is described in RFC 3629.
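As a rough illustration of the one-to-four-byte layout, here is a minimal Python sketch of encoding a single code point. The function name `encode_code_point` is hypothetical and exists only for this example; in practice you would call the built-in `str.encode("utf-8")`. The sketch assumes the usual RFC 3629 bit layout and rejects surrogates and values above U+10FFFF.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch only)."""
    # Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not encodable.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# Compare the sketch against Python's built-in encoder for one sample per length.
for sample in (0x41, 0xE9, 0x20AC, 0x1F600):
    assert encode_code_point(sample) == chr(sample).encode("utf-8")
```

Code points below U+0080 come out as a single byte equal to their ASCII value, which is where the backwards compatibility with ASCII comes from.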


22178 questions
8
votes
1 answer

Java 8 change in UTF-8 decoding

We recently migrated our application to JDK 8 from JDK 7. After the change, we ran into a problem with the following snippet of code. String output = new String(byteArray, "UTF-8"); The byte array may contain invalid UTF-8 byte sequences. The same…
Jiraiya
  • 336
  • 2
  • 8
8
votes
4 answers

Laravel 5 charset not working correctly on the views. But it working well when I dump it from controller

I'm facing a charset problem here. I'm developing an app that uses a SQL Server database. The database was not created for this app; it existed before it and works very well. I can't change anything in the database because it's too large and it's used…
Anderson Silva
  • 709
  • 1
  • 7
  • 31
8
votes
3 answers

How to send UTF-8 encoded email body with JavaMailSenderImpl?

I am sending an email this way: @Test public void testEmailCharacterSet() throws MessagingException, UnsupportedEncodingException { JavaMailSenderImpl mailSender = new JavaMailSenderImpl(); mailSender.setDefaultEncoding("utf-8"); …
jabal
  • 11,987
  • 12
  • 51
  • 99
8
votes
3 answers

Default encoding of HTTP POST request with JSON body

What's the default encoding of an HTTP POST request when the content-type is "application/json" with no explicit charset given? It seems two specs are in conflict: the JSON spec says that "JSON text SHALL be encoded in Unicode. The default encoding is…
Kwang Yul Seo
  • 771
  • 2
  • 7
  • 15
8
votes
3 answers

Migrating MySQL UTF8 to UTF8MB4 problems and questions

I'm trying to convert my UTF8 MySQL 5.5.30 database to UTF8MB4. I have looked at this article https://mathiasbynens.be/notes/mysql-utf8mb4 but have some questions. I have done these ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE =…
Banshee
  • 15,376
  • 38
  • 128
  • 219
8
votes
3 answers

php reading mysql bit field returning weird character

I am using mysql_fetch_assoc($query), one of the bit field returns out to be , which is supposedly to be true. The problem is that I also need to output this to xml and it's an illegal xml character. the charset for the db table is utf-8. why does…
user121196
  • 30,032
  • 57
  • 148
  • 198
8
votes
3 answers

How to handle example data in R Package that has UTF-8 marked strings

I would like to include an example dataset (of Twitter tweets and metadata) in an R Package I'm writing. I downloaded an example data.frame using the Twitter API and saved it as .RData (with the corresponding .R data description file) in my…
Rocinante
  • 625
  • 6
  • 15
8
votes
3 answers

Publishing DACPAC file with MSDeploy, UTF8 characters in post-deploy script are lost

I have a DACPAC file that was built in Visual Studio 2013, for an SSDT project. This SSDT project defines a post-deploy script designed to merge some static data into the published tables, and one piece of data contains a copyright symbol. Now,…
8
votes
2 answers

Setting shell script to utf8

I want to write to following command line into a shell script: cat text.tsv | grep -Pvi '.\t.\t.*\bHotels|Гостиница|Готель|Отель|Хотел|ホテル|מלון|فندق|होटल|โรงแรม|숙박|호텔|宾馆|旅店|旅馆|酒店|飯店\b' | awk '{print $0,"\t","column1"} > Text2.tsv However when I put…
Vic23
  • 103
  • 1
  • 1
  • 4
8
votes
2 answers

How to easily detect utf8 encoding in the string?

I have a string that is filled with data from another program, and this data may or may not be UTF-8 encoded. If it is not, I can encode it to UTF-8, but what is the best way to detect UTF-8 in C++? I saw this variant https://stackoverflow.com/questions/... but…
ratojakuf
  • 708
  • 1
  • 11
  • 21
8
votes
3 answers

multi-byte characters in libc regcomp and regexec

Is there any way to get libc6's regexp functions regcomp and regexec to work properly with multi-byte characters? For instance, if my pattern is the utf8 characters 猫机+猫, finding a match on the utf8 encoded string 猫机机机猫 will fail, where it should…
bill_e
  • 930
  • 2
  • 12
  • 24
8
votes
2 answers

How to convert any possible format to UTF-8 using Iconv?

For example, this will turn 1251 into utf-8: $utf8 = iconv('windows-1251', 'utf-8', $ansi); But how do I convert an unknown format (when it comes to us we do not yet know what format it is), in general any format (possibly known by Iconv), to utf-8? (code…
Rella
  • 65,003
  • 109
  • 363
  • 636
8
votes
5 answers

Remove characters not-suitable for UTF-8 encoding from String

I have a text area on a website where the user can write anything. A problem occurs when the user copy-pastes some text that contains non-UTF-8 characters and submits it to the server. Java successfully handles it, as it supports UTF-16, but my MySQL…
Abhinav
  • 3,322
  • 9
  • 47
  • 63
8
votes
1 answer

Django UnicodeEncodeError in rendering form ('utf-8')

I got an UnicodeEncodeError while rendering page using forms. UnicodeEncodeError at /individual/acc/ 'ascii' codec can't encode character u'\u0142' in position 2: ordinal not in range(128) Here's fragment of HTML (standard use of forms): …
Mateusz Knapczyk
  • 276
  • 1
  • 2
  • 15
8
votes
3 answers

Open a text file with accents in python

I try to open a text file in French with Python 2.7. I used the command f=open('textfr','r') but when I use f.read() I lose accented characters: I get u"J'\xc3\xa9tais \xc3\xa0 Paris instead of J'étais à Paris, etc.. when in linux terminal, I do…
Mostafa
  • 1,501
  • 3
  • 21
  • 37