Questions tagged [encoding]

Encoding is the reversible transformation, by a set of predefined rules, of a piece of information from one representation into another. The reverse transformation is called decoding. This tag is rather generic, but it is mainly used for binary-to-text encoding schemes such as Base64 and hexadecimal.

There are a lot of different applications:

  • Character encoding, which is how the computer converts characters like a, which humans can recognize, into bytes, which computers can recognize.
  • Video encoding, which is used to transform between videos and bytes.
  • URL encoding, which is used to transform between plain text and valid URIs. Also known as percent-encoding.
  • XML encoding, which is used to transform between plain text and valid XML.
  • Compression, which is used to compress/decompress bytes.
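
As a rough illustration of the reversible transformations listed above, here is a minimal Python sketch that round-trips one string through UTF-8, Base64, hexadecimal, and percent-encoding using only the standard library. The sample string and the variable names are assumptions made up for this example.

```python
import base64
import binascii
from urllib.parse import quote, unquote

text = "a b/\u00fc"  # hypothetical sample: "a b/" followed by u-umlaut

# Character encoding: text -> bytes (UTF-8 is an assumption here)
raw = text.encode("utf-8")

# Binary-to-text encodings: bytes -> ASCII-only text
b64 = base64.b64encode(raw).decode("ascii")
hexed = binascii.hexlify(raw).decode("ascii")

# Percent-encoding: text -> text that is safe inside a URI component
pct = quote(text, safe="")

# Decoding reverses each step and recovers the original data
assert base64.b64decode(b64) == raw
assert binascii.unhexlify(hexed) == raw
assert unquote(pct) == text
assert raw.decode("utf-8") == text
```

Every encoder here has a matching decoder, which is the property that separates encoding from lossy transformations.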
24174 questions
308
votes
16 answers

How to determine the encoding of text

I received some text that is encoded, but I don't know what charset was used. Is there a way to determine the encoding of a text file using Python? The related question "How can I detect the encoding/codepage of a text file" deals with C#.
Nope
  • 34,682
  • 42
  • 94
  • 119
293
votes
13 answers

How to convert Strings to and from UTF8 byte arrays in Java

In Java, I have a String and I want to encode it as a byte array (in UTF8, or some other encoding). Alternately, I have a byte array (in some known encoding) and I want to convert it into a Java String. How do I do these conversions?
mcherm
  • 23,999
  • 10
  • 44
  • 50
292
votes
12 answers

How many bytes does one Unicode character take?

I am a bit confused about encodings. As far as I know old ASCII characters took one byte per character. How many bytes does a Unicode character require? I assume that one Unicode character can contain every possible character from any language - am…
nan
  • 19,595
  • 7
  • 48
  • 80
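
One way to explore this question is to measure it directly. The byte count depends on the encoding chosen, not on Unicode itself. The sketch below is a minimal Python example; the helper name `encoded_lengths` and the sample characters are assumptions for illustration.

```python
def encoded_lengths(ch):
    """Return how many bytes one character occupies in three encodings."""
    # The "-le" variants are used so that no byte order mark is counted.
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}


# Hypothetical samples: Latin A, e-acute, euro sign, an emoji outside the BMP
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

UTF-8 uses one to four bytes per code point, UTF-16 uses two or four, and UTF-32 always uses four.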
275
votes
7 answers

How to store custom objects in NSUserDefaults

Alright, so I've been doing some poking around, and I realize my problem, but I don't know how to fix it. I have made a custom class to hold some data. I make objects for this class, and I need them to last between sessions. Before I was…
Ethan Mick
  • 9,517
  • 14
  • 58
  • 74
244
votes
8 answers

HTML encoding issues - "Â" character showing up instead of "&nbsp;"

I've got a legacy app just starting to misbehave, for whatever reason I'm not sure. It generates a bunch of HTML that gets turned into PDF reports by ActivePDF. The process works like this: Pull an HTML template from a DB with tokens in it to be…
Cᴏʀʏ
  • 105,112
  • 20
  • 162
  • 194
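
A stray "Â" is a common symptom of UTF-8 bytes being decoded with a single-byte code page. Whether that is the cause in the question above is an assumption, since the excerpt does not say. The Python sketch below reproduces the effect; the helper names `misdecode` and `repair` are hypothetical, and Windows-1252 is assumed as the wrong code page.

```python
def misdecode(text, wrong_encoding="cp1252"):
    """Simulate mojibake: encode as UTF-8, then decode with the wrong code page."""
    # Note: cp1252 leaves a few byte values undefined, so some inputs may raise.
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled, wrong_encoding="cp1252"):
    """Undo the mistake, assuming no bytes were lost along the way."""
    return garbled.encode(wrong_encoding).decode("utf-8")


nbsp = "\u00a0"            # non-breaking space; UTF-8 bytes are C2 A0
garbled = misdecode(nbsp)  # capital A with circumflex, then a non-breaking space
print(repr(garbled))
print(repr(misdecode("Num\u00e9ro")))  # the e-acute turns into two characters
```

If the garbled text has not been altered further, `repair` recovers the original string; the more robust fix is to declare and use the same encoding on both sides.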
243
votes
5 answers

In HTML I can make a checkmark with &#x2713; . Is there a corresponding X-mark?

Is there a corresponding X mark to ✓ (&#x2713;)? What is it?
nc.
  • 7,179
  • 5
  • 28
  • 38
236
votes
6 answers

Does a `+` in a URL scheme/host/path represent a space?

I am aware that a + in the query string of a URL represents a space. Is this also the case outside of the query string region? That is to say, does the following URL: http://a.com/a+b/c actually represent: http://a.com/a b/c (and thus need to be…
231
votes
14 answers

Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings

I'm using the JavaScript window.atob() function to decode a base64-encoded string (specifically the base64-encoded content from the GitHub API). Problem is I'm getting ASCII-encoded characters back (like â¢ instead of ™). How can I properly handle…
brandonscript
  • 68,675
  • 32
  • 163
  • 220
218
votes
6 answers

Does "\d" in regex mean a digit?

I found that in 123, \d matches 1 and 3 but not 2. What requirement must a digit satisfy for \d to match it? I am talking about Python style regex. Regular expression plugin in Gedit is using Python style regex. I created a text…
Tim
  • 1
  • 141
  • 372
  • 590
217
votes
12 answers

Attempt to set a non-property-list object as an NSUserDefaults

I thought I knew what was causing this error, but I can't seem to figure out what I did wrong. Here is the full error message I am getting: Attempt to set a non-property-list object ( "" ) as an NSUserDefaults value for key…
icekomo
  • 9,328
  • 7
  • 31
  • 59
213
votes
8 answers

Why does the PHP json_encode function convert UTF-8 strings to hexadecimal entities?

I have a PHP script that deals with a wide variety of languages. Unfortunately, whenever I try to use json_encode, any Unicode output is converted to hexadecimal entities. Is this the expected behavior? Is there any way to convert the output to…
David Jones
  • 10,117
  • 28
  • 91
  • 139
210
votes
14 answers

How do I determine file encoding in OS X?

I'm trying to enter some UTF-8 characters into a LaTeX file in TextMate (which says its default encoding is UTF-8), but LaTeX doesn't seem to understand them. Running cat my_file.tex shows the characters properly in Terminal. Running ls -al shows…
James A. Rosen
  • 64,193
  • 61
  • 179
  • 261
204
votes
22 answers

Microsoft Excel mangles Diacritics in .csv files?

I am programmatically exporting data (using PHP 5.2) into a .csv test file. Example data: Numéro 1 (note the accented e). The data is utf-8 (no prepended BOM). When I open this file in MS Excel it displays as NumÃ©ro 1. I am able to open this in a…
Freddo411
  • 2,293
  • 3
  • 18
  • 17
193
votes
10 answers

"TypeError: (Integer) is not JSON serializable" when serializing JSON in Python?

I am trying to send a simple dictionary to a json file from python, but I keep getting the "TypeError: 1425 is not JSON serializable" message. import json alerts = {'upper':[1425],'lower':[576],'level':[2],'datetime':['2012-08-08 15:30']} afile =…
user1329894
  • 5,027
  • 4
  • 15
  • 7
192
votes
7 answers

How can I transform string to UTF-8 in C#?

I have a string that I receive from a third party app and I would like to display it correctly in any language using C# on my Windows Surface. Due to incorrect encoding, a piece of my string looks like this in Spanish: AcciÃ³n whereas it should…
Gaara
  • 2,117
  • 2
  • 15
  • 15