Questions tagged [encoding]

Encoding is a set of predefined rules for reversibly transforming a piece of information from one representation into a different one. The reverse transformation is called decoding. This tag is rather generic, but it is mainly used for binary encoding schemes such as Base64 and hexadecimal.

There are a lot of different applications:

  • Character encoding, which is how the computer turns characters such as a, which humans can recognize, into bytes, which computers can recognize.
  • Video encoding, which is used to transform between videos and bytes.
  • URL encoding, which is used to transform between plain text and valid URIs. Also known as percent-encoding.
  • XML encoding, which is used to transform between plain text and valid XML.
  • Compression, which is used to compress/decompress bytes.
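
A minimal Python sketch of several of the applications listed above (character encoding, URL/percent-encoding, XML escaping, and compression), using only the standard library. The sample strings are assumptions chosen for illustration; each step is reversible, which is what makes it an encoding.

```python
import urllib.parse
import xml.sax.saxutils
import zlib

text = "a & b"  # illustrative sample, not taken from the tag wiki

# Character encoding: text <-> bytes
utf8_bytes = "é".encode("utf-8")
utf8_roundtrip = utf8_bytes.decode("utf-8")

# URL (percent) encoding: plain text <-> URI-safe text
quoted = urllib.parse.quote(text)
unquoted = urllib.parse.unquote(quoted)

# XML escaping: plain text <-> text that is safe inside XML
escaped = xml.sax.saxutils.escape(text)
unescaped = xml.sax.saxutils.unescape(escaped)

# Compression: bytes <-> (usually) fewer bytes
packed = zlib.compress(text.encode("utf-8"))
unpacked = zlib.decompress(packed).decode("utf-8")

print(utf8_bytes)  # b'\xc3\xa9'
print(quoted)      # a%20%26%20b
print(escaped)     # a &amp; b
```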
24174 questions
190
votes
10 answers

What's the difference between encoding and charset?

I am confused about text encoding and charset. For many reasons, I have to learn non-Unicode, non-UTF8 stuff in my upcoming work. I find the word "charset" in email headers, as in "ISO-2022-JP", but there's no such encoding in text editors. (I…
TK.
  • 27,073
  • 20
  • 64
  • 72
188
votes
6 answers

UnicodeEncodeError: 'charmap' codec can't encode - character maps to <undefined>, print function

I am writing a Python (Python 3.3) program to send some data to a webpage using the POST method. Mostly for debugging, I am getting the page result and displaying it on the screen using the print() function. The code is like…
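
A small sketch of how this error can arise, assuming the output encoding is a legacy code page. Here cp437 stands in for a console encoding and the sample text is made up; the asker's actual encoding and data are not shown in the excerpt. The `errors` argument shown is one possible way to avoid the exception, at the cost of losing or escaping the character.

```python
text = "price: \u20ac5"  # contains the euro sign, which cp437 cannot represent

try:
    text.encode("cp437")
    message = None
except UnicodeEncodeError as err:
    message = str(err)  # mentions 'charmap' and "character maps to <undefined>"

# Possible workarounds: substitute or escape characters the codec lacks.
lossy = text.encode("cp437", errors="replace")             # b'price: ?5'
escaped = text.encode("cp437", errors="backslashreplace")  # b'price: \\u20ac5'

print(message)
print(lossy, escaped)
```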
187
votes
4 answers

Why should we NOT use sys.setdefaultencoding("utf-8") in a py script?

I have seen a few py scripts which use this at the top of the script. In what cases should one use it? import sys reload(sys) sys.setdefaultencoding("utf-8")
mlzboy
  • 14,343
  • 23
  • 76
  • 97
187
votes
12 answers

"’" showing on page instead of " ' "

’ is showing on my page instead of '. I have the Content-Type set to UTF-8 in both my <head> tag and my HTTP headers: In addition, my browser is set to Unicode (UTF-8): So…
Jitendra Vyas
  • 148,487
  • 229
  • 573
  • 852
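
A short sketch of how this kind of garbling typically arises: the UTF-8 bytes of a right single quotation mark (U+2019) are decoded as Windows-1252. That this is the cause in the asker's setup is an assumption; the excerpt does not say where the mis-decoding happens.

```python
original = "\u2019"  # right single quotation mark

# Encode as UTF-8, then (wrongly) decode the bytes as Windows-1252.
garbled = original.encode("utf-8").decode("cp1252")

# If nothing else was lost, reversing the two steps recovers the character.
repaired = garbled.encode("cp1252").decode("utf-8")

print(original.encode("utf-8"))  # b'\xe2\x80\x99'
print(garbled)                   # ’
```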
186
votes
12 answers

Let JSON object accept bytes or let urlopen output strings

With Python 3 I am requesting a json document from a URL. response = urllib.request.urlopen(request) The response object is a file-like object with read and readline methods. Normally a JSON object can be created with a file opened in text…
Peter Smit
  • 27,696
  • 33
  • 111
  • 170
186
votes
5 answers

Why does base64 encoding require padding if the input length is not divisible by 3?

What is the purpose of padding in base64 encoding? The following is an extract from Wikipedia: "An additional pad character is allocated which may be used to force the encoded output into an integer multiple of 4 characters (or equivalently when…
Anand Patel
  • 6,031
  • 11
  • 48
  • 67
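
To make the padding rule concrete, a minimal sketch with Python's `base64` module: every 3 input bytes become 4 output characters, and `=` fills the last group when the input length is not a multiple of 3. The helper `restore_padding` is a hypothetical name introduced here for illustration.

```python
import base64
import binascii

encoded = {raw: base64.b64encode(raw) for raw in (b"abc", b"ab", b"a")}
# b'abc' -> b'YWJj', b'ab' -> b'YWI=', b'a' -> b'YQ=='

def restore_padding(data: bytes) -> bytes:
    """Hypothetical helper: re-add '=' so the length is a multiple of 4."""
    return data + b"=" * (-len(data) % 4)

# Python's strict decoder rejects input whose padding was stripped.
try:
    base64.b64decode(b"YQ")
    rejected = False
except binascii.Error:
    rejected = True

decoded = base64.b64decode(restore_padding(b"YQ"))  # b'a'
print(encoded, rejected, decoded)
```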
179
votes
6 answers

Correct way to define Python source code encoding

PEP 263 defines how to declare Python source code encoding. Normally, the first 2 lines of a Python file should start with: #!/usr/bin/python # -*- coding: <encoding name> -*- But I have seen a lot of files starting with: #!/usr/bin/python # -*-…
Oli
  • 15,345
  • 8
  • 30
  • 36
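
One way to check how Python itself reads such a declaration is `tokenize.detect_encoding` from the standard library. The sketch below is illustrative only: the sample sources and the choice of cp1252 are assumptions, and `declared_encoding` is a hypothetical helper name.

```python
import io
import tokenize

def declared_encoding(source: bytes) -> str:
    """Hypothetical helper: return the encoding Python would use for source."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding

emacs_style = b"#!/usr/bin/python\n# -*- coding: cp1252 -*-\nprint('hi')\n"
plain_style = b"#!/usr/bin/python\n# coding=cp1252\nprint('hi')\n"
no_declaration = b"print('hi')\n"

print(declared_encoding(emacs_style))     # cp1252
print(declared_encoding(plain_style))     # cp1252
print(declared_encoding(no_declaration))  # utf-8
```

Both comment styles are recognized because only the `coding:` or `coding=` part of the comment matters, and the declaration must sit on the first or second line.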
179
votes
14 answers

Changing default encoding of Python?

I have many "can't encode" and "can't decode" problems with Python when I run my applications from the console. But in the Eclipse PyDev IDE, the default character encoding is set to UTF-8, and I'm fine. I searched around for setting the default…
Ali Nadalizadeh
  • 2,726
  • 3
  • 22
  • 24
175
votes
11 answers

Difference between encoding and encryption

What is the difference between encoding and encryption?
Pankaj Agarwal
  • 11,191
  • 12
  • 43
  • 59
174
votes
4 answers

Who originally invented this type of syntax: -*- coding: utf-8 -*-

Python recognizes the following as an instruction which defines a file's encoding: # -*- coding: utf-8 -*- I have definitely seen this kind of instruction before (-*- var: value -*-), so I assume Python did not invent them and is not the only one that uses…
hamstergene
  • 24,039
  • 5
  • 57
  • 72
170
votes
12 answers

Effective way to find any file's Encoding

Yes, this is a very frequent question, but the matter is vague to me since I don't know much about it. I would like a very precise way to find a file's encoding, as precise as Notepad++ is.
Fábio Antunes
  • 16,984
  • 18
  • 75
  • 96
167
votes
6 answers

In OS X Lion, LANG is not set to UTF-8, how to fix it?

I am trying to set up Postgres on OS X Lion and find that the locale environment variables are not set up correctly. This is what is set: LANG= LC_COLLATE="C" LC_CTYPE="C" LC_MESSAGES="C" LC_MONETARY="C" LC_NUMERIC="C" LC_TIME="C" LC_ALL= I expect something…
mamcx
  • 15,916
  • 26
  • 101
  • 189
163
votes
10 answers

How to achieve Base64 URL safe encoding in C#?

I want to achieve Base64 URL safe encoding in C#. In Java, we have the common Codec library, which gives me a URL safe encoded string. How can I achieve the same using C#? byte[] toEncodeAsBytes =…
Vishvesh Phadnis
  • 2,448
  • 5
  • 19
  • 35
160
votes
7 answers

Does C# have an equivalent to JavaScript's encodeURIComponent()?

In JavaScript: encodeURIComponent("©√") == "%C2%A9%E2%88%9A" Is there an equivalent for C# applications? For escaping HTML characters I used: txtOut.Text = Regex.Replace(txtIn.Text, @"[\u0080-\uFFFF]", m => @"&#" + ((int)m.Value[0]).ToString()…
travis
  • 35,751
  • 21
  • 71
  • 94
158
votes
16 answers

Java : How to determine the correct charset encoding of a stream

With reference to the following thread: Java App : Unable to read iso-8859-1 encoded file correctly What is the best way to programmatically determine the correct charset encoding of an inputstream/file? I have tried using the following: File in = …
Joel
  • 29,538
  • 35
  • 110
  • 138