ASCII stands for 'American Standard Code for Information Interchange'. It is a 7-bit character-encoding scheme based on the ordering of the English alphabet. Because ASCII defines only 128 characters (codes 0 to 127), numerous other encoding schemes have been created to include characters from other alphabets as well as other symbols.
Questions tagged [non-ascii-characters]
1055 questions
33 votes · 2 answers
Replace accented characters in R with non-accented counterpart (UTF-8 encoding)
I have some strings in R in UTF-8 encoding that contain accents.
E.g.
string="Hølmer" or string="Elizalde-González"
Is there any nice function in R to replace the accented characters in these strings by their unaccented counterpart? I saw some…

Tom Wenseleers (reputation 7,535; 7 gold, 63 silver, 103 bronze badges)
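The question is about R, but the usual idea is language-neutral: decompose each character (Unicode NFD) and drop the combining marks. A minimal Python sketch of that idea (not R code; `strip_accents` is an illustrative name):

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining diacritical marks after canonical (NFD) decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    # Keep every code point that is not a combining mark (acute, cedilla, ...).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Elizalde-González"))  # Elizalde-Gonzalez
print(strip_accents("Hølmer"))             # Hølmer ("ø" has no decomposition)
```

Letters such as "ø" are distinct letters rather than "o" plus a mark, so this approach leaves them alone; they need an explicit transliteration table. On the R side, `stringi::stri_trans_general(x, "Latin-ASCII")` is one commonly used transliteration option.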
33 votes · 5 answers
R on Windows: character encoding hell
I am trying to import a CSV encoded as OEM-866 (Cyrillic charset) into R on Windows. I also have a copy that has been converted into UTF-8 w/o BOM. Both of these files are readable by all other applications on my system, once the encoding is…

user27636 (reputation 1,070; 1 gold, 18 silver, 26 bronze badges)
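Underneath the R specifics, the task is to decode the file with the correct legacy code page (OEM-866 is available as the `cp866` codec in Python) and, if needed, re-encode it as UTF-8. A Python sketch of that step, not R code; the function names are illustrative:

```python
import csv
import io


def cp866_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes from the DOS Cyrillic code page (cp866) as UTF-8 without a BOM."""
    return raw.decode("cp866").encode("utf-8")


def read_cp866_csv(raw: bytes) -> list:
    """Parse CSV content stored as cp866 bytes into rows of str."""
    text = raw.decode("cp866")
    # newline="" lets the csv module handle \r\n line endings itself.
    return list(csv.reader(io.StringIO(text, newline="")))


sample = "Имя,Город\r\nИван,Москва\r\n".encode("cp866")
print(read_cp866_csv(sample))  # [['Имя', 'Город'], ['Иван', 'Москва']]
```

For a file on disk, opening it with `open(path, encoding="cp866", newline="")` and passing the handle to `csv.reader` does the same decoding.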
32 votes · 3 answers
matching unicode characters in python regular expressions
I have read through the other questions on Stack Overflow, but I'm still no closer. Sorry if this has already been answered, but I couldn't get anything proposed there to work.
>>> import re
>>> m =…

Weholt (reputation 1,889; 5 gold, 22 silver, 35 bronze badges)
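A short Python 3 sketch of the two usual ways to match non-ASCII characters: `\w` on `str` patterns is Unicode-aware by default (Python 2 needed `unicode` strings plus the `re.UNICODE` flag), and specific characters can be matched by code point with `\u` escapes. The sample text and the chosen range are illustrative assumptions:

```python
import re

text = "Zürich, São Paulo and Kraków"

# str patterns are Unicode-aware by default: \w matches accented letters.
words = re.findall(r"\w+", text)
print(words)  # ['Zürich', 'São', 'Paulo', 'and', 'Kraków']

# Explicit code-point range: U+00C0..U+017F covers the accented letters of
# Latin-1 Supplement and Latin Extended-A (plus the signs × and ÷).
accented = re.findall(r"[\u00C0-\u017F]", text)
print(accented)  # ['ü', 'ã', 'ó']
```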
32 votes · 5 answers
How to account for accent characters for regex in Python?
I currently use re.findall to find and isolate words after the '#' character for hashtags in a string:
hashtags = re.findall(r'#([A-Za-z0-9_]+)', str1)
It searches str1 and finds all the hashtags. This works; however, it doesn't account for accented…

deadlock (reputation 7,048; 14 gold, 67 silver, 115 bronze badges)
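In Python 3, replacing `[A-Za-z0-9_]` with `\w` keeps the same intent (letters, digits, underscore) but extends it to accented letters. One caveat worth handling: if the text arrives decomposed ("e" followed by a combining accent), the bare combining mark is not a `\w` character, so normalizing to NFC first is safer. A sketch; `find_hashtags` is an illustrative name:

```python
import re
import unicodedata


def find_hashtags(text: str) -> list:
    """Return the words following '#', including accented letters."""
    # NFC turns "e" + combining acute into the single letter "é", which \w matches;
    # a bare combining mark would otherwise end the match early.
    normalized = unicodedata.normalize("NFC", text)
    return re.findall(r"#(\w+)", normalized)


print(find_hashtags("Loving the #café scene and #naïve_art, not #2024!"))
# ['café', 'naïve_art', '2024']
```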
32 votes · 1 answer
How do I get accented letters to actually work on bash?
My bash installation on Cygwin doesn't handle accented letters properly. I tried adding
set input-meta on # to accept 8-bit characters
set output-meta on # to show 8-bit characters
set convert-meta on # to show it as character, not the octal…

Ferdinando Randisi (reputation 4,068; 6 gold, 32 silver, 43 bronze badges)
27 votes · 4 answers
"UnicodeEncodeError: 'ascii' codec can't encode character"
I'm trying to pass big strings of random HTML through regular expressions and my Python 2.6 script is choking on this:
UnicodeEncodeError: 'ascii' codec can't encode character
I traced it back to a trademark superscript on the end of this word:…

KenBurnsFan1 (reputation 575; 1 gold, 9 silver, 17 bronze badges)
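The error means text containing a non-ASCII character (here the trademark sign, U+2122) was encoded with the `ascii` codec, often implicitly. A Python 3 sketch of the failure and of explicit alternatives; the sample string is an assumption:

```python
text = "Acme\u2122"  # "Acme™": the trademark sign is U+2122

try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print(err.reason, hex(ord(err.object[err.start])))  # ordinal not in range(128) 0x2122

print(text.encode("utf-8"))                       # b'Acme\xe2\x84\xa2'
print(text.encode("ascii", "xmlcharrefreplace"))  # b'Acme&#8482;'
print(text.encode("ascii", "ignore"))             # b'Acme'
```

Choosing the encoding explicitly (usually UTF-8) keeps the character; the `errors` handlers trade it for an HTML reference or drop it.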
26 votes · 2 answers
PyYAML: dump Unicode with special characters (i.e. accents)
I'm working with YAML files that have to be human-readable and editable but that will also be edited from Python code.
I'm using Python 2.7.3.
The file needs to handle accents (mostly to handle text in French).
Here is a sample of my issue:
import…

Hans Baldzuhn (reputation 317; 1 gold, 3 silver, 9 bronze badges)
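PyYAML escapes non-ASCII characters by default; its `allow_unicode=True` option writes them as-is, which keeps the file editable by hand. A Python 3 sketch, assuming PyYAML is installed; the sample data and the helper name are illustrative:

```python
import yaml  # PyYAML, third-party: pip install pyyaml


def dump_readable(data: dict) -> str:
    """Dump to YAML text with accented characters kept literally."""
    return yaml.safe_dump(data, allow_unicode=True, default_flow_style=False)


data = {"titre": "Déjà vu", "ville": "Montréal"}
print(yaml.safe_dump(data))  # default: accents are escaped
print(dump_readable(data))   # accents written as-is
```

When writing to a file, open it with `encoding="utf-8"` and pass the same `allow_unicode=True` to `yaml.safe_dump(data, fh, ...)`.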
25 votes · 10 answers
Copyleft symbol
Is there any easy way to print the copyleft symbol?
https://en.wikipedia.org/wiki/Copyleft
For example as simple as:
© ©
It might be:
&anticopy; &anticopy;

Evhz (reputation 8,852; 9 gold, 51 silver, 69 bronze badges)
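Assuming the copyleft symbol meant here is the Unicode character U+1F12F COPYLEFT SYMBOL (added in Unicode 11.0), it can be emitted by code point, and in HTML written as a numeric character reference. Whether it actually renders depends on the font. A Python sketch (the name lookup needs Python 3.7+):

```python
import unicodedata

COPYLEFT = "\U0001F12F"  # U+1F12F COPYLEFT SYMBOL (Unicode 11.0)

print(COPYLEFT)
print(unicodedata.name(COPYLEFT))  # COPYLEFT SYMBOL
print(f"&#x{ord(COPYLEFT):X};")    # &#x1F12F; (HTML numeric character reference)
```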
24 votes · 1 answer
What's the character code for an exclamation mark in a circle?
What's the Unicode or Segoe UI Symbol (or other font) code for an exclamation mark in a circle?

Waldemar Gałęzinowski (reputation 1,125; 1 gold, 10 silver, 18 bronze badges)
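Candidate characters can be found by searching Unicode character names. The sketch below lists every assigned character whose name contains a given word; it does not decide which result is drawn inside a circle (that depends on the font), and it does not cover font-specific private glyphs such as those in Segoe UI Symbol. `find_chars` is an illustrative name:

```python
import sys
import unicodedata


def find_chars(word: str):
    """Yield (code point, character, name) for characters whose name contains word."""
    word = word.upper()
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unassigned code points
        if word in name:
            yield f"U+{cp:04X}", chr(cp), name


for code, char, name in find_chars("exclamation"):
    print(code, char, name)
```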
24 votes · 4 answers
Convert Hi-Ansi chars to Ascii equivalent (é -> e)
Is there a routine available in Delphi 2007 to convert the characters in the high range of the ANSI table (>127) to their equivalent ones in pure ASCII (<=127) according to a locale (codepage)?
I know some chars cannot translate well but most can,…

Francesca (reputation 21,452; 4 gold, 49 silver, 90 bronze badges)
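This is not Delphi code, but the usual recipe carries over: decode the bytes with the code page, apply compatibility decomposition (NFKD) so "é" becomes "e" plus a combining accent, then keep only ASCII. A Python sketch assuming Windows-1252 as the default code page; `ansi_to_ascii` is an illustrative name:

```python
import unicodedata


def ansi_to_ascii(raw: bytes, codepage: str = "cp1252") -> str:
    """Map bytes from a Windows ANSI code page to plain ASCII (<= 127) where possible."""
    text = raw.decode(codepage)
    # NFKD splits "é" into "e" + combining acute (and expands compatibility forms).
    decomposed = unicodedata.normalize("NFKD", text)
    # Anything without an ASCII base letter (e.g. "ß", "€") is dropped, not translated.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(ansi_to_ascii(b"Caf\xe9 cr\xe8me br\xfbl\xe9e"))  # Cafe creme brulee
```

As the question anticipates, some characters do not translate well: with this approach "ß" simply disappears, so a small explicit mapping table is needed for those.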
24 votes · 7 answers
Remove non-ASCII non-printable characters from a String
I get user input including non-ASCII characters and non-printable characters, such as
\xc2d
\xa0
\xe7
\xc3\ufffdd
\xc3\ufffdd
\xc2\xa0
\xc3\xa7
\xa0\xa0
for example:
email : abc@gmail.com\xa0\xa0
street : 123 Main St.\xc2\xa0
desired output:
…

daydreamer (reputation 87,243; 191 gold, 450 silver, 722 bronze badges)
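One simple approach, assuming the goal is to keep only printable ASCII (space through tilde, 0x20 to 0x7E), is to filter character by character. A Python 3 sketch; `keep_printable_ascii` is an illustrative name:

```python
def keep_printable_ascii(text: str) -> str:
    """Keep only printable ASCII characters (0x20 ' ' through 0x7E '~')."""
    return "".join(ch for ch in text if " " <= ch <= "~")


print(keep_printable_ascii("abc@gmail.com\xa0\xa0"))  # abc@gmail.com
print(keep_printable_ascii("123 Main St.\xc2\xa0"))   # 123 Main St.
```

This also removes tabs, newlines, and accented letters such as "ç"; if those letters should become their base letter instead, decompose and strip accents before filtering.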
23 votes · 3 answers
How to replace accented characters?
My output looks like 'àéêöhello!'. I need to change it to 'aeeohello', i.e. replace à with a, and so on.

Ganesh Basuvaraj (reputation 231; 1 gold, 2 silver, 3 bronze badges)
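The desired output drops both the accents and the "!", so this sketch assumes the rule is "strip accents, then keep only letters and digits". Written in Python since the question names no language; `to_plain_letters` is an illustrative name:

```python
import unicodedata


def to_plain_letters(text: str) -> str:
    """Strip accents, then keep only letters and digits."""
    decomposed = unicodedata.normalize("NFD", text)  # "à" -> "a" + combining grave
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in no_marks if ch.isalnum())


print(to_plain_letters("àéêöhello!"))  # aeeohello
```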
23 votes · 6 answers
How to ignore acute accent in a javascript regex match?
I need to match a word like 'César' with a regex like /^cesar/i.
Is there an option like /i to configure the regex so that it ignores acute accents?
Or is the only solution to use a regex like /^césar/i?

sanrodari (reputation 1,602; 2 gold, 13 silver, 23 bronze badges)
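Python's `re` module has no accent-insensitive flag either, so a common workaround is to strip accents from both the pattern and the subject before matching, then rely on the ordinary case-insensitive flag. A Python sketch of that workaround; the helper names are illustrative, and stripping marks from the pattern is only safe for simple patterns like the ones in the question:

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after NFD decomposition ("é" -> "e")."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_search(pattern: str, text: str):
    """re.search with accents ignored on both sides and case ignored."""
    return re.search(strip_accents(pattern), strip_accents(text), re.IGNORECASE)


print(bool(accent_insensitive_search(r"^cesar", "César")))  # True
print(bool(accent_insensitive_search(r"^césar", "Cesar")))  # True
```

In JavaScript the same normalization is usually written as `s.normalize("NFD").replace(/[\u0300-\u036f]/g, "")`.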
22 votes · 2 answers
How to make MySQL "case insensitive" and "accent insensitive" in UTF-8
I have a schema with "utf8 -- UTF-8 Unicode" as the charset and "utf8_spanish_ci" as the collation.
All the tables inside are InnoDB with the same charset and collation.
Here is the problem:
with a query like
SELECT *
FROM people p
WHERE p.NAME…

Lightworker (reputation 593; 1 gold, 5 silver, 18 bronze badges)
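In MySQL this behavior comes from the collation chosen for the column or connection. The sketch below is not MySQL: it uses SQLite through Python's `sqlite3` module to show the underlying idea of a collation that compares strings with accents and case folded away. The collation name `accent_ci` and the sample table are assumptions for illustration:

```python
import sqlite3
import unicodedata


def fold(text: str) -> str:
    """Comparison key: drop combining marks, then casefold."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def accent_ci(a: str, b: str) -> int:
    """Collation callback: negative, zero, or positive, like a classic cmp()."""
    fa, fb = fold(a), fold(b)
    return (fa > fb) - (fa < fb)


conn = sqlite3.connect(":memory:")
conn.create_collation("accent_ci", accent_ci)
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)",
                 [("José",), ("JOSE",), ("Jesús",)])

rows = conn.execute(
    "SELECT name FROM people WHERE name COLLATE accent_ci = ? ORDER BY rowid",
    ("jose",),
).fetchall()
print(rows)  # [('José',), ('JOSE',)]
```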
21 votes · 7 answers
Regex accent insensitive?
I need a regex in a C# program.
I have to capture the name of a file with a specific structure.
I used the \w character class, but the problem is that this class doesn't match any accented characters.
How can I do this? I just don't want to put the most used…

J4N (reputation 19,480; 39 gold, 187 silver, 340 bronze badges)
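Whether `\w` matches accented letters depends on whether the regex engine runs in Unicode or ASCII mode. In .NET, it is worth checking whether an ASCII-only mode such as `RegexOptions.ECMAScript` is enabled. The Python sketch below shows the same contrast: Unicode-aware `\w` by default, ASCII-only with `re.ASCII`. The file-name structure `<name>_<4 digits>.txt` is a hypothetical example, not taken from the question:

```python
import re

# Hypothetical file-name structure for illustration: "<name>_<4 digits>.txt".
PATTERN = re.compile(r"^(\w+)_(\d{4})\.txt$")

for filename in ["Éléonore_2023.txt", "Müller_1999.txt", "report_2024.txt"]:
    match = PATTERN.match(filename)
    print(filename, "->", match.group(1) if match else None)

# With re.ASCII, \w shrinks to [A-Za-z0-9_] and accented names no longer match.
print(re.match(r"^\w+$", "Éléonore", re.ASCII))  # None
```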