Questions tagged [non-ascii-characters]

ASCII stands for 'American Standard Code for Information Interchange'. ASCII is a character-encoding scheme based on the ordering of the English alphabet. Since ASCII only contains definitions for 128 characters, numerous other encoding schemes have been created to include characters from other alphabets and other symbols.
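
As a rough illustration of the 128-character limit described above, the sketch below flags characters whose code points fall outside the ASCII range (0 to 127). The helper name `non_ascii_chars` is hypothetical and chosen only for this example; the sample strings are borrowed from the first question in the list.

```python
def non_ascii_chars(text):
    """Return the characters in text whose code point is above 127 (outside ASCII)."""
    return [ch for ch in text if ord(ch) > 127]


if __name__ == "__main__":
    # Sample strings taken from the first listed question.
    for sample in ("Hølmer", "Elizalde-González", "plain ascii"):
        # str.isascii() is available in Python 3.7 and later.
        print(sample, sample.isascii(), non_ascii_chars(sample))
```

Comparing `ord(ch)` against 127 is only one way to draw the line; encoding with `text.encode("ascii")` and catching `UnicodeEncodeError` would work as well.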

1055 questions
33
votes
2 answers

Replace accented characters in R with non-accented counterpart (UTF-8 encoding)

I have some strings in R in UTF-8 encoding that contain accents. E.g. string="Hølmer" or string="Elizalde-González" Is there any nice function in R to replace the accented characters in these strings by their unaccented counterpart? I saw some…
Tom Wenseleers
  • 7,535
  • 7
  • 63
  • 103
33
votes
5 answers

R on Windows: character encoding hell

I am trying to import a CSV encoded as OEM-866 (Cyrillic charset) into R on Windows. I also have a copy that has been converted into UTF-8 w/o BOM. Both of these files are readable by all other applications on my system, once the encoding is…
user27636
  • 1,070
  • 1
  • 18
  • 26
32
votes
3 answers

matching unicode characters in python regular expressions

I have read through the other questions on Stack Overflow, but I am still no closer. Sorry if this has already been answered, but I couldn't get anything proposed there to work. >>> import re >>> m =…
Weholt
  • 1,889
  • 5
  • 22
  • 35
32
votes
5 answers

How to account for accent characters for regex in Python?

I currently use re.findall to find and isolate words after the '#' character for hash tags in a string: hashtags = re.findall(r'#([A-Za-z0-9_]+)', str1) It searches str1 and finds all the hashtags. This works; however, it doesn't account for accented…
deadlock
  • 7,048
  • 14
  • 67
  • 115
32
votes
1 answer

How do I get accented letters to actually work on bash?

My bash installation on cygwin doesn't handle accented letters properly. I tried adding set input-meta on # to accept 8-bit characters set output-meta on # to show 8-bit characters set convert-meta on # to show it as character, not the octal…
Ferdinando Randisi
  • 4,068
  • 6
  • 32
  • 43
27
votes
4 answers

"UnicodeEncodeError: 'ascii' codec can't encode character"

I'm trying to pass big strings of random HTML through regular expressions and my Python 2.6 script is choking on this: UnicodeEncodeError: 'ascii' codec can't encode character I traced it back to a trademark superscript on the end of this word:…
KenBurnsFan1
  • 575
  • 1
  • 9
  • 17
26
votes
2 answers

PyYaml - Dump unicode with special characters ( i.e. accents )

I'm working with YAML files that have to be human-readable and editable but that will also be edited from Python code. I'm using Python 2.7.3. The file needs to handle accents (mostly to handle text in French). Here is a sample of my issue: import…
Hans Baldzuhn
  • 317
  • 1
  • 3
  • 9
25
votes
10 answers

Copyleft symbol

Is there any easy way to print the copyleft symbol? https://en.wikipedia.org/wiki/Copyleft For example, as simple as: &copy; © It might be: &anticopy; &anticopy;
Evhz
  • 8,852
  • 9
  • 51
  • 69
24
votes
1 answer

What's the character code for exclamation mark in circle?

What's the Unicode or Segoe UI Symbols (or other font) code for exclamation mark in circle?
Waldemar Gałęzinowski
  • 1,125
  • 1
  • 10
  • 18
24
votes
4 answers

Convert Hi-Ansi chars to Ascii equivalent (é -> e)

Is there a routine available in Delphi 2007 to convert the characters in the high range of the ANSI table (>127) to their equivalent ones in pure ASCII (<=127) according to a locale (codepage)? I know some chars cannot translate well but most can,…
Francesca
  • 21,452
  • 4
  • 49
  • 90
24
votes
7 answers

Remove non-ASCII non-printable characters from a String

I get user input including non-ASCII characters and non-printable characters, such as \xc2d \xa0 \xe7 \xc3\ufffdd \xc3\ufffdd \xc2\xa0 \xc3\xa7 \xa0\xa0 for example: email : abc@gmail.com\xa0\xa0 street : 123 Main St.\xc2\xa0 desired output: …
daydreamer
  • 87,243
  • 191
  • 450
  • 722
23
votes
3 answers

How to replace accented characters?

My output looks like 'àéêöhello!'. I need to change my output to 'aeeohello', just replacing a character such as à with a.
Ganesh Basuvaraj
  • 231
  • 1
  • 2
  • 3
23
votes
6 answers

How to ignore acute accent in a javascript regex match?

I need to match a word like 'César' with a regex like this: /^cesar/i. Is there an option like /i to configure the regex so that it ignores acute accents? Or is the only solution to use a regex like this: /^césar/i?
22
votes
2 answers

How to MySQL work "case insensitive" and "accent insensitive" in UTF-8

I have a schema in "utf8 -- UTF-8 Unicode" as charset and a collation of "utf8_spanish_ci". All the inside tables are InnoDB with same charset and collation as mentioned. Here comes the problem: with a query like SELECT * FROM people p WHERE p.NAME…
Lightworker
  • 593
  • 1
  • 5
  • 18
21
votes
7 answers

Regex accent insensitive?

I need a regex in a C# program. I have to capture the name of a file with a specific structure. I used the \w character class, but the problem is that this class doesn't match any accented characters. How can I do this? I just don't want to put the most used…
J4N
  • 19,480
  • 39
  • 187
  • 340