Questions tagged [non-ascii-characters]

ASCII stands for 'American Standard Code for Information Interchange'. ASCII is a character-encoding scheme based on the ordering of the English alphabet. ASCII defines only 128 characters (codes 0 to 127), so numerous other encoding schemes have been created to include characters from other alphabets and other symbols.
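To make the 128-character limit concrete, here is a small Python 3 sketch (Python 3.9 or later for the type hints) that reports which characters of a string fall outside the ASCII range 0 to 127. The helper name `non_ascii_chars` is made up for this illustration; `str.isascii()` is the built-in check.

```python
def non_ascii_chars(text: str) -> list[tuple[str, int]]:
    """Return (character, code point) pairs for every character outside ASCII (0-127)."""
    return [(ch, ord(ch)) for ch in text if ord(ch) > 127]

sample = "naïve café"
print(sample.isascii())         # False: the string contains code points above 127
print(non_ascii_chars(sample))  # [('ï', 239), ('é', 233)]
```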

1055 questions
4 votes, 3 answers

Non ASCII character error python

I'm a beginner programmer trying to write a Python script that will generate random passwords. However, I always get a non-ASCII character error even though I declared the coding #utf-8, as mentioned in another similar question here in Stack…
chilliefiber
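The "Non-ASCII character" error described here is what Python 2 reports when a source file contains non-ASCII bytes without a PEP 263 encoding declaration on line 1 or 2 (Python 3 assumes UTF-8 source by default). A bare `#utf-8` comment does not count as a declaration. This Python 3 sketch applies the regular expression given in PEP 263 to the first two lines of a source string; it is a simplification of what the interpreter does, and `declared_encoding` is a hypothetical helper name.

```python
import re
from typing import Optional

# Regular expression from PEP 263 for recognizing a source encoding declaration.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def declared_encoding(source: str) -> Optional[str]:
    """Return the encoding declared in the first two lines of `source`, or None."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None

print(declared_encoding("#utf-8\nprint('ñ')"))                     # None
print(declared_encoding("# -*- coding: utf-8 -*-\nprint('ñ')"))    # utf-8
print(declared_encoding("#!/usr/bin/env python\n# coding=latin-1"))  # latin-1
```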
4 votes, 2 answers

Translate Unicode Literal in Qt 5.3

In a Qt 5.3 application, I have a string literal that contains non-ASCII characters (specifically German Umlauts) that will need to be translated into foreign languages. So I have two issues: (1) I have to mark that literal with tr() and (2) I have…
4 votes, 1 answer

Tricky special single quote symbol

I observed that there is a difference between the plain single quote ' and the single quote ’ in the Word document. I tried to find the ASCII values of both characters with an online ASCII value finder. I can find the ASCII value for the first one,…
Azeez
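The curly quote has no ASCII value because it is not an ASCII character: it is U+2019 RIGHT SINGLE QUOTATION MARK, which word processors commonly substitute for the plain apostrophe (U+0027). This Python 3 sketch prints the code point and Unicode name of each character, plus the bytes the curly quote becomes in two common encodings.

```python
import unicodedata

# Compare the plain apostrophe with the typographic right single quotation mark.
for ch in ["'", "\u2019"]:
    print(repr(ch), ord(ch), unicodedata.name(ch))
# "'" 39 APOSTROPHE
# '’' 8217 RIGHT SINGLE QUOTATION MARK

print("\u2019".encode("cp1252"))  # b'\x92' (byte 146 in Windows-1252)
print("\u2019".encode("utf-8"))   # b'\xe2\x80\x99' (three bytes in UTF-8)
```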
4 votes, 1 answer

lua - string.byte for non ascii characters

I want to convert characters to numerical codes, so I tried string.byte("å"). However, string.byte() seems to return 195 for characters like this; is there any way to get a numerical code for non-ASCII characters…
wiki
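A likely explanation, assuming the Lua source file is saved as UTF-8: `å` is stored as two bytes, and `string.byte` returns only the first byte by default, which is 195 (0xC3). This Python 3 sketch reproduces those byte values and contrasts them with the Unicode code point. Lua 5.3 and later also ship a `utf8` library whose `utf8.codepoint` returns the code point directly.

```python
text = "å"

# UTF-8 stores U+00E5 as two bytes; Lua's string.byte(s) reports only the first.
utf8_bytes = list(text.encode("utf-8"))
print(utf8_bytes)  # [195, 165]

# The Unicode code point, which is usually the "numerical code" wanted.
print(ord(text))   # 229
```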
4 votes, 1 answer

ignore accents in elastic search with haystack

I am using Elasticsearch along with Haystack to provide search. I want users to be able to search in languages other than English; I am currently trying Greek. How can I ignore accents while searching? E.g. let's say if I enter…
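One general technique for accent-insensitive matching, shown here outside Elasticsearch and Haystack, is to decompose text with Unicode normalization form NFD and drop the combining marks. This Python 3 sketch applies it to Greek; inside Elasticsearch the equivalent step would belong in the analyzer's token filters, and which filter suits Greek is not covered here. The helper name `strip_accents` is chosen for this example.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Decompose characters (NFD), drop combining marks such as the tonos, recompose (NFC)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

print(strip_accents("καλημέρα"))  # καλημερα
```

Applying the same function to both the indexed text and the query keeps the two sides comparable.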
4 votes, 2 answers

Regular expression - PCRE (PHP) - word boundary (\b) and accent characters

Why does the letter é count as a word boundary matching \b in the following example? Pattern: /\b(cum)\b/i Text: écumé It matches 'cum', which is not desired. Is it possible to overcome this?
marekful
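The likely cause is that the engine is not treating `é` as a word character: in byte mode (no `u` modifier in PHP), a UTF-8 `é` is two bytes that, with the default character tables, are not word characters, so `\b` fires next to them. Adding `u` is the usual first thing to try, but confirm how your PHP and PCRE versions treat `\b` under it. Python's `re` module makes the two behaviours easy to compare, as this sketch shows: `str` patterns are Unicode-aware by default, and the `re.ASCII` flag restores the ASCII-only definition of a word character.

```python
import re

pattern = r"\b(cum)\b"
text = "écumé"

# Unicode-aware (default for str patterns): é is a word character,
# so there is no boundary between é and c, and nothing matches.
print(re.search(pattern, text, re.IGNORECASE))  # None

# ASCII-only word characters: é counts as a non-word character,
# so \b matches on both sides of "cum", reproducing the problem.
print(re.search(pattern, text, re.IGNORECASE | re.ASCII).group(1))  # cum
```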
4 votes, 2 answers

Detecting accents in words (Python)

Here's the dealio: I've written a program that finds all of the algorithm classes in the dictionary. However, I'm having a problem dealing with accented characters. Currently my code reads them in, treats them like they're invisible, but still…
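The question is cut off, so this Python 3 sketch illustrates one common way to detect accents rather than a fix for that program: decompose with NFD and look for combining marks. `has_accent` is a name chosen for this example. Letters such as `ø` or `ł` have no decomposition, so this approach does not flag them.

```python
import unicodedata

def has_accent(word: str) -> bool:
    """True if any character decomposes (NFD) into a base letter plus a combining mark."""
    return any(unicodedata.combining(ch) for ch in unicodedata.normalize("NFD", word))

print(has_accent("resume"))  # False
print(has_accent("résumé"))  # True

# Per-character check gives the positions of accented letters in the original word.
print([i for i, ch in enumerate("résumé") if has_accent(ch)])  # [1, 5]
```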
4 votes, 1 answer

Encode extended ASCII characters in a Code 128 barcode

I want to encode the string "QuiÑones" in a Code 128 bar code. Is it possible to include extended ASCII characters in the Code 128 encoding? I did some research on Google, which suggested that it is possible by using FNC4, but I didn't find…
Mayuresh
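Code 128 does define FNC4 for this: a single FNC4 before a data character adds 128 to its value, so an ISO 8859-1 (Latin-1) character from 128 to 255 is sent as FNC4 followed by the character 128 positions lower. `Ñ` is 209 in Latin-1, so it becomes FNC4 + `Q` (81). This Python 3 sketch performs only that mapping into a hypothetical token list; it does not choose code sets, compute symbol values or the checksum, or draw bars, and scanner support for FNC4 varies, so test with the target hardware.

```python
FNC4 = "FNC4"  # placeholder token standing in for the Code 128 function character

def code128_extended_tokens(text: str) -> list[str]:
    """Map text to data tokens, shifting Latin-1 characters 128-255 with a single FNC4."""
    tokens = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            tokens.append(ch)
        elif code < 256:
            # FNC4 adds 128 to the following character's value.
            # Values 128-159 map to control characters, which only code set A carries.
            tokens.extend([FNC4, chr(code - 128)])
        else:
            raise ValueError(f"{ch!r} is outside ISO 8859-1 and cannot be shifted with FNC4")
    return tokens

print(code128_extended_tokens("QuiÑones"))
# ['Q', 'u', 'i', 'FNC4', 'Q', 'o', 'n', 'e', 's']
```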
4 votes, 1 answer

Perl Text::Unaccent has unexpected results

I'm experiencing some weird, system-dependent issues with the Text::Unaccent module. Apologies if I'm missing something silly, but I've been banging my head against this one for hours with no real progress. I have a simple script set up that shows…
4 votes, 1 answer

getting a sub string of a std::wstring

How can I get a substring of a std::wstring which includes some non-ASCII characters? The following code does not output anything: (The text is an Arabic word that contains 4 characters, each two bytes long, plus the word "Hello") #include…
MBZ
4 votes, 2 answers

Handling Non-Ascii Chars in C++

I am facing some issues with non-ASCII chars in C++. I have a file containing non-ASCII chars, which I am reading in C++ via file handling. After reading the file (say 1.txt), I am storing the data in a string stream and writing it into another…
Mayank Jain
4 votes, 1 answer

UnicodeEncodeError: 'ascii' codec can't encode character?

I'm trying to pass big strings of random html through regular expressions and my Python 2.6 script is choking on this: UnicodeEncodeError: 'ascii' codec can't encode character I traced it back to a trademark superscript on the end of this word:…
KenBurnsFan1
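The question uses Python 2.6; this Python 3 sketch reproduces the same error class, which arises whenever text containing a character such as ™ (U+2122) is encoded with the ASCII codec, and shows the usual ways out: encode as UTF-8, or choose an explicit `errors` policy if the output really must be ASCII. The sample string is invented for illustration.

```python
text = "Acme\u2122"  # hypothetical sample: a word ending in the trademark sign

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # The ASCII codec rejects any code point above 127.
    print(exc.reason, hex(ord(exc.object[exc.start])))  # ordinal not in range(128) 0x2122

print(text.encode("utf-8"))                              # b'Acme\xe2\x84\xa2'
print(text.encode("ascii", errors="replace"))            # b'Acme?'
print(text.encode("ascii", errors="xmlcharrefreplace"))  # b'Acme&#8482;'
```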
4 votes, 3 answers

From a Java program, portable way to write strings with accents

Hullo, I have a Java program with a command-line interface. It is used on Linux and Windows. The Java code is portable, and I want it to remain portable. My Java source files are in Unicode, which is good. In them, I have lines like…
4 votes, 1 answer

Detect Japanese character input and "Romajis" (ASCII)

I would like to be able to detect when the user (1) inputs Japanese characters (Kanji or Kana) or (2) inputs Roman characters exclusively. Currently I am using the ASCII range like this (C# syntax): string searchKeyWord = Console.ReadLine(); var romajis =…
Julien
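The question's code is C#; as a language-neutral illustration, this Python 3 sketch checks input against Unicode blocks instead of relying on the ASCII range alone. The ranges are Hiragana (U+3040 to U+309F), Katakana (U+30A0 to U+30FF), and CJK Unified Ideographs (U+4E00 to U+9FFF); half-width katakana, CJK extension blocks, and Japanese punctuation are left out, so treat the ranges as a starting assumption. The function names are invented for this example.

```python
JAPANESE_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (common Kanji)
]

def is_japanese_char(ch: str) -> bool:
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in JAPANESE_RANGES)

def contains_japanese(text: str) -> bool:
    """True if at least one character is Kana or Kanji."""
    return any(is_japanese_char(ch) for ch in text)

def is_romaji_only(text: str) -> bool:
    """True if every character is ASCII (the 'Roman characters exclusively' case)."""
    return text.isascii()

print(contains_japanese("東京"), is_romaji_only("東京"))    # True False
print(contains_japanese("tokyo"), is_romaji_only("tokyo"))  # False True
```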
4 votes, 4 answers

Python regex to match non-ascii names

I'm trying to validate name fields with the re module. \w doesn't match non-ASCII chars such as à. It seems that in many other regex engines, the solution would have been \p{L}, but it appears this isn't supported in Python. What would be a…
GJ.
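In Python 3, `\w` in a `str` pattern already matches Unicode letters such as `à` (in Python 2 the same pattern needs the `re.UNICODE` flag); the catch is that it also matches digits and the underscore. A common idiom for "letters only" with the standard `re` module is the class `[^\W\d_]`. This sketch builds a name pattern from it that also allows single spaces, hyphens, and apostrophes between parts; which separators a name field should accept is an assumption to adjust, and `is_valid_name` is a hypothetical helper. The third-party `regex` module does support `\p{L}` if adding a dependency is acceptable.

```python
import re
import unicodedata

# [^\W\d_] = word characters minus digits and underscore, i.e. (approximately) letters.
LETTERS = r"[^\W\d_]+"
NAME_RE = re.compile(rf"{LETTERS}(?:[ '\-]{LETTERS})*")

def is_valid_name(name: str) -> bool:
    # Normalize to NFC first so a decomposed "e" + U+0301 becomes the single letter "é";
    # a bare combining mark is not matched by \w.
    return NAME_RE.fullmatch(unicodedata.normalize("NFC", name)) is not None

for candidate in ["Àlex", "Jean-Luc", "O'Brien", "R2D2", "snake_case"]:
    print(candidate, is_valid_name(candidate))
```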