Questions tagged [non-ascii-characters]

ASCII stands for 'American Standard Code for Information Interchange'. ASCII is a character-encoding scheme based on the ordering of the English alphabet. Because ASCII defines only 128 characters, numerous other encoding schemes have been created to cover characters from other alphabets and additional symbols.
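
As a rough illustration of the 128-character limit described above, here is a minimal Python 3 sketch that flags characters outside the ASCII range (code points 0 to 127). The helper names `is_ascii` and `non_ascii_chars` are hypothetical and chosen only for this example; they are not taken from any question under this tag.

```python
def is_ascii(text):
    """Return True if every character has a code point in the range 0-127."""
    return all(ord(ch) < 128 for ch in text)


def non_ascii_chars(text):
    """Return the characters of text that fall outside the ASCII range, in order."""
    return [ch for ch in text if ord(ch) > 127]


if __name__ == "__main__":
    # "\u00f3" is the precomposed letter o with acute accent.
    sample = "Bayam\u00f3n.geo.json"
    print(is_ascii(sample))         # False, because of the accented letter
    print(non_ascii_chars(sample))  # the single accented character
```

Checking code points rather than bytes keeps the test independent of whichever encoding the text was stored in.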

1055 questions
4
votes
2 answers

Reading accented filenames in R using list.files

I am reading county geojson files provided here into RStudio (R 3.1, Windows 8) for each of the states. I am using the list.files() function in R. For state PR, which has many counties with accented (Spanish) names, e.g. Bayamón.geo.json,…
Shekhar Sahu
  • 504
  • 1
  • 6
  • 19
4
votes
0 answers

Apache Camel not handling properly non ASCII character in SOAP request

I send a request to Apache Camel, which is used as a pass-through to a WS. The request:
tony
  • 41
  • 2
4
votes
2 answers

Remove lines containing non-ASCII characters from a file in Perl

I have a file with approximately 12,000 lines that is generated every 6 hours. Some of these lines contain non-ASCII characters. I would like to be able to run a Perl script to remove all lines that have non-ASCII characters in them.
4
votes
1 answer

Express routing non ascii characters (Farsi)

I am trying to use this route http://localhost:3030/api/words/عشق in my express app, so I can match the word in the dictionary. The browser changes the url to http://localhost:3030/api/words/%D8%B9%D8%B4%D9%82 but I have written a small middleware…
4
votes
1 answer

String fails to paste in the new iPython 5.0

In the python 2.7 console, as well as iPython 4, I was able to paste this string into a variable like so: In [2]: c = 'ÙjÌÉñõµÏ“JÖq´ž#»&•¼²nËòQZ<_' Subsequently I could type: In [3]: print(c) and it would return ÙjÌÉñõµÏ“JÖq´ž#»&•¼²nËòQZ<_ However,…
lebca
  • 185
  • 2
  • 5
4
votes
2 answers

Removing non-ascii characters from any given stringtype in Python

>>> teststring = 'aõ' >>> type(teststring) >>> teststring 'a\xf5' >>> print teststring aõ >>> teststring.decode("ascii", "ignore") u'a' >>> teststring.decode("ascii", "ignore").encode("ascii") 'a' which is what i really wanted it to…
fullmooninu
  • 950
  • 3
  • 9
  • 26
4
votes
2 answers

C# - How to replace accented characters, i.e., "-É" with "- É"

I'm making a very simple Windows application using Visual Studio and C# that edits subtitles files for movies. I want a program that adds a space to dialog sentences when there isn't one. For example: -Hey, what's up? -Nothing much. to - Hey, what's…
Telmo F.
  • 167
  • 1
  • 4
  • 16
4
votes
2 answers

how to remove spurious non ascii characters, but keep spaces and newlines?

I have some text files that contain some non-ASCII characters. I want to remove them but keep the formatting characters. I tried $description = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $description); However, that appeared to strip newlines…
kitenski
  • 639
  • 2
  • 16
  • 25
4
votes
1 answer

Unable to read non-ASCII content

I want to read non-ASCII JSON data, in my case in Persian, from a web page. Here is my code in Python 2.7: jsonObject =…
anahita
  • 55
  • 4
4
votes
2 answers

How to ignore accents in a string so it does not alter its length?

I am determining the length of certain strings of characters in C++ with the function length(), but noticed something strange: say I define in the main function string str; str = "canción"; Then, when I calculate the length of str by str.length() I…
Carl Rojas
  • 149
  • 2
4
votes
0 answers

UITextChecker and non english words

I'm developing a custom keyboard for the iOS operating system and I'm trying to add an auto-suggestion feature. For the English dictionary there seem to be no difficulties, but for languages like French I ran into a problem regarding accents. See…
4
votes
1 answer

Python: ascii codec can't encode en-dash

I'm trying to print a poem from the Poetry Foundation's daily poem RSS feed with a thermal printer that supports the CP437 encoding. This means I need to translate some characters; in this case an en-dash to a hyphen. But Python won't even encode…
4
votes
2 answers

JPA CriteriaQuery - accent insensitive

I am using JPA and PostgreSQL and I want to create a CriteriaQuery in which accents are not taken into consideration. Example: if I search for the letter 'a', the database should return the values 'ã', 'a', 'á', etc. This should…
4
votes
1 answer

SQL Collate statement where clause

I am trying the following query in SQL Server 2008 R2. While working with accent sensitivity I found this: select case when 'Наина' = '毛泽东先生' then 'Match' else 'No Match' end col The result I see is: 'Match'. What could possibly be the reason for this…
4
votes
1 answer

Non-ASCII characters in R, reading from .sav file

I am trying to read a .sav file into RStudio. The file contains data from a Spanish language survey, and when I read it into R -- even though my default text encoding has already been set to ISO-8859-1 -- the display of special characters is…
Mabyn
  • 316
  • 2
  • 20