Questions tagged [non-ascii-characters]

ASCII stands for 'American Standard Code for Information Interchange'. ASCII is a character-encoding scheme based on the ordering of the English alphabet. Since ASCII only contains definitions for 128 characters, numerous other encoding schemes have been created to include characters from other alphabets and other symbols.
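
As a small illustration of the 128-character limit described above, the following Python sketch lists the characters in a string whose code point falls outside the ASCII range 0 to 127. The helper name `non_ascii_chars` is hypothetical and exists only for this example.

```python
def non_ascii_chars(text: str) -> list[str]:
    """Return the characters of `text` whose code point is above 127."""
    return [ch for ch in text if ord(ch) > 127]


sample = "\u00c0 bient\u00f4t"      # "À bientôt", written with escapes
print(sample.isascii())             # str.isascii() needs Python 3.7+
print(non_ascii_chars(sample))
```

Checking `ord(ch) > 127` works on decoded text; for raw bytes the equivalent test is whether any byte value is 128 or higher.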

1055 questions
4 votes, 2 answers

jQuery filterable plugin with i18n (collation) support

I have searched without success for a jQuery plugin able to filter an HTML list (a list of li, div or other elements) based on its content. I found numerous ones, but none of them seems to support what's called collation in MySQL (and certainly in other…
4 votes, 3 answers

Diacritic string conversion in Oracle – TO_ASCII alternative in Oracle - How to remove accents and special characters

I need to convert strings with diacritics to an ASCII version. As an example, the string “Caicó” is converted to “Caico” and “À bientôt” is converted to “A bientot”. It’s a usual problem with lots of European languages that use diacritics to…
Rafael Borja
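
The conversion described in this question (“Caicó” to “Caico”) is commonly done by decomposing the text and dropping the combining marks. The sketch below shows that general technique in Python, not in Oracle, and the function name `strip_diacritics` is an assumption made for illustration.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Caic\u00f3"))           # input is "Caicó"
print(strip_diacritics("\u00c0 bient\u00f4t"))  # input is "À bientôt"
```

Letters that have no canonical decomposition, such as ø or ß, pass through unchanged, so a full transliteration would need an extra mapping table.
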
4 votes, 3 answers

Perl to count non-printable characters

I have 100,000s of files that I would like to analyze. Specifically, I would like to calculate the percentage of printable characters from a sample of the file of arbitrary size. Some of these files are from mainframes, Windows, Unix, etc. so it…
Stan
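
One way to approach the measurement described in this question is to read a fixed-size sample in binary mode and count the bytes that fall in a chosen printable set. The sketch below is in Python rather than Perl, and it assumes "printable" means ASCII 0x20 to 0x7E plus tab, line feed and carriage return; the names `printable_ratio` and `sample_file` are hypothetical.

```python
# Assumed definition of "printable": visible ASCII plus tab, LF and CR.
PRINTABLE = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def printable_ratio(data: bytes) -> float:
    """Return the percentage of bytes in `data` that are printable."""
    if not data:
        return 0.0
    printable = sum(1 for b in data if b in PRINTABLE)
    return 100.0 * printable / len(data)


def sample_file(path: str, size: int = 4096) -> float:
    """Score the first `size` bytes of the file at `path`."""
    with open(path, "rb") as fh:
        return printable_ratio(fh.read(size))


print(printable_ratio(b"ab\x00\x01"))
```

This set is ASCII-specific, so files in other encodings (for example EBCDIC data from a mainframe) would need a different definition of printable.
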
4 votes, 1 answer

Accented characters are crashing standalone PHP interpreter

I have a PHP script that is run from the command line with the standalone PHP interpreter. The script outputs a lot of HTML into a CSV file, which is later bulk-uploaded to a WP site. Accented characters seem to crash the interpreter. Example: $output = "Não…
4 votes, 3 answers

Creating an effective word counter including Chinese/Japanese and other accented languages

After trying to figure out how to build an effective word counter for a string, I know about PHP's existing str_word_count function, but unfortunately it doesn't do what I need because I will need to count the number of words that…
MacMac
4 votes, 2 answers

Characters with ASCII > 128 are not correctly read in JavaScript

I have an HTML file that includes a JavaScript file. This script contains a special character, ASCII 152. When I try to display the charCodeAt, I get different results, but never the right one. Could you please advise? Thanks. TEST.HTML
3 votes, 2 answers

Read Chinese characters from Excel worksheet? (Always returns "????")

How do I read Chinese characters from Excel cells and write them to a file? When I take values by Worksheets(ActiveCell.Worksheet.Name).Cells(3, columnNumbers(0)).value it always returns "????????"
skmaran.nr.iras
3 votes, 2 answers

How to Decode Scrambled Character Encoding: Special Character Encoding

I have data in CSV format whose character encoding has been seriously scrambled, likely from going back and forth between different software applications (LibreOffice Calc, Microsoft Excel, Google Refine, custom PHP/MySQL software; on Windows XP,…
balleyne
3 votes, 1 answer

Why does std::iswalpha return false for some French characters in C++?

I am using std::iswalpha to check if a non-ASCII character (French character) is alphabetic. However, I have found that it returns false for the é character. I have set my locale to fr_FR.UTF-8 in my code. Can you help me understand why this is…
Abdo21
3 votes, 2 answers

Is there a specific name for hidden whitespace symbols (\n, \t, etc.)?

I can find lots of answers about what the differences are between these, or what they are functionally, but I want to know if there's a name for characters like these that typically don't show up as a character so much as they change the structure…
3 votes, 1 answer

How can I translate non-printable ASCII characters to readable text with Perl?

I'm trying to test some probes connected via USB to a Linux device using Perl 5.28 and Linux (Debian 8). When I read out a large file buffer of the probe, non-readable ASCII characters such as \0 or \x02 often occur. I want to translate these characters into…
huckfinn
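
A common way to make such a buffer readable is to keep visible ASCII bytes as they are and replace everything else with an escape sequence. The sketch below illustrates that idea in Python rather than Perl; the function name `make_readable` and the choice of `\xNN` escapes are assumptions for this example.

```python
def make_readable(data: bytes) -> str:
    """Render bytes as text, escaping everything outside visible ASCII."""
    out = []
    for b in data:
        if b == 0x5C:
            # Double the backslash so escapes stay unambiguous.
            out.append("\\\\")
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02x}")
    return "".join(out)


print(make_readable(b"OK\x00\x02"))
```

Doubling literal backslashes keeps the output reversible, since an escaped byte can never be confused with text that already contained `\x`.
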
3 votes, 1 answer

What is the HTML code for a double forward slash symbol //?

I'm looking for a symbol that looks like two forward slashes in succession. Of course I can just enter two slashes like this: // but I would prefer a single symbol with these slashes very close together. I tried to google it, to no avail.
3 votes, 1 answer

Usage of Unicode characters in URLs

The basis of this question comes from the fact that in many Latin languages, and also in many non-Latin languages, there are letters that, from what I've seen, up until recently were not really usable in URLs and nearly always ended up generating a big…
Mihail Minkov
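
For context, non-ASCII letters in a URL are usually carried in an ASCII-only form: the path is percent-encoded as UTF-8 bytes and the host name is converted to Punycode. The Python sketch below shows both steps with the standard library; the helper name `encode_url_parts` and the sample host and path are made up for illustration.

```python
from urllib.parse import quote, unquote


def encode_url_parts(host: str, path: str) -> tuple[str, str]:
    """Return ASCII-only forms of a host name and a URL path."""
    # The built-in "idna" codec implements the older IDNA 2003 rules.
    ascii_host = host.encode("idna").decode("ascii")
    # quote() percent-encodes the UTF-8 bytes and leaves "/" untouched.
    ascii_path = quote(path)
    return ascii_host, ascii_path


host, path = encode_url_parts("b\u00fccher.example", "/caf\u00e9")
print(host, path)
print(unquote(path))  # decodes back to the original path
```

Browsers typically display the decoded form in the address bar while sending the encoded form over the wire.
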
3 votes, 1 answer

Cucumber - Java - Non-ASCII characters in an identifier

Hi community: I'm working with Cucumber in a language other than English. When I generate the Step Definitions, it displays a message over the void method: "Non-ASCII characters in an identifier". This is the Step Definition sample: @Y("^se…
nosequeweaponer
3 votes, 2 answers

Filter pandas dataframe rows containing non-ascii values

I have a column with addresses and want to find all rows that contain 'foreign' i.e. non-ASCII characters. import pandas as pd df = pd.DataFrame.from_dict({ 'column_name': ["GREENLAND HOTEL, CENTRAL AVENUE, NAGPUR-440 018.", "Møllegade 1234567…
Pranab