Questions tagged [unicode-literals]

Use this tag for questions related to Unicode literals. An example is u'some text', which is a different type of object from a byte string ( 'some text' ).

This tag is used in its general meaning, so make sure you also add a tag for your programming environment, if any, to your question.

For example in Python, quoting this answer:

A unicode literal ( u'some text' ) is a different type of Python object from a Python byte string ( 'some text' ). It's like using \n versus \N ; the former has meaning in Python literals (it's interpreted as a newline character), the latter just means a backslash and a capital N (two characters).
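As a rough illustration of the distinction in the quote, here is a minimal Python 3 sketch. The quote itself describes Python 2, where an unprefixed literal was a byte string; in Python 3 the byte string needs a b prefix. The variable names are illustrative only.

```python
# Python 3 sketch: the u prefix is still accepted, and b marks a byte string.
text = u'some text'
data = b'some text'

print(type(text).__name__)  # str
print(type(data).__name__)  # bytes

# \n is an escape sequence for a single newline character...
newline = '\n'
# ...while a backslash followed by N (written here with an escaped
# backslash) is two separate characters.
backslash_n = '\\N'
print(len(newline), len(backslash_n))  # 1 2
```

Note that in Python 3 text literals, \N followed by a character name in braces is itself an escape sequence, so the "two characters" remark applies to Python 2 byte strings.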

92 questions
3
votes
4 answers

Why can u'\xe5' be decoded but not '\xe5'?

This is flabbergasting and extremely frustrating, please help. >>> a1 = '\xe5' # <type 'str'> >>> a2 = u'\xe5' # <type 'unicode'> >>> ord(a1) 229 >>> ord(a2) 229 >>> print a2.encode('utf-8') å >>> print a1.encode('utf-8') Traceback (most recent call…
Klas Lindberg
  • 61
  • 1
  • 4
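One likely explanation for the behaviour in this excerpt: in Python 2, calling .encode() on a byte string first decodes it implicitly with the ASCII codec, and byte 0xE5 is outside ASCII. The Python 3 sketch below imitates that step explicitly; py2_style_encode is a hypothetical helper written for this illustration, not a real library function.

```python
raw = b'\xe5'   # what Python 2 spelled '\xe5' (a byte string)
text = '\xe5'   # what Python 2 spelled u'\xe5' (a text string)

# Encoding text works: U+00E5 becomes two UTF-8 bytes.
print(text.encode('utf-8'))  # b'\xc3\xa5'


def py2_style_encode(byte_string, encoding):
    """Hypothetical helper: decode as ASCII first, then encode,
    which is roughly what Python 2 did for str.encode()."""
    return byte_string.decode('ascii').encode(encoding)


error = None
try:
    py2_style_encode(raw, 'utf-8')
except UnicodeDecodeError as exc:
    error = exc
    print(type(exc).__name__, '-', exc.reason)
    # UnicodeDecodeError - ordinal not in range(128)

# Naming the real source encoding avoids the implicit ASCII step.
print(raw.decode('latin-1') == text)  # True
```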
3
votes
1 answer

Java decompiler gives strange symbols

I am using a Java decompiler and it seems to give sensible code, except that it gives strange symbols for constant integers. For example: #int[] arr = new int['田']; This symbol has the code point U+7530. I wonder if it works to revert this…
D.Badawi
  • 175
  • 1
  • 13
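The arithmetic behind this excerpt can be checked quickly. The sketch below is in Python rather than Java and only assumes that the decompiler rendered an integer constant as the character with the same code point.

```python
symbol = '\u7530'          # the character shown by the decompiler
code_point = ord(symbol)   # its numeric value

print(code_point)       # 30000
print(hex(code_point))  # 0x7530

# Going the other way, the integer maps back to the same character.
print(chr(30000) == symbol)  # True
```

Under that assumption, the array size in the excerpt would simply be 30000.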
3
votes
2 answers

How do I specify a unicode literal that requires more than four hex digits in Antlr?

I want to define a lexer rule for ranges between unicode characters that have code points that need more than four hexadecimal digits to identify. To be concrete, I want to declare the following rule: ID_Continue : [\uE0100-\uE01EF]…
3
votes
1 answer

Unescaping unicode literals found in Haskell Strings

The Unicode code point for lowercase s is U+0073, which this website says is \u0073 in C and Java. Given a file: a.txt containing: http://www.example.com/\u0073 Let's read this with Java, and unescape the \ and see what we get: import…
Rob Stewart
  • 1,812
  • 1
  • 12
  • 25
3
votes
3 answers

Print unicode literal string as Unicode character

I need to print a unicode literal string as an equivalent unicode character. System.out.println("\u00A5"); // prints ¥ System.out.println("\\u"+"00A5"); // prints \u00A5 I need to print it as ¥. How can I evaluate this string as a unicode character?
Pradeep
  • 97
  • 1
  • 8
3
votes
2 answers

Java: how to convert UTF-8 (in literal) to unicode

I have a UTF-8 sequence (as a literal) like this: "\xE2\x80\x93." I'm trying to convert it into Unicode using Java, but I was not able to find a way to do so. Can anyone help me with this? Regards, Sat
Sat
  • 51
  • 2
  • 7
2
votes
0 answers

jQuery keypress event, event.keyCode or event.which is always 0 in Android

I am developing a transliteration application in which, as a user types, each character should be replaced with a unicode character. For that I am getting the keycode value on the keypress event and based on that I am setting the unicode character. It is…
Regan
  • 21
  • 2
2
votes
1 answer

String differs after encoding and decoding

I stumbled across weird behaviour when encoding and decoding a string. Have a look at an example: @Test public void testEncoding() { String str = "\uDD71"; // {56689} byte[] utf16 = str.getBytes(StandardCharsets.UTF_16); // {-2, -1, -1, -3} …
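The byte values quoted in this excerpt can be inspected outside Java. The Python sketch below reinterprets the signed bytes {-2, -1, -1, -3} as unsigned values; they appear to be a byte order mark followed by U+FFFD, the replacement character, which would explain why the round trip does not return the original unpaired surrogate. This reading is an assumption based only on the excerpt.

```python
lone_surrogate = '\udd71'
print(ord(lone_surrogate))  # 56689

# The signed Java bytes from the excerpt, converted to unsigned values.
java_bytes = bytes(b & 0xFF for b in (-2, -1, -1, -3))
print(java_bytes.hex())  # fefffffd

# FE FF is a big-endian byte order mark; FF FD is U+FFFD.
decoded = java_bytes.decode('utf-16')
print(decoded == '\ufffd')  # True

# Python itself refuses to encode an unpaired surrogate by default.
try:
    lone_surrogate.encode('utf-16')
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True
print(encode_failed)  # True
```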
2
votes
1 answer

Python 2 and unicode_literals - UnicodeDecodeError: 'ascii' codec can't decode byte

Super duper Python newb here. Learning explicitly for network automation. One thing I've been trying to do is make code that works both in Python2 and Python3 but I've run into an issue that is probably obvious to most. And yes the title here is the…
fcs1000
  • 21
  • 1
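A common way to avoid this class of error in code that must run on both Python 2 and Python 3 is to decode bytes explicitly at the boundary instead of relying on implicit ASCII conversion. The sketch below is one possible approach, not the asker's code: to_text is a hypothetical helper, and the sample bytes are assumed to be UTF-8. It uses the u prefix, which has the same effect on a literal as enabling unicode_literals for the whole module.

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return text whether given bytes or text."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Bytes as they might arrive from a device or socket (assumed UTF-8).
device_output = b'caf\xc3\xa9'

# A text literal on both Python 2 and Python 3.
label = u'Output: '

# Decode first, then concatenate text with text.
message = label + to_text(device_output)
print(len(message))  # 12
```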
2
votes
1 answer

How to use Unicode literals with the Node.js -e "evaluate script" commandline switch

Node.js has an -e commandline switch to evaluate code provided on the commandline rather than in a separate script file. Oddly, I can't find official documentation for it online but the node executable self-documents it if you run node --help: >node…
hippietrail
  • 15,848
  • 18
  • 99
  • 158
2
votes
1 answer

input() and literal unicode parsing

input() treats a backslash as a literal backslash, so I am unable to parse a string input containing unicode escapes. What I mean: Pasting a string like "\uXXXX\uXXXX\uXXXX" into an input() call is interpreted as "\\uXXXX\\uXXXX\\uXXXX" but I want it…
user1091684
  • 101
  • 1
  • 11
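One way to turn literal backslash-u sequences in a runtime string into the characters they name is a regular expression substitution. This is a minimal sketch for four-digit escapes only; unescape_u is a hypothetical helper, and the raw string stands in for whatever input() would return.

```python
import re

# Matches a literal backslash, a 'u', and exactly four hex digits.
_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def unescape_u(text):
    """Hypothetical helper: replace each \\uXXXX sequence with its character."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


typed = r'\u0048\u0069'   # stands in for the string returned by input()
print(len(typed))         # 12 (backslashes are ordinary characters here)
print(unescape_u(typed))  # Hi
```

The substitution leaves any other text, including non-ASCII characters, untouched.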
1
vote
1 answer

Regular expression and unicode literals

I'd like to remove some characters from a string (either byte string or unicode string) using a regular expression like this: pattern = re.compile(ur'\u00AE|\u2122', re.UNICODE) If the characters are specified as unicode literals the resulting…
Peter Prettenhofer
  • 1,951
  • 18
  • 23
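For comparison, here is how a pattern like the one in this excerpt might look in Python 3, where the ur prefix no longer exists and ordinary string literals are already text. The sample input is made up for illustration.

```python
import re

# Character class containing U+00AE (registered sign) and U+2122 (trade mark sign).
pattern = re.compile('[\u00ae\u2122]')

sample = 'Widget\u2122 by Acme\u00ae'
cleaned = pattern.sub('', sample)
print(cleaned)  # Widget by Acme
```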
1
vote
0 answers

Python encoding errors latin-1 PyPDF2

I am trying to extract the content of all the PDFs in my directory and write the text from all of them to a txt file. I have managed to do so, but an issue occurs because I frequently have PDFs with non-Latin letters. If someone could tell me how…
1
vote
1 answer

Displaying the unicode characters with real devices and emulators

Some unicode characters don't display on real devices and emulators but do display within Android Studio design mode. For example, in design mode: but the emulator (as well as a real device) shows nothing: So, why is this? And what should I…
Sergey V.
  • 981
  • 2
  • 12
  • 24
1
vote
2 answers

How to use a non-English literal without getting any unicode error?

I have many text labels in my GUI app that are not in English, so I am getting a unicode error. leaveBtn = Button(top_frame_label, text= u"Görev Yükünü Ayır".decode(errors='replace') , width = 15) …