Questions tagged [unicode-literals]

Use this tag for questions related to Unicode literals. An example is u'some text', which is a different type of object from a byte string ( 'some text' ).

This tag is used in its general meaning, so if your question concerns a specific programming environment, make sure you also add that environment's tag.

For example, in Python, quoting this answer:

A unicode literal ( u'some text' ) is a different type of Python object from a python byte string ( 'some text' ). It's like using \n versus \N ; the former has meaning in python literals (it's interpreted as a newline character), the latter just means a backslash and a capital N (two characters).
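The quoted answer describes Python 2, where an unprefixed 'some text' is a byte string. As a minimal sketch of the same type distinction under Python 3 (where the unprefixed literal is already Unicode and byte strings need a b prefix), the following may help; the variable names are illustrative and not taken from the answer.

```python
# Python 3 sketch; all variable names here are illustrative.
text = u'some text'   # Unicode string; the u prefix is optional in Python 3.3+
data = b'some text'   # byte string

print(type(text).__name__)  # str
print(type(data).__name__)  # bytes
print(text == data)         # False: str and bytes never compare equal in Python 3

# Escape handling also depends on the kind of literal.
newline = '\n'                                 # one character: a line feed
named = '\N{LATIN SMALL LETTER A WITH ACUTE}'  # one character: á
raw_bytes = br'\N'                             # two bytes: a backslash and N

print(len(newline), len(named), len(raw_bytes))  # 1 1 2
```

Note that in Python 3 a bare '\N' without a character name is rejected in a str literal, so the two-character form is shown here as a raw bytes literal.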

92 questions
7 votes, 2 answers

Python subprocess echo a unicode literal

I'm aware that questions like this have been asked before. But I'm not finding a solution. I want to use a unicode literal, defined in my python file, with the subprocess module. But I'm not getting the results that I need. For example the following…
Shane Gannon
  • 6,770
  • 7
  • 41
  • 64
7 votes, 2 answers

Getting empty character literal error in java code that specified unicode literals

Why does this code public class Apostrophier { public static String replace(String s) { return s.replace('\u0092','\u0027'); } } give an 'empty character literal' error when I try to compile it?
Paul Taylor
  • 13,411
  • 42
  • 184
  • 351
6 votes, 5 answers

Unicode (hexadecimal) character literals in MySQL

Is there a way to specify Unicode character literals in MySQL? I want to replace a Unicode character with an ASCII character, something like the following: Update MyTbl Set MyFld = Replace(MyFld, "ẏ", "y") But I'm using even more obscure characters…
ChrisV
  • 8,748
  • 3
  • 48
  • 38
6 votes, 1 answer

VS2013 and Unicode literals give warnings

What is wrong with this code: static const std::vector glyphs( {L'A', L'B', L'C', L'D', L'E', L'F', L'G', L'H', L'I', L'J', L'K', L'L', L'M', L'N', L'O', L'P', L'Q', L'R', L'S', L'T', L'U', L'V', L'W', L'X', L'Y', L'Z',…
juzzlin
  • 45,029
  • 5
  • 38
  • 50
6 votes, 3 answers

Python unicode string literals in module declared as utf-8

I have a dummy Python module with the utf-8 header that looks like this: # -*- coding: utf-8 -*- a = "á" print type(a), a Which prints: <type 'str'> á But I thought that all string literals inside a Python module declared as utf-8 would…
Caumons
  • 9,341
  • 14
  • 68
  • 82
5 votes, 2 answers

How do you safely declare a 16-bit string literal in C?

I'm aware that there is already a standard method by prefixing with L: wchar_t *test_literal = L"Test"; The problem is that wchar_t is not guaranteed to be 16 bits, but for my project, I need a 16-bit wchar_t. I'd also like to avoid the requirement…
user6754053
5 votes, 1 answer

How to properly use `__attribute__((format (printf, x, y)))` for C11 U"unicode literals"?

I'm porting an application from using char* for everything and everywhere to using UCS4 as its internal Unicode representation. I use C11's U"unicode literals" for defining strings, which expand to arrays of char32_t, which are uint32_t…
toriningen
  • 7,196
  • 3
  • 46
  • 68
4 votes, 3 answers

Unicode literals causing invalid syntax

The following code: s = s.replace(u"&", u"&") is causing an error in Python: SyntaxError: invalid syntax. Removing the u's before the " fixes the problem, but shouldn't this work as is? I'm using Python 3.1
SupaGu
  • 579
  • 6
  • 18
4 votes, 2 answers

Regex matching Unicode variable names

In Python 2, a Python variable name contains only ASCII letters, numbers and underscores, and it must not start with a number. Thus, re.search(r'[_a-zA-Z][_a-zA-Z0-9]*', s) will find a matching Python name in the str s. In Python 3, the letters…
jmd_dk
  • 12,125
  • 9
  • 63
  • 94
4 votes, 2 answers

unicode_literals and doctest in Python 2.7 AND Python 3.5

Consider the following demo script: # -*- coding: utf-8 -*- from __future__ import division from __future__ import unicode_literals def myDivi(): """ This is a small demo that just returns the output of a divison. >>> myDivi() 0.5 …
matth
  • 2,568
  • 2
  • 21
  • 41
4 votes, 1 answer

Unicode code point escapes in regex literals - Javascript

Can this regex literal, which uses Unicode escape sequence syntax, var regpat= /^[\u0041-\u005A\u0061-\u007A\.\' \-]{2,15}/; be written using Unicode code point escape syntax (as shown below)? var regpat= /^[\u{41}-\u{5A}\u{61}-\u{7A}\u{1F4A9}\.\'…
overexchange
  • 15,768
  • 30
  • 152
  • 347
4 votes, 2 answers

How to ensure all string literals are unicode in python

I have a fairly large python code base to go through. It's got an issue where some string literals are strings and others are unicode. And this causes bugs. I am trying to convert everything to unicode. I was wondering if there is a tool that can…
mmopy
  • 695
  • 7
  • 15
4 votes, 1 answer

Getting no output to win console when raising exception containing unicode literal (u"\u0410")

I came across an obscure problem when a raised Python exception is printed to the Windows console. When the exception message contains any unicode literal, it is not printed at all or is printed improperly. The console encoding is cp866. When python default encoding is…
Unicorn
  • 1,397
  • 1
  • 15
  • 24
3 votes, 2 answers

Python 2to3 - do not remove unicode prefixes

I am converting a legacy codebase to Python 3 and doing some dry runs of 2to3. 2to3 removes the u'' prefix from unicode literals, creating a lot of noise in the diffs. Is there a way to disable this (as u'my string' is valid py3 syntax)?
Mr_and_Mrs_D
  • 32,208
  • 39
  • 178
  • 361
3 votes, 1 answer

Python: How to specify and view high-numbered Unicode characters?

The Unicode character U+1d134 is the musical symbol for "common time"; it looks like a capital 'C'. But using Python 3.6, when I specify '\U0001d134' I get a glyph that seems to indicate an unknown symbol. On my Mac, it looks like a square with a…
wchlm
  • 365
  • 1
  • 11