How to know if data contains a non-ASCII character?

Question

I'm working with an API which returns data in the form 01234⇒56789. Sometimes the data contains only digits, which is not a problem, but sometimes it contains the ⇒ character. Since I have to automate selecting the number after the arrow (a non-ASCII character), I need to know when the data contains a non-ASCII character.

I used decode('utf-8') and it returns u'01234\u21d256789'. I tried split('\u21d2') but the string is not splitting. Any help is appreciated.

user3159253 · Accepted Answer · 2014-12-11T09:57:06.353

1

Python 3:

>>> s = "01234⇒56789"
>>> s
'01234⇒56789'
>>> s.split("⇒")
['01234', '56789']

Python 2:

>>> s = u"01234⇒56789"
>>> s.split(u"⇒")
[u'01234', u'56789']

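The key point in Python 2 is to specify that you are dealing with a unicode string, hence the u prefix. Without it, '\u21d2' is a plain byte string in which \u is not an escape sequence, so it never matches the arrow. In Python 3, strings are Unicode by default and there is a separate bytes type.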
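To answer the title question directly, here is a minimal sketch of detecting a non-ASCII character and taking the part after the arrow. The helper names `has_non_ascii` and `number_after_arrow` are made up for illustration, and the sketch assumes the data has already been decoded to a unicode string (as with decode('utf-8') in the question).

```python
ARROW = u"\u21d2"  # the arrow character from the question


def has_non_ascii(text):
    """Return True if any character lies outside the ASCII range (0-127)."""
    return any(ord(ch) > 127 for ch in text)


def number_after_arrow(text, arrow=ARROW):
    """Return the part after the first arrow, or the text unchanged if there is none."""
    if arrow in text:
        # Split once and keep everything to the right of the arrow.
        return text.split(arrow, 1)[1]
    return text


if __name__ == "__main__":
    for sample in (u"01234\u21d256789", u"0123456789"):
        print(has_non_ascii(sample), number_after_arrow(sample))
```

If only the arrow matters, the `arrow in text` check alone is enough; `has_non_ascii` is the more general test for any character above code point 127.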

edited Dec 11 '14 at 09:57

answered Dec 11 '14 at 09:47

user3159253

Nice and concise answer. Only one minor point: if you don't specify the encoding at the top of your program, `s=u'01234⇒56789'` will give an 'Unsupported characters in input' error. Therefore, I would use `s.split(u'\u21d2')` instead. – maschu Dec 11 '14 at 10:13
Well, I'm using UTF-8 encoding in the console by default, so it's not a problem for me and works as specified. But surely one might need to specify the input encoding or use the backslash-u escape explicitly. – user3159253 Dec 11 '14 at 10:21
