I'm working with an API that returns data in the form 01234⇒56789. Sometimes the data contains only digits, which is not a problem, but sometimes it contains characters as well. Because I have to automate selecting the number after the arrow (a non-ASCII character), I need to know when the string contains a non-ASCII character.

I used decode('utf-8') and it returns u'01234\u21d256789'. I tried split('\u21d2'), but the string does not split. Any help is appreciated.

Rahul

1 Answer

Python 3:

>>> s = "01234⇒56789"
>>> s
'01234⇒56789'
>>> s.split("⇒")
['01234', '56789']

Python 2:

>>> s = u"01234⇒56789"
>>> s.split(u"⇒")
[u'01234', u'56789']

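The key point in Python 2 is to specify that you are dealing with a unicode string, both for the data and for the separator passed to split(). In Python 3, strings are Unicode by default, and there is a separate bytes type for raw byte data.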
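As a rough sketch of how this could look for data coming back from the API: the helper names `value_after_arrow` and `has_non_ascii` below are made up for illustration, and the sample bytes stand in for the real API response, which is assumed to be UTF-8 encoded. Writing the arrow as the escape `u"\u21d2"` avoids depending on the source file or console encoding.

```python
ARROW = u"\u21d2"  # RIGHTWARDS DOUBLE ARROW, written as an escape


def value_after_arrow(raw):
    """Return the text after the arrow, or None if there is no arrow.

    `raw` may be UTF-8 encoded bytes (assumed API output) or already-decoded text.
    """
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    before, sep, after = text.partition(ARROW)
    return after if sep else None


def has_non_ascii(text):
    """True if any character in the decoded text lies outside the ASCII range."""
    return any(ord(ch) > 127 for ch in text)


# Stand-in for an API response (hypothetical sample data)
raw = u"01234\u21d256789".encode("utf-8")
print(value_after_arrow(raw))              # 56789
print(has_non_ascii(raw.decode("utf-8")))  # True
```

`partition` is used here instead of `split` only because it makes the "no arrow present" case easy to detect; `text.split(ARROW)` works just as well when the arrow is known to be there.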

user3159253
  • Nice and concise answer. Only one minor point: if you don't specify the encoding at the top of your program, `s=u'01234⇒56789'` will give an 'Unsupported characters in input' error. Therefore, I would use `s.split(u'\u21d2')` instead. – maschu Dec 11 '14 at 10:13
  • Well, I'm using UTF-8 encoding in the console by default, so it's not a problem for me and works as shown. But one might indeed need to specify the input encoding or use the backslash-u escape explicitly. – user3159253 Dec 11 '14 at 10:21