I have looked at a bunch of Stack Overflow examples, but none of them solved this.
Python version used: Python 2.7.10
The string s looks like this:
u'bh\xfcghi' where \xfc=ü
I am reading this from a webpage.
After I encode the string with .encode('utf-8'), it looks like this:
'bh\xc3\xbcghi' where \xc3\xbc=ü
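To make the two representations concrete, here is a minimal sketch. The literal is copied from the output above and the variable names are only for illustration. As far as I can tell, both forms are escaped views of the same text: \xfc is the code point U+00FC, and \xc3\xbc is its two-byte UTF-8 encoding.

```python
# -*- coding: utf-8 -*-
# Sketch only: the literal stands in for the text read from the webpage.
s = u'bh\xfcghi'                   # text (unicode) string, 6 characters

encoded = s.encode('utf-8')        # byte string, 7 bytes (the u-umlaut takes two)
decoded = encoded.decode('utf-8')  # back to the original text

print(len(s))        # 6
print(len(encoded))  # 7
print(decoded == s)  # True
```

So the encode/decode round trip does not seem to change the data, only how it is stored.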
Expected output:
bhüghi
I even tried decode/encode with latin-1 and decode('utf-8'), with the same result.
After nfn neil's comment, I tried the following again:
elem.text output:
('elem text:', u'bh\xfcghi\nMCI\n8 90 1 0 0 2 0 0 0 0 0 0 2 26 41.4 18.5 89 14.9')
elem text type:
('elem text type:', <type 'unicode'>)
Now, I am trying to print it:
splitString = elem.text.encode('utf-8').decode("utf-8").split()
print("splitString: ", splitString[0])
splitString[0] output:
u'bh\xfcghi'
Now if I print the whole string after split:
print("splitString: ", splitString)
splitString output:
[u'bh\xfcghi', u'MCI', u'8', u'90', u'1', u'0', u'0', u'2', u'0', u'0', u'0', u'0', u'0', u'0', u'2', u'26', u'41.4', u'18.5', u'89', u'14.9']
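One thing I notice about the outputs above: print is called with two values in parentheses. In Python 2 without `from __future__ import print_function`, that prints a tuple, and a tuple or list shows the repr of each element, which may be where the escaped u'...\xfc...' form comes from. A minimal sketch, assuming that is the cause; `text` is a shortened stand-in for elem.text, and `line` is a name made up for illustration:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function  # makes print a function on Python 2 as well

# Stand-in for elem.text (shortened); not the real scraped value.
text = u'bh\xfcghi\nMCI\n8 90 1'

# elem.text is already unicode, so no encode/decode round trip is needed.
parts = text.split()

# Format the label and the value into one string instead of printing a tuple or list.
line = u'splitString: {}'.format(parts[0])

# Assumes the terminal encoding can represent the u-umlaut (e.g. UTF-8).
print(line)
```

Printing the whole list would still show the escaped repr of each element in Python 2; joining the items first, for example with u' '.join(parts), would avoid that.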
The full code is on Pastebin: Here's a link
Any help will be appreciated.