Inserting a unicode character using .join()

Question

I have columns in a data table that I need to join. One column consists of values and the other of corresponding error values, for example:

50.21  0.03
43.23  0.06
23.65  1.20
12.22  0.06
11.25  2.21

What I'd like to do is join the two columns in each row with a +/- between them, using the proper Unicode character (U+00B1) rather than the ASCII approximation. I've never tried to use Unicode characters in Python before, so I'm sorta stumped.

If my .join() looks like

"<unicode here>".join(item)

how exactly do I let Python know I want to use a Unicode character?

As a side note, if you want to start learning Unicode in Python now, you should consider switching to Python 3.x first. Learning Unicode in 3.x is a lot easier (and a lot different, so half of what you learn for 2.x today won't even be right in 3.x). — abarnert, Dec 19 '13 at 19:41
Yeah, I've been thinking of making the switch. This is one more reason to do so I guess. Thanks for the suggestion. — Matt, Dec 19 '13 at 19:44
While you're at it, reading the Unicode HOWTO for [2.7](http://docs.python.org/2/howto/unicode.html) and for [3.x](http://docs.python.org/3/howto/unicode.html) is probably worth doing. Andrew Kuchling is good at explaining things, and there are nice links to other resources as well. — abarnert, Dec 19 '13 at 19:48

score 6 · Accepted Answer · answered Dec 19 '13 at 19:35

If you want to join with unicode, use a unicode string:

u'\u00b1'.join(item)

This presumes that item is a sequence of strings, either byte strings or unicode strings. Byte strings will be coerced to unicode for you with the ASCII codec, which raises a UnicodeDecodeError if any of them contains non-ASCII bytes.

It'd be better to explicitly turn your values into unicode strings; that way you control which encoding is used.
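
A minimal sketch of that explicit conversion, assuming the values arrive as UTF-8 encoded byte strings (the encoding is an assumption; substitute whatever your data actually uses). The helper name `to_text` is made up for illustration, and the snippet should run on both Python 2.7 and 3.x:

```python
PLUS_MINUS = u'\u00b1'  # U+00B1 PLUS-MINUS SIGN


def to_text(value, encoding='utf-8'):
    """Return value as a unicode string, decoding byte strings explicitly."""
    # `bytes` is an alias for `str` on Python 2.7, so this check works on both.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Byte strings, e.g. as read from a file opened in binary mode.
item = [b'50.21', b'0.03']

# Decode every value first, then join with the unicode separator.
joined = PLUS_MINUS.join(to_text(v) for v in item)
```

Decoding up front means a wrong encoding fails at a place you chose, instead of inside `.join()` through the implicit ASCII coercion.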

Demo with str values:

>>> items = [r.split() for r in '''\
... 50.21  0.03
... 43.23  0.06
... 23.65  1.20
... 12.22  0.06
... 11.25  2.21
... '''.splitlines()]
>>> items
[['50.21', '0.03'], ['43.23', '0.06'], ['23.65', '1.20'], ['12.22', '0.06'], ['11.25', '2.21']]
>>> for item in items:
...     print u'\u00b1'.join(item)
... 
50.21±0.03
43.23±0.06
23.65±1.20
12.22±0.06
11.25±2.21

Martijn Pieters


Awesome. My pre-unicode script is very similar to yours, however when I try to `print` the `u'\u00b1'` I get this: `UnicodeEncodeError: 'ascii' codec can't encode character u'\xb1' in position 5: ordinal not in range(128)`. Is this due to me not explicitly turning my values into unicode strings? – Matt Dec 19 '13 at 19:42
@Matt: _Printing_ unicode strings is a whole other problem on top of creating them. (Especially if you're on Windows.) The problem here is that you've created a valid Unicode string, and Python is trying to encode it appropriately for your console, but it can't figure out what character set your console wants, so it's falling back to `'ascii'`. And there is no ASCII character for `±`. You should probably accept this answer and create a new question (or, better, search for similar questions, because there are surely multiple dups here). – abarnert Dec 19 '13 at 19:45
@Matt: But first, to make absolutely sure that everything other than printing works, you may want to try (a) running the script in IDLE (which should be able to handle Unicode output) and/or (b) explicitly encoding to UTF-8 and writing the result to a file (or using `io.open` or `codecs.open` to create a UTF-8 file and writing the `unicode` to that) and verifying that the file looks right when viewed as a UTF-8 text file. – abarnert Dec 19 '13 at 19:47
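
For option (b) in the comment above, here is a rough sketch of writing the joined rows to a UTF-8 file with `io.open` and reading them back to check the round trip. The file name `errors.txt` and the temporary directory are placeholders chosen for illustration, not anything from the original script:

```python
import io
import os
import tempfile

rows = [[u'50.21', u'0.03'], [u'43.23', u'0.06']]

# Hypothetical output path; a temp directory keeps the example self-contained.
path = os.path.join(tempfile.mkdtemp(), 'errors.txt')

# io.open encodes unicode text to UTF-8 on write (Python 2.7 and 3.x).
with io.open(path, 'w', encoding='utf-8') as f:
    for row in rows:
        f.write(u'\u00b1'.join(row) + u'\n')

# Read the file back as UTF-8 to verify that nothing was lost.
with io.open(path, encoding='utf-8') as f:
    lines = f.read().splitlines()
```

If `lines` contains the expected strings, the joining step is fine and any remaining error comes from printing to the console.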
@Matt: Then you are using a terminal or console that doesn't support printing that character. – Martijn Pieters Dec 19 '13 at 19:47
@MartijnPieters Yeah, I'm doing this through the `shell` in Emacs. That might be why. I have a lot to look into for now. Thanks for the help. – Matt Dec 19 '13 at 19:48
@MartijnPieters: Actually, I'll bet he's using a terminal that _does_ support it, but something is preventing Python from guessing the encoding. – abarnert Dec 19 '13 at 19:49
@abarnert: Also an option. An Emacs shell could do that. – Martijn Pieters Dec 19 '13 at 19:49
@Matt: Related: [Make Emacs use UTF-8 with Python Interactive Mode](http://stackoverflow.com/q/888406) – Martijn Pieters Dec 19 '13 at 19:50
@Matt: Aha, that's exactly what it is. `python-mode`'s Python shell can interact with MULE, but `python` run manually inside `shell`'s shell can't. You can usually work around that by running `export PYTHONIOENCODING=UTF-8` and _also_ setting the appropriate locale variables for your platform (`LC_ALL` or `LANG` should be enough) to something UTF-8-friendly, before launching Python. (Assuming you're on a UTF-8 underlying terminal, or in an X window and have MULE configured correctly.) – abarnert Dec 19 '13 at 19:51
@MartijnPieters THANKS! As a bit more info I'm doing this, aside from within Emacs, within Mac OS. In case that's worth anything. – Matt Dec 19 '13 at 19:52
@Matt: Well, that guarantees that you have a UTF-8 terminal and all of your locales are UTF-8-friendly, because OS X doesn't support anything else… So you may just need the `PYTHONIOENCODING` bit. (I'm not actually sure; I almost always use Aquamacs for my emacs, and I also almost always use python-mode's PyShell rather than running python in a shell.) As a side note, any reason not to use either Aquamacs or the Aqua port of plain emacs? – abarnert Dec 19 '13 at 19:54
Good to know. I just ran the script in Mac's Terminal and it printed the character without issue. So, when I'm free I guess I'll figure out how to run this correctly in Emacs. Thanks for the help. – Matt Dec 19 '13 at 19:55
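
To see which encoding Python has picked for console output in a given environment (Terminal versus an Emacs `shell` buffer, say), something like the following sketch may help. The function name `describe_output_encoding` is made up for illustration; what it reports depends entirely on how Python was launched, so no particular output is implied:

```python
import locale
import sys


def describe_output_encoding():
    """Report the encodings Python has picked up for console output."""
    return {
        # Encoding used when printing unicode; can be None or an ASCII
        # variant when Python cannot work out what the console wants.
        'stdout': getattr(sys.stdout, 'encoding', None),
        # Encoding derived from the locale settings (LC_ALL, LANG, ...).
        'preferred': locale.getpreferredencoding(),
    }


info = describe_output_encoding()
print(info)
```

Running it once in each environment, with and without `PYTHONIOENCODING` set before launching Python, should show whether the `stdout` entry is the part that differs.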
