python: extended ASCII codes

Question

Hi, I want to know how I can append and then print extended ASCII codes in Python. I have the following:

code = chr(247)

li = []
li.append(code)
print li

Python prints ['\xf7'] when it should be a division symbol. If I simply print code directly ("print code") I get the division symbol, but not if I append it to a list. What am I doing wrong?

Thanks.

Extended ASCII is not well-defined. There are many. Why don't you use Unicode? — David Heffernan, Jan 21 '14 at 09:24
There is no such thing as "extended ASCII", there are many different encodings in which 247 can mean different things. You need to decode the string with the right encoding. — RemcoGerlich, Jan 21 '14 at 09:24
The clarification that the most common extension is Latin-1 is very helpful. — bballdave025, Jul 26 '18 at 16:26

score 11 · Accepted Answer · edited Jan 21 '14 at 09:51

When you print a list, it outputs the default representation of all its elements, i.e. by calling repr() on each of them. The repr() of a string is its escaped form, by design. If you want to output all the elements of the list properly, convert the list to a string first, e.g. via ', '.join(li).
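As a minimal sketch of the point above, written for Python 3 (the question itself uses Python 2 syntax): converting a list to text goes through each element's repr(), while joining uses the strings themselves. Python 3's repr() leaves printable non-ASCII characters alone, so ascii() is used here as a stand-in to reproduce the escaped form that Python 2 showed. The variable names are illustrative only.

```python
# Python 3 sketch: list-to-text conversion uses repr() on each element.
code = chr(247)               # U+00F7, the division sign
li = [code]

list_text = str(li)           # the text print(li) would produce in Python 3
escaped_text = ascii(li)      # repr() with non-ASCII escaped, similar to Python 2
joined_text = ", ".join(li)   # the characters themselves, no quotes or escapes
```

Printing joined_text rather than the list itself avoids the quotes and any escaping.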

Note that, as the commenters have stated, there isn't really any such thing as "extended ASCII"; there are just various different encodings.

answered Jan 21 '14 at 09:32 by Daniel Roseman · edited Jan 21 '14 at 09:51 by jsbueno

If there is no such thing as extended ASCII, what does the table at http://www.ascii-code.com/ represent? – Paul de Barros, Feb 24 '17 at 13:51
Extended ASCII is an outdated term for the characters above 128, the upper half of the code set range. So to say this does not exist is a bit inaccurate. This is old DOS-days stuff. Now there are multiple character sets that generally have 256 characters, or more if you use the newer character sets. The characters have changed over time too. So if you look at an early ASCII chart and a modern one, there are multiple differences. – M T Head, Jul 31 '17 at 23:52
@MTHead It's true that "extended ascii" refers to the upper 128 characters (and anyone saying "it doesn't exist" is confusing and misleading people). However, to be even more precise, there's no single standard for what the upper 128 characters will be. MS-DOS had over 200 code pages for possible extensions to ASCII. The most popular extension to ASCII is (was?) Latin-1, which contained accented characters and the like to support almost all Western European languages. Saying "there's no such thing as extended ascii" is wrong, but saying "extended ascii is exactly this" is likewise wrong. – stevendesu, Mar 08 '18 at 16:25
@PauldeBarros, as stated on the site, that table represents the Windows-1252 (or code page 1252) character set. This is a superset of ASCII, which is probably why they call it "extended ASCII". IMHO this is an incorrect (or at least confusing) term, since the ASCII standard itself is not extended. Instead, there are plenty of new character sets based on ASCII (by extending it and thus creating new unique character sets). So, on that site you should read "extended ASCII" as "one of the most popular extensions of ASCII", which is a more accurate description. – wovano, Sep 07 '21 at 09:59

score 10 · Answer 2 · answered Feb 12 '16 at 00:39

You probably want the charmap encoding, which lets you turn unicode into bytes without 'magic' conversions.

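s = '\xf7'                # Python 3 str holding U+00F7
b = s.encode('charmap')   # turn the str into raw bytes
# Write the bytes straight to stdout in binary mode (Unix-only path).
with open('/dev/stdout', 'wb') as f:
    f.write(b)
    f.flush()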

Will print ÷ on my system.

Note that 'extended ASCII' refers to any of a number of proprietary extensions to ASCII, none of which were ever officially adopted and all of which are incompatible with each other. As a result, the symbol output by that code will vary based on the controlling terminal's choice of how to interpret it.
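To make the incompatibility concrete, here is a small Python 3 sketch that decodes the same byte, 0xF7, under a few single-byte encodings shipped with Python. The choice of encodings and the helper name interpretations are illustrative assumptions, not part of the answer above.

```python
# One byte, several meanings: decode 0xF7 under different single-byte encodings.
RAW = bytes([0xF7])
CANDIDATE_ENCODINGS = ("latin-1", "cp1252", "cp437")

def interpretations(raw, encodings=CANDIDATE_ENCODINGS):
    """Map each encoding name to the text that the raw bytes decode to."""
    return {name: raw.decode(name) for name in encodings}

decoded = interpretations(RAW)
# Latin-1 and Windows-1252 both give U+00F7 (division sign);
# the original IBM PC code page 437 gives U+2248 (almost equal to).
```

Which of these a terminal displays depends on the encoding it assumes, not on the byte itself.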

score 8 · Answer 3 · edited Nov 23 '21 at 13:03

There's no single defined standard named "extended ASCII codes". There are, however, plenty of characters, tens of thousands, defined in the Unicode standard.

You may be limited to the character encoding of your text terminal, which you may think of as "extended ASCII" but which might be Latin-1, for example. (If you are on a Unix-like system such as Linux or Mac OS X, your text terminal will likely use UTF-8 and be able to display any of the tens of thousands of characters available in Unicode.)

So, you must read this piece in order to understand what text is, after 1992. If you try to build any production application believing in "extended ASCII", you are harming yourself, your users and the whole ecosystem at once: http://www.joelonsoftware.com/articles/Unicode.html

That said, Python 2's (and Python 3's) print performs an implicit str conversion on the objects passed in. If you pass a list, this conversion does not recursively call str on each list element; instead, it uses each element's repr, which can display non-ASCII characters as numeric escapes or other unsuitable notations.
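The str-versus-repr distinction is easier to see with a class that defines both methods differently. This is a Python 3 sketch; the Symbol class is hypothetical and exists only for illustration.

```python
class Symbol:
    """Hypothetical wrapper whose str() and repr() differ visibly."""

    def __init__(self, char):
        self.char = char

    def __str__(self):
        # Used when the object itself is converted or printed.
        return self.char

    def __repr__(self):
        # Used for elements when a containing list is converted or printed.
        return "Symbol(U+%04X)" % ord(self.char)

sym = Symbol(chr(247))
direct_text = str(sym)        # goes through __str__
in_list_text = str([sym])     # the list calls __repr__ on its element
```

The same mechanism explains why a string looks different inside a printed list than when printed on its own.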

You can simply join your desired characters in a unicode string, for example, and then print them normally, using the terminal encoding:

import sys

mytext = u""
mytext += unichr(247) #check the codes for unicode chars here:  http://en.wikipedia.org/wiki/List_of_Unicode_characters

print mytext.encode(sys.stdout.encoding, errors="replace")
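For readers on Python 3, where unichr and the print statement no longer exist, a rough equivalent might look like the following. The helper name encode_for_terminal is hypothetical, and falling back to UTF-8 when sys.stdout.encoding is unavailable is an assumption of this sketch.

```python
import sys

def encode_for_terminal(text, encoding=None):
    """Encode text for a terminal, replacing characters it cannot represent."""
    # sys.stdout.encoding may be None when stdout is not a regular text stream;
    # the UTF-8 fallback here is an assumption, not a rule.
    target = encoding or sys.stdout.encoding or "utf-8"
    return text.encode(target, errors="replace")

mytext = ""
mytext += chr(247)  # chr() covers what unichr() did in Python 2

utf8_bytes = encode_for_terminal(mytext, "utf-8")   # representable in UTF-8
ascii_bytes = encode_for_terminal(mytext, "ascii")  # not representable, replaced
```

In most Python 3 setups a plain print(mytext) already encodes using the terminal's encoding, so the explicit encode step is mainly useful for controlling what happens to unrepresentable characters.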

First there was Ascii characters 0-127, then there was Extended Ascii, 128-256. Then there was Unicode that goes beyond 256 characters in a character set. It is not correct to say there is no Extended Ascii. — M T Head, Aug 01 '17 at 00:28
It was not "then there was 'extend ascii'. It is 'then,there was hundreds of vendors worldwide each pushing whatever they wanted on the 128-255 code space". — jsbueno, Sep 06 '21 at 14:05
https://en.wikipedia.org/wiki/Extended_ASCII - "...Using the term "extended ASCII" on its own is sometimes criticized,[1][2][3] because it can be mistakenly interpreted to mean that the ASCII standard has been updated to include more than 128 characters or that the term unambiguously identifies a single encoding, neither of which is the case." — jsbueno, Sep 06 '21 at 14:07
I've suggested an edit to the opening paragraph. Claiming there's "no such thing" is incorrect when the problem really is there are "many such things". Anyone who grew up with 80s, 90s, and even early 00s computing is likely to know the term however misleading it may be. — Philip Couling, Nov 23 '21 at 13:04

score 1 · Answer 4 · answered Jan 21 '14 at 09:49

You are doing nothing wrong.

What you do is to add a string of length 1 to a list.

This string contains a character outside the range of printable characters and outside of ASCII (which is only 7-bit). That's why its representation looks like '\xf7'.

If you print it, it will be rendered as well as the system can.

In Python 2, the byte is simply written out as-is. The resulting output may be the division symbol or something else entirely, depending on your system's encoding.

In Python 3, it is a Unicode character and will be processed according to how stdout is set up. Normally, this should indeed be the division symbol.

In a representation of a list, the __repr__() of the string is called, leading to what you see.
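The Python 2 versus Python 3 difference described above can be approximated within Python 3 by comparing a bytes object with a str. This is only a sketch: bytes([247]) stands in for what chr(247) produced in Python 2, and decoding as Latin-1 is an assumed choice of encoding.

```python
byte_string = bytes([247])   # stands in for Python 2's chr(247): one raw byte
text_string = chr(247)       # Python 3's chr(247): one Unicode character

byte_repr = repr(byte_string)   # escaped, because the byte is outside ASCII
text_repr = repr(text_string)   # shown as-is, because the character is printable

# The raw byte only becomes the division sign once an encoding is chosen.
as_latin1 = byte_string.decode("latin-1")
```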
