I'm a newbie trying #pythonchallenge, with some help! I'm in Challenge 8 and a simple command such as:

import bz2
bz2.decompress('BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084')

won't work in Python 3.x (though it does in Python 2.x).

A number of times I have had to convert from bytes to str (with bytes.decode) and vice versa, but I'm at a loss as to when to convert and why.

The other example was in Challenge 6

 comments = comments + str(bytes.decode((zip_try.getinfo(f_name).comment)))

I keep receiving the message TypeError: 'str' does not support the buffer interface

Any help?

I read several pages about porting from Python 2.x to 3.x, and they say: 'strings are Unicode by default'

What does that mean? That I don't actually have to write

 bytes('my stuff', 'utf-8')

, right?

Thanks, sorry if it sounds dumb!

B Furtado

2 Answers

Regarding the problems with the code you posted, the first snippet has to be modified to work with Python 3 as follows:

import bz2
bz2.decompress(b'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084')

The b prefix marks the string literal as a string of bytes rather than the default of Unicode string.
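As a minimal sketch of the difference (assuming Python 3; the variable names are only illustrative), the same data is accepted as a bytes literal and rejected as a str. Decoding the result gives an ordinary str, which prints without the b prefix. The 'ascii' codec is an assumption that happens to fit this particular payload:

```python
import bz2

# The compressed payload from the question, written as a bytes literal.
data = b'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'

raw = bz2.decompress(data)    # bytes in, bytes out
text = raw.decode('ascii')    # bytes -> str, so no b'' when printed

# Passing a str instead of bytes is what triggers the TypeError on Python 3.
try:
    bz2.decompress('not bytes')
    str_rejected = False
except TypeError:
    str_rejected = True

print(type(raw).__name__, type(text).__name__, str_rejected)
```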

In the second case, ZipFile.getinfo().comment is of bytes type, so you will need to do

comments += zip_try.getinfo(f_name).comment.decode()

assuming that comments is of str type.
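To see this in isolation, here is a small sketch that builds a throwaway archive in memory and collects its member comments into a str. The member name and comment text are made up for the example; only zip_try, f_name and comments come from the question:

```python
import io
import zipfile

# Build a small archive in memory; the member name and comment are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    info = zipfile.ZipInfo('example.txt')
    info.comment = b'hello'              # member comments are stored as bytes
    zf.writestr(info, 'file contents')

comments = ''                            # a str accumulator, as in the question
with zipfile.ZipFile(buf) as zip_try:
    for f_name in zip_try.namelist():
        raw_comment = zip_try.getinfo(f_name).comment   # bytes
        comments += raw_comment.decode()                # bytes -> str (UTF-8 by default)

print(repr(comments))
```

Adding raw_comment to comments without the decode() call would fail, because Python 3 refuses to concatenate str and bytes.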

As for text handling in Python 2 versus Python 3 in general, it is one of the key differences between the two. When starting out, I recommend first studying the official Python 3 Unicode guide to understand the concepts and to learn a sane way of dealing with strings, bytes and encodings, and then reading the Python 2 version of the same guide to understand its specific quirks.

Martin Valgur

In Python 3 you can think of bytes as roughly equivalent to Python 2's str, and of str as Python 2's unicode. In Python 3 the default string literal is a str; if you want a bytes literal, you add a b before the opening quote. This is what Python is asking for in the first case:

TypeError: a bytes-like object is required, not 'str'

So it would be:

import bz2
bz2.decompress(b'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084')

As for the second case, I can't tell without seeing more code, but remember that you can convert from str to bytes and back with the encode and decode methods, much as you would between unicode and str in Python 2. For example, the following line:

'á'.encode("utf8").decode("utf8")

converts the str 'á' to a bytes object encoded in UTF-8 and back to a str again.
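Splitting that one-liner into steps makes the two types visible (a small sketch; the variable names are just for illustration):

```python
text = 'á'
encoded = text.encode('utf8')       # str -> bytes
decoded = encoded.decode('utf8')    # bytes -> str

print(encoded)                      # b'\xc3\xa1'
print(len(text), len(encoded))      # 1 2  (one character, two bytes)
print(decoded == text)              # True
```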

dyeray
  • Thanks. The b prefix solved the problem, but the answer still came back with a preceding b: b'huge – B Furtado Dec 15 '15 at 22:20
  • The explanation, though, I am not sure I follow. Bytes are similar to str in Python 3 and strings are now Unicode by default. So if the variable looks like bytes, I add b'... – B Furtado Dec 15 '15 at 22:23
  • What about 'str' does not support the buffer interface? Does it mean I would have to call ''.encode? Thanks, anyway. – B Furtado Dec 15 '15 at 22:24
  • You add the b because this function (bz2.decompress) must receive a bytes-like object, so it's not about the variable. As for the other case, as I said, without more code (seeing what type each variable is) I couldn't say, but if the case is similar (you are passing a str where you should pass bytes) then yes, you can convert it using encode. – dyeray Dec 15 '15 at 22:32
  • Ah. Ok. So depending on the function the argument may ONLY be bytes like. Ok. Thanks I'll read https://docs.python.org/3/howto/unicode.html – B Furtado Dec 15 '15 at 22:33
  • but I thought something in the format 'BZh91AY&SYA\xaf\x82\r\x00\x00\... was 'bytes' format... – B Furtado Dec 15 '15 at 22:34
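  • A bytes literal can only contain ASCII characters; any other byte value has to be written as an escape such as \xaf. Each element of a bytes object is one byte (an integer from 0 to 255), which is why it is called bytes, so symbols like 'à' can't appear in it directly. – dyeray Dec 15 '15 at 22:42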
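A quick way to check what a bytes object actually holds, as a rough sketch on Python 3 (the sample values are arbitrary). compile() is used for the non-ASCII literal only so that the syntax error can be caught instead of stopping the script:

```python
data = b'A\xaf'

print(list(data))                  # [65, 175]  each element is an int from 0 to 255
print(data[0])                     # 65
print(bytes([65, 175]) == data)    # True

# A non-ASCII character written directly inside a bytes literal is rejected
# by the parser, so it is compiled from a string here to observe the error.
try:
    compile("b'à'", '<example>', 'eval')
    literal_ok = True
except SyntaxError:
    literal_ok = False

print(literal_ok)                  # False
print('à'.encode('utf-8'))         # b'\xc3\xa0'  encode the str to get its bytes
```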