Description of problem

I have to migrate a C extension to Python 3. It compiles successfully, but I have a problem at runtime:

static PyObject* Parser_read(PyObject * const self, PyObject * unused0, PyObject * unused1) {
    //Retrieve bytes from the underlying data stream.
    //In this case, an iterator
    PyObject * const i = PyIter_Next(self->readIterator);

    //If the iterator returns NULL, then no more data is available.
    if(i == NULL)
    {
        Py_RETURN_NONE;
    }

    //Treat the returned object as just bytes
    PyObject * const bytes = PyObject_Bytes(i);

    Py_DECREF(i);

    if( not bytes )
    {
        //fprintf(stderr, "try to read %s\n", PyObject_Str(bytes));
        PyErr_SetString(PyExc_ValueError, "iterable must return bytes like objects");
        return NULL;

    }

    ....
}

In my Python code, I have something like this:

for data in Parser(open("file.txt")):
   ...

The code works well on Python 2, but on Python 3 I get:

ValueError: iterable must return bytes like objects

Update

The solution from @casevh works in all test cases except one, where I wrap the stream:

def wrapper(stream):
    for data in stream:
        for i in data:
            yield i

for data in Parser(wrapper(open("file.txt", "rb"))):
    ...

and I get: ValueError: iterable must return bytes like objects

Wael Ben Zid El Guebsi

2 Answers

One option is to open the file in binary mode:

open("file.txt", "rb")

That should give you an iterator that yields bytes objects, one per line.

Python 3 strings are Unicode, and without explicit encoding/decoding they shouldn't be interpreted as a sequence of bytes. If you are reading plain ASCII text rather than a binary data stream, you could also convert from Unicode to ASCII inside the extension. See PyUnicode_AsASCIIString() and related functions.
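To see the difference from the Python side, here is a minimal sketch. The temporary file and the helper name first_chunk are made up for this illustration and are not part of the question's code. It shows that text mode yields str, binary mode yields bytes, and that a str cannot be converted to bytes without naming an encoding, which is roughly the situation PyObject_Bytes() runs into.

```python
import os
import tempfile


def first_chunk(path, mode):
    """Return the first item produced by iterating the file opened in `mode`."""
    with open(path, mode) as stream:
        return next(iter(stream))


# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"hello\nworld\n")
    path = tmp.name

text_line = first_chunk(path, "r")      # str on Python 3
binary_line = first_chunk(path, "rb")   # bytes on Python 3
os.remove(path)

# Converting str to bytes without an encoding is rejected on Python 3.
try:
    bytes(text_line)
    conversion_error = None
except TypeError as exc:
    conversion_error = exc

# An explicit encoding makes the conversion well defined.
ascii_line = text_line.encode("ascii")
```

If the data really is ASCII text, encoding explicitly (here with str.encode, or with PyUnicode_AsASCIIString() on the C side) gives the same bytes that binary mode would have produced.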

casevh
  • Thanks, it works. But in some tests I have a wrapper around the stream, so I do something like this: `for i in open("someFile", "rb"): for j in i: yield j`, and I get the same error. – Wael Ben Zid El Guebsi Mar 10 '15 at 15:09
  • Does your wrapper return `str` or `bytes`? If your data is ASCII text, you may want to use the second option, i.e. convert Unicode to ASCII. – casevh Mar 10 '15 at 15:20

As noted by @casevh, in Python 3 you need to decide whether your data is binary or text. The fact that you are iterating over lines makes me think that the latter is the case.


def wrapper(stream):
    for data in stream:
        for i in data:
            yield i

works in Python 2, because iterating over a str yields 1-character strings; in Python 3, iterating over a bytes object yields individual bytes as integers in the range 0-255. You can make the code behave identically in Python 2 and 3 (and identically to the Python 2 behaviour of the code above) by using range and slicing 1 byte/character at a time:

def wrapper(stream):
    for data in stream:
        for i in range(len(data)):
            yield data[i:i + 1]
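A quick way to check the difference on Python 3 is the sketch below. The helper names iter_items and iter_slices and the sample value are only for illustration.

```python
def iter_items(data):
    """Iterate directly: on Python 3 a bytes object yields ints."""
    return [item for item in data]


def iter_slices(data):
    """Slice one element at a time: the result keeps the type of `data`."""
    return [data[i:i + 1] for i in range(len(data))]


sample = b"abc"
direct = iter_items(sample)    # integers on Python 3
sliced = iter_slices(sample)   # length-1 bytes objects
```

The integers from direct iteration are what the extension receives from the original wrapper, and PyObject_Bytes() raises TypeError for an integer, so the extension reports its ValueError. The sliced version hands it length-1 bytes objects instead.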

P.S. You also have a mistake in your C extension code: Parser_read takes 3 arguments, 2 of which are named unused0 and unused1. Only a method flagged with METH_KEYWORDS takes 3 arguments (PyCFunctionWithKeywords); all others, including METH_NOARGS, must be functions taking 2 arguments (PyCFunction).