I have a function that's reading a content object into a pandas dataframe.

import pandas as pd
from cStringIO import StringIO, InputType

def create_df(content):
    assert content, "No content was provided, can't create dataframe"

    if not isinstance(content, InputType):
        content = StringIO(content)
    content.seek(0)
    return pd.read_csv(content)

However, I keep getting the error `TypeError: StringIO() argument 1 must be string or buffer, not cStringIO.StringIO`.

I checked the incoming type of the content prior to the StringIO() conversion inside the function, and it's of type str. Without the conversion, I get an error that the str object does not have a seek function. Any idea what's wrong here?

staten12
  • `InputType` is one of two types defined in `cStringIO`. Presumably you have an `OutputType` instance instead. – Martijn Pieters Sep 12 '18 at 13:48
  • Strange, I changed it to OutputType and it worked. But the same function was working with InputType until I changed something in the code, and I can't figure out what it was. Thanks – staten12 Sep 12 '18 at 13:53
  • That's because `StringIO('content')` creates `InputType` instances, while `StringIO()` (no argument) creates `OutputType` instances. You need to test *for both kinds*. – Martijn Pieters Sep 12 '18 at 13:55

1 Answer

You only tested for InputType, which is the type of a cStringIO.StringIO() instance that supports reading. You appear to have the other type, OutputType, which is the type of an instance that supports writing:

>>> import cStringIO
>>> finput = cStringIO.StringIO('Hello world!')  # the input type, it has data ready to read
>>> finput
<cStringIO.StringI object at 0x1034397a0>
>>> isinstance(finput, cStringIO.InputType)
True
>>> foutput = cStringIO.StringIO()  # the output type, it is ready to receive data
>>> foutput
<cStringIO.StringO object at 0x102fb99d0>
>>> isinstance(foutput, cStringIO.OutputType)
True

You'd need to test for both types. Just pass a tuple of the two types as the second argument to isinstance():

from cStringIO import StringIO, InputType, OutputType

if not isinstance(content, (InputType, OutputType)):
    content = StringIO(content)

or, and this is the better option, test for read and seek attributes, so you can also support regular files:

if not (hasattr(content, 'read') and hasattr(content, 'seek')):
    # if not a file object, assume it is a string and wrap it in an in-memory file.
    content = StringIO(content)

or you could just test for strings and [buffers](https://docs.python.org/2/library/functions.html#buffer), since those are the only two types that StringIO() can support:

if isinstance(content, (str, buffer)):
    # wrap strings into an in-memory file
    content = StringIO(content)

This has the added bonus that any other file object in the Python library, including compressed files, tempfile.SpooledTemporaryFile(), and io.BytesIO(), will also be accepted and work.
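
As a rough illustration of the attribute test, here is a minimal sketch of a wrapper that passes file-like objects through untouched and wraps everything else. The name `as_file_object` is hypothetical, and the fallback to `io.StringIO` is an assumption added so the sketch also runs on Python 3, where `cStringIO` no longer exists:

```python
import io
import tempfile

try:
    # Python 2, as in the question
    from cStringIO import StringIO
except ImportError:
    # Assumption: on Python 3, io.StringIO stands in for cStringIO.StringIO
    from io import StringIO


def as_file_object(content):
    """Return content unchanged if it looks like a file, else wrap it.

    Hypothetical helper: 'looks like a file' means it has both
    read and seek attributes.
    """
    if hasattr(content, 'read') and hasattr(content, 'seek'):
        return content
    # Not a file object: assume it is a string and wrap it in an in-memory file
    return StringIO(content)


# A plain string gets wrapped in an in-memory file
wrapped = as_file_object('a,b\n1,2\n')

# Objects that are already file-like are returned as-is
existing = StringIO('a,b\n1,2\n')
raw = io.BytesIO(b'a,b\n1,2\n')
with tempfile.SpooledTemporaryFile() as spooled:
    passed_through = as_file_object(spooled) is spooled
```

The returned object can then be rewound with `seek(0)` and handed to `pd.read_csv()`, as `create_df()` in the question does.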

Martijn Pieters