1

I'd like to read a string buffer into a pandas DataFrame. It seems that a good way to do it would be to use pandas' ExcelFile functionality. I've tried to do something like the following:

from pandas import ExcelFile as excel_handler
excel_data = excel_handler(StringIO(file_stream.read()).getvalue())

From then on, I guess ExcelFile.parse() can be used.

This produces the following error:

<class 'openpyxl.shared.exc.InvalidFileException'> [Errno 2] No such
file or directory: '

Any ideas on how to read in the file from the buffer?

Lamps1829
  • 2,231
  • 3
  • 24
  • 32
  • What is file_stream? More importantly, why aren't you just using this as `ExcelFile('path/of/excel/file')` or `ExcelFile(file)`... – Andy Hayden Jul 23 '13 at 19:05
  • It's within the web2py framework. Within the context, it's easier to create the buffer than to specify the filepath. I verified that the StringIO(...) buffer works as expected (i.e. had no problem using it with the xlrd module to open the excel workbook). – Lamps1829 Jul 23 '13 at 19:15
  • Can you not just pass the buffer? – Andy Hayden Jul 23 '13 at 19:18
  • Isn't that what I'm doing in the code above? Or do you mean to remove getvalue()? That produces an error: coercing to Unicode: need string or buffer, _io.BytesIO found – Lamps1829 Jul 23 '13 at 19:21
  • what about `ExcelFile(file_stream)` ? – Andy Hayden Jul 23 '13 at 19:44
  • It'll produce the error that it's expecting a filepath. It's kind of hard to believe that there's no way to do it with a buffer... – Lamps1829 Jul 23 '13 at 19:53
  • Docstring says if takes a file too, which version of pandas are you using? – Andy Hayden Jul 23 '13 at 20:05
  • Using pandas version 0.11.0 – Lamps1829 Jul 23 '13 at 21:51
  • Thanks - you were right that getvalue() doesn't need to be there. The problem was being caused by the presence of filestream.read() earlier in the code. – Lamps1829 Jul 24 '13 at 10:44

1 Answers1

1

Fixed. Had missed a part earlier in my code where file_stream.read() was being called. Consequently, by the time ExcelFile was being called, an empty string was being passed to it, causing an error. getvalue() needed to be removed. Here's how it should go:

from pandas import ExcelFile
excel_data = ExcelFile(StringIO(file_stream.read())
dataframe = excel_data.parse(excel_data.sheet_names[-1])
Lamps1829
  • 2,231
  • 3
  • 24
  • 32