Edit: Polars 0.13.42
and later
Polars now has a read_excel
function that will correctly handle this situation. read_excel
is now the preferred way to read Excel files into Polars.
Note: to use read_excel
, you will need to install xlsx2csv
(which can be installed with pip).
Polars: prior to 0.13.42
I can replicate this result. It is due to a column in the original Excel file that contains both text and numbers.
For example, create a new Excel file with one column in which you type both numbers and text, save it, and run your code on that file. I get the following traceback:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/xxx/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/convert.py", line 299, in from_pandas
return DataFrame._from_pandas(df, rechunk=rechunk, nan_to_none=nan_to_none)
File "/home/xxx/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/frame.py", line 454, in _from_pandas
pandas_to_pydf(
File "/home/xxx/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 485, in pandas_to_pydf
arrow_dict = {
File "/home/xxx/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 486, in <dictcomp>
str(col): _pandas_series_to_arrow(
File "/home/xxx/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 237, in _pandas_series_to_arrow
return pa.array(values, pa.large_utf8(), from_pandas=nan_to_none)
File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
File "pyarrow/error.pxi", line 122, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'int' object
There are several lengthy discussions on this issue, such as these:
This particular comment might be relevant, as you are concatenating the results of parsing multiple sheets in an Excel file. This may lead to conflicting dtypes for a column:
https://github.com/pandas-dev/pandas/issues/21228#issuecomment-419175116
How to approach this depends on your data and its use, so I can't recommend a blanket solution (i.e., fixing your source Excel file, or changing the dtype to str).