1

Im testing how pandas works saving excel files and came along and error when i tried making it read and save a big excel file.

So i tried it with a small sample from my big excel file to test, around 150 rows, and it works. So i used the full sheet, around 200,000 rows, and got a couple errors

This is what Im using to test saving

import pandas as pd

xls = pd.ExcelFile('File\\Path\\File.xlsx')
df = pd.read_excel(xls, 'Sheet_Name')

df.to_excel('TEST.xlsx')

This is all the code so Im wondering why Im getting errors.

This is what the console reports

File "C:/Users/CSR001/PycharmProjects/CLeaner/Base.py", line 30, in <module>
    df.to_excel('TEST.xlsx')
  File "C:\Users\CSR001\PycharmProjects\CLeaner\venv\lib\site-packages\pandas\core\generic.py", line 2257, in to_excel
    engine=engine,
  File "C:\Users\CSR001\PycharmProjects\CLeaner\venv\lib\site-packages\pandas\io\formats\excel.py", line 739, in write
    freeze_panes=freeze_panes,
  File "C:\Users\CSR001\PycharmProjects\CLeaner\venv\lib\site-packages\pandas\io\excel\_openpyxl.py", line 416, in write_cells
    xcell.value, fmt = self._value_with_fmt(cell.val)
  File "C:\Users\CSR001\PycharmProjects\CLeaner\venv\lib\site-packages\openpyxl\cell\cell.py", line 252, in value
    self._bind_value(value)
  File "C:\Users\CSR001\PycharmProjects\CLeaner\venv\lib\site-packages\openpyxl\cell\cell.py", line 205, in _bind_value
    value = self.check_string(value)
  File "C:\Users\CSR001\PycharmProjects\CLeaner\venv\lib\site-packages\openpyxl\cell\cell.py", line 169, in check_string
    raise IllegalCharacterError
openpyxl.utils.exceptions.IllegalCharacterError

Note: The main excel file has some symbols characters in the cells so it could be that causing the errors.

There are cell that have data like this

Ex.

____
&#xC3;&#x192;znur
-
Håkon

Are these affecting how pandas can save?

John Zapanta
  • 179
  • 1
  • 3
  • 14
  • check [How to remove illegal characters so a dataframe can write to Excel](https://stackoverflow.com/questions/42306755/how-to-remove-illegal-characters-so-a-dataframe-can-write-to-excel) – anky Aug 06 '19 at 04:59

3 Answers3

0

The problem is not pandas but with the excel parsing lib. It is 100% an encoding error. Try adding the following parameter:

encoding='utf8'
0

Instead of using read_excel use Parse this will handle the IllegalCharacterError Exception.

import pandas as pd
path = r"C:\Users\uib05928\Desktop\prac1.xlsx"
xls = pd.ExcelFile(path)
df = xls.parse("Sheet1").head()
df.to_excel('C:\Users\uib05928\Desktop\TEST.xlsx')

Output-

output data

ImMJ
  • 206
  • 2
  • 5
0

So i just changed my to_excel into to_csv and it works. I dont know why it works only saving as a csv but it does, any ways, its fine i was planning to convert it into a csv file later on.

df.to_excel('TEST.xlsx')
df.to_csv('TEST.csv')
John Zapanta
  • 179
  • 1
  • 3
  • 14