9

I have great problems with Pandas and Excel. I read in an Excel document into a dataframe and that is fine. I do calculations, and all is well. Then I try to save the dataframe to inspect the results from the calculations, and pandas blows up with this error message:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 4: ordinal not in range(128)

This usually indicates that there is some non ascii character in some cells in the dataframe. My usual approach is to drop that column because I just want to analyse the numbers, not strings.

However, which column should I drop? I have no clue. Ideally I would like pandas to tell me which cells are problematic, or which columns are problematic. As of now, pandas dont tell me nothing, so I need to manually inspect every cell to find the culprit. What I do is that I print out the bad character, and manually inspect every cell for that printed character. This is not doable. So I drop all string columns which is not doable either. How to solve this problem? How can I find the cell or column, that gives me problems with bad characters?

I have tried to convert everything to utf-8, sandwiching, etc etc but I can not get it to work. So I need a way to find the problematic cell so I can manually delete the string.

EDIT: Solved! As bdiamante suggested, his link shows a solution to this problem. For future reference, this works:

df.to_csv('file.csv',encoding='utf-8-sig')
bdiamante
  • 15,980
  • 6
  • 40
  • 46
Orvar Korvar
  • 935
  • 2
  • 12
  • 18
  • 2
    How are you trying to save your dataframe? Check [this answer](https://stackoverflow.com/questions/25788037/pandas-df-to-csvfile-csv-encode-utf-8-still-gives-trash-characters-for-min) and see if that does. – bdiamante Jun 19 '17 at 16:17
  • Ok, it works. I dont know how to mark your comment as a solution. If you write a new comment, I can mark your comment as a solution. – Orvar Korvar Jun 22 '17 at 08:35
  • 1
    Just throw Mark Tolonen an upvote for his answer and we're all good :). Glad it worked. – bdiamante Jun 22 '17 at 16:16
  • Ok, Ive done so. But I wanted to thank you as well. I will look at your old answers and upvote some good answer you provided elsewhere. :) – Orvar Korvar Jun 26 '17 at 10:06
  • happens to me when trying to print a df too – thebeancounter Aug 23 '18 at 08:08
  • 1
    Please post the solution as answer, then accept it, do not put it in the question. – ntg Sep 21 '18 at 13:15

1 Answers1

2

Answered within the question, copied here:

df.to_csv('file.csv',encoding='utf-8-sig')
Shovalt
  • 6,407
  • 2
  • 36
  • 51