
I am a beginner at working with files, so I haven't fully grasped it yet. Using pandas, I want to create a new file that lists all the elements of a previous one, sorted by price in descending order. This is my code:

import pandas
file = pandas.read_csv('list_of_items.csv', skiprows=1)
sorted_file = file.sort_values(by = 'price', ascending=False)
sorted_file.to_csv('items_sorted_price.csv', index=False)

However, I get this error:

File "C:\Users\arcal\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1891, in __init__ 
  self._reader = parsers.TextReader(src, **kwds)
File "pandas\_libs\parsers.pyx", line 529, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 749, in pandas._libs.parsers.TextReader._get_header
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 0: invalid continuation byte

What can I do to solve the problem? Where would you recommend a beginner like me start in order to better grasp this topic? Thanks in advance for your help.

Klaiv_Mrt

2 Answers


The problem is that the file uses a different encoding than the default. By default, pandas.read_csv expects UTF-8, and the error says that it cannot decode a byte in the file using that encoding. So you need to find out which encoding was used to create the file. My guess is cp1251, since it is very common.

import pandas as pd
file = pd.read_csv('list_of_items.csv', skiprows=1, encoding="cp1251")
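
If you are not sure which encoding the file uses, one option is to try a few candidates on the raw bytes before handing the file to pandas. The following is only a rough sketch: the helper name first_working_encoding and the sample bytes are made up for illustration, and a successful decode does not prove the guess is right (latin-1, for instance, accepts any byte sequence), so check the decoded text by eye.

```python
from typing import Iterable, Optional


def first_working_encoding(raw: bytes, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate encoding that decodes `raw` without an error.

    Hypothetical helper for illustration; it only checks that decoding
    succeeds, not that the resulting text is meaningful.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes, try the next
        return name
    return None


# Stand-in for the real file contents: bytes that are valid cp1251
# but not valid UTF-8.
sample = "цена,товар\n10,чай\n".encode("cp1251")

# Order matters: put permissive encodings such as latin-1 last.
print(first_working_encoding(sample, ["utf-8", "cp1251", "latin-1"]))
```

For a real file you could read the bytes with open('list_of_items.csv', 'rb').read() and pass the returned name as the encoding argument of read_csv.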

As for a starting point for pandas, there are a lot of good tutorials. For example, from the official documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html You can find others by googling "pandas 101".

Mikhail_Sam
  • I check the Internet for tutorials all the time, however I always ask someone more experienced before deciding. Thanks a lot for your advice, I'll stick with the docs. – Klaiv_Mrt Jan 21 '21 at 08:02
  • @Klaiv_Mrt feel free to ask :) SO was created for that – Mikhail_Sam Jan 26 '21 at 10:57

When pandas reads a CSV, it defaults to UTF-8 encoding. However, the file may have been written with a different encoding. The read_csv function takes the encoding as a parameter.

Here is the code:

df = pd.read_csv('file.csv', encoding = "ISO-8859-1")

There are many different encodings you can try (the Python documentation lists the standard ones). Alternatively, I would recommend opening the file in Notepad or another text editor and then saving it as a CSV with UTF-8 encoding.
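
The re-saving step can also be done in code instead of a text editor. This is a minimal sketch that assumes you already know (or have guessed) the source encoding; the function name resave_as_utf8 and the file names in the comment are only illustrative.

```python
from pathlib import Path


def resave_as_utf8(src: str, dst: str, source_encoding: str) -> None:
    """Read `src` using `source_encoding` and write the same text to `dst` as UTF-8.

    Hypothetical helper for illustration; it raises UnicodeDecodeError
    if `source_encoding` cannot decode the file.
    """
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")


# Example call (file names are placeholders):
# resave_as_utf8("list_of_items.csv", "list_of_items_utf8.csv", "ISO-8859-1")
```

After that, read_csv should be able to read the new file with its default settings.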


If you only have to read a few CSV files, you can use the following code:

df = pd.read_csv('file.csv', engine='python')
DapperDuck
  • The issue was at the file. When I opened it with notepad, I saw a bunch of gibberish thus the utf-8 encoding couldn't decode it. I just changed the format of the file and it worked just fine. Thank you !!! – Klaiv_Mrt Jan 21 '21 at 07:56