I'm trying to open the Kaggle Spotify top 50 songs CSV file using pd.read_csv, but it raises the error below: "'utf-8' codec can't decode byte 0xf1 in position 2: invalid continuation byte"

In Excel, it looks like a normal CSV file. I tried the csv.reader method and was able to open it, using a defaultdict and a few for loops. Why didn't pd.read_csv work?

link to Kaggle csv file for convenience: https://www.kaggle.com/leonardopena/top50spotify2019

jamest
  • check this [older post](https://stackoverflow.com/questions/5552555/unicodedecodeerror-invalid-continuation-byte) – dboy May 29 '20 at 20:53

1 Answer

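pd.read_csv assumes UTF-8 by default, but this file is not UTF-8 encoded: byte 0xf1 is "ñ" in Latin-1, whereas in UTF-8 it is only valid when followed by continuation bytes. Setting the encoding to latin-1 fixes it: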

import pandas as pd

df = pd.read_csv('datasets_456958_861311_top50contry.csv',
                 encoding='latin-1'
                 )

print(df.head())

Outputs:

   Unnamed: 0                            title          artist  ...  spch  pop country
0           1                     Dance Monkey     Tones and I  ...   9.0  100   world
1           2                          ROXANNE  Arizona Zervas  ...  15.0   99   world
2           3                         Memories        Maroon 5  ...   5.0   99   world
3           4                          Circles     Post Malone  ...   4.0   99   world
4           5  All I Want for Christmas Is You    Mariah Carey  ...   4.0   98   world

[5 rows x 17 columns]
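
To see where the error message comes from, here is a minimal sketch that reproduces it without pandas or the CSV file. The sample string "Señorita" is only an assumption for illustration (any Latin-1 text with "ñ" in that position would behave the same); it is not read from the Kaggle file.

```python
# Hypothetical sample text, encoded the way the CSV appears to be (Latin-1).
raw = "Señorita".encode("latin-1")  # "ñ" becomes the single byte 0xf1

# Decoding as UTF-8 fails: 0xf1 announces a multi-byte sequence,
# but the next byte ("o") is not a continuation byte.
try:
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding with the encoding the bytes were written in works.
decoded = raw.decode("latin-1")

print(error_message)
print(decoded)
```

This may also be why csv.reader appeared to work: open() without an encoding argument uses the platform default, which on many Windows setups is a Latin-1-like code page rather than UTF-8. That is a guess about your environment, not something I can confirm from the question.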
Gustav Rasmussen
  • Thanks, Gustav. Any insight into when it is appropriate to use csv.reader over pd.read_csv (which can do everything in one line of code)? – jamest Jun 01 '20 at 12:42