I am trying to read in the following dataset: https://data.opensanctions.org/datasets/20230620/default/names.txt

I have run this code:

import pandas as pd

filename = "https://data.opensanctions.org/datasets/20230620/default/names.txt"

df = pd.read_csv(filename, encoding='latin1', nrows = 2, header=None)
print(df)

The dataframe looks like this:

                                                   0
0                                SANAVBARI NIKITENKO
1  ÐÐÐÐÐТ Ð ÐÐÐÐÐÐÐÐÐ ÐÐ¥ÐÐÐÐ...

How can I automatically detect the character encoding, so that special characters are read correctly, when I read in the file?

Giampaolo Levorato

1 Answer

For me it works if I remove encoding='latin1', so that the default encoding='utf-8' is used:

import pandas as pd

filename = "https://data.opensanctions.org/datasets/20230620/default/names.txt"

# No encoding argument: pandas falls back to its default, UTF-8.
df = pd.read_csv(filename, nrows=2, header=None)
print(df)
                            0
0         SANAVBARI NIKITENKO
1  АМИНАТ РАМЗАНОВНА АХМАДОВА
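
As for detecting the encoding automatically, here is a minimal sketch rather than a definitive method. It assumes you already have the raw bytes of the file and a short list of plausible encodings. The helper name guess_encoding and the candidate list are hypothetical, not part of pandas. It tries each encoding in strict mode and returns the first one that decodes cleanly. latin1 has to come last because it accepts any byte sequence, which is why the original call produced garbled text instead of an error.

```python
from typing import Iterable, Optional


def guess_encoding(
    raw: bytes,
    candidates: Iterable[str] = ("utf-8", "cp1251", "latin1"),
) -> Optional[str]:
    """Return the first candidate that decodes `raw` without errors.

    Hypothetical helper for illustration; the candidate list is an assumption.
    """
    for enc in candidates:
        try:
            raw.decode(enc)  # strict mode raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        return enc
    return None


# UTF-8 bytes of the second name in the dataset sample.
sample = "АМИНАТ РАМЗАНОВНА АХМАДОВА".encode("utf-8")

print(guess_encoding(sample))    # utf-8
print(sample.decode("latin1"))   # garbled text, like the output in the question
```

The returned name can then be passed on as pd.read_csv(filename, encoding=...). This only proves that the bytes are valid in the chosen encoding, not that the encoding is the right one; third-party packages such as chardet or charset-normalizer make a statistical guess instead.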
jezrael