I have some CSV files, and sometimes I misconfigure the dtype parameter of pandas.read_csv, so pandas fails with:

TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'

without saying on which column this conversion failed.

How can I retrieve the name or index of the failing column (and maybe the first wrong value)?

PS: I cannot use auto detect / type inference.

Benjamin
  • Does this answer your question? [Pandas: Location of a row with error](https://stackoverflow.com/questions/26660313/pandas-location-of-a-row-with-error) – ESDAIRIM Nov 25 '20 at 20:51
  • No @ESDAIRIM, because in your link the conversion is done on an already existing DataFrame. Mine is not created yet, as it crashes during the read. – Benjamin Nov 25 '20 at 21:39
  • Did you try reading the CSV with Python's built-in csv module, converting it into a DataFrame with `dtype=object`, then calling `.astype` and debugging like they did in that answer? – ESDAIRIM Nov 26 '20 at 14:06

1 Answer

The only way I know of is to let pandas read your CSV without imposing a dtype, and then loop over the columns, trying to set the correct dtype on each one.

import pandas
import random

# Sample dataset, read yours with
# df = pandas.read_csv("myfile.csv")
df = pandas.DataFrame([{"A": random.randint(0, 100), "B": "test " + str(random.random())} for _ in range(1000)])

# Loop the columns
for column in df.columns:
    try:
        # Cast to the correct type
        df[column] = df[column].astype(int)
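    except (ValueError, TypeError) as exc:
        # Catch only conversion errors and report the cause with the column
        print("Error trying to set type of column:", column, "-", exc)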
        # Optional: raise the exception here to stop execution
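
Since the question rules out type inference and also asks for the first wrong value, the same loop can be sketched with every field read as text (`dtype=str`) and each column cast to the dtype you intended. This is only a sketch under assumptions: `find_bad_columns`, `CSV_TEXT` and `WANTED_DTYPES` are hypothetical names made up for the illustration, and the value-by-value retry is slow, so it only runs for columns that already failed.

```python
import io

import pandas as pd

# Hypothetical sample data and intended dtypes, for illustration only.
CSV_TEXT = "a,b,c\n1,x,1.5\n2,y,2.5\n3.7,z,oops\n"
WANTED_DTYPES = {"a": "int64", "b": "str", "c": "float64"}


def find_bad_columns(source, dtypes):
    """Return {column: (row_label, raw_value, error)} for columns that fail to cast."""
    # Read every field as text so that read_csv itself cannot fail on a dtype.
    raw = pd.read_csv(source, dtype=str)
    problems = {}
    for column, dtype in dtypes.items():
        try:
            raw[column].astype(dtype)
        except (ValueError, TypeError) as exc:
            # Retry value by value to locate the first offending cell.
            for label, value in raw[column].items():
                try:
                    pd.Series([value]).astype(dtype)
                except (ValueError, TypeError):
                    problems[column] = (label, value, str(exc))
                    break
            else:
                # The column failed as a whole but no single value did.
                problems[column] = (None, None, str(exc))
    return problems


# Report each failing column with its first bad row label and raw value.
for column, (row, value, error) in find_bad_columns(io.StringIO(CSV_TEXT), WANTED_DTYPES).items():
    print(f"{column!r}: row {row}, value {value!r} -> {error}")
```

The row label is the zero-based index of the data row in the DataFrame, not the line number in the file, so add 2 to get the file line when there is a single header row.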
Gijs Wobben
  • I was hoping not to see that answer appear :-D but as there is no other idea... Maybe one day the pandas team will add a better exception. – Benjamin Nov 30 '20 at 09:28