My objective is to parallelize reading many (500+) CSV files containing measurement data. To do so, I pass a list of paths (source_files) to dd.read_csv and compute the result on a synchronous client. I have also specified the dtypes and the column names (order_list).
import dask.dataframe as dd
# source_files, order_list, dtype and CLIENT are defined elsewhere
df = dd.read_csv(
    source_files,
    names=order_list,
    include_path_column=True,
    delimiter=';',
    decimal='.',
    dtype=dtype,
    na_values='.',
    assume_missing=True,
    error_bad_lines=False,
)
df = CLIENT.compute(df).result()
For a corrupt line I get the following error message:
File "pandas\_libs\parsers.pyx", line 1164, in pandas._libs.parsers.TextReader._convert_tokens
ValueError: cannot safely convert passed user dtype of bool for float64 dtyped data in column 116
In rare cases the data logger corrupts a log file while writing it, so a float ends up where I expect a boolean. I am sure that the dtypes I pass to read_csv are correct and are satisfied by the vast majority of the CSV files.
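To make the failure concrete, here is a minimal sketch of how I believe it arises, using plain pandas instead of Dask. The two-column layout, the column names `value` and `flag`, and the sample values are made up for illustration; my real files have far more columns.

```python
import io

import pandas as pd

# Hypothetical two-column log: a float measurement and a boolean flag.
GOOD = "1.5;True\n2.5;False\n"
BAD = "1.5;0.5\n2.5;0.0\n"  # the flag column holds floats instead of booleans


def load(text):
    """Parse the sample text with the same kind of strict dtypes I pass to Dask."""
    return pd.read_csv(
        io.StringIO(text),
        sep=";",
        names=["value", "flag"],
        dtype={"value": "float64", "flag": "bool"},
    )


print(load(GOOD).dtypes)

try:
    load(BAD)
except ValueError as exc:
    # pandas refuses to turn the float data into the requested bool dtype
    print("failed:", exc)
```

The exception carries the column index but neither a file name nor a line number, which is exactly the information I am missing.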
Is there a way to identify which CSV file actually caused the error? It would also be nice to know which line of that file raised the exception.
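For reference, the fallback I can think of is a serial scan with plain pandas, sketched below. The helper names `find_bad_files` and `find_bad_lines` are hypothetical, and the sketch assumes that booleans are written as `True`/`False`, that the files have no header row, and that they contain no blank lines or quoted multi-line fields (otherwise the line numbers would be off). It gives up the parallelism, which is why I am hoping for something that works within Dask.

```python
import pandas as pd


def find_bad_files(paths, names, dtype, sep=";"):
    """Read each file on its own and collect (path, error message) for failures."""
    failures = []
    for path in paths:
        try:
            pd.read_csv(path, names=names, sep=sep, dtype=dtype, na_values=".")
        except ValueError as exc:
            failures.append((path, str(exc)))
    return failures


def find_bad_lines(path, names, column, sep=";", allowed=("True", "False")):
    """Return 1-based line numbers whose `column` is not an allowed boolean token.

    Assumes one record per physical line and no header row.
    """
    # Read everything as text so that no dtype conversion can fail here.
    raw = pd.read_csv(path, names=names, sep=sep, dtype=str, keep_default_na=False)
    mask = ~raw[column].isin(allowed)
    return [int(i) + 1 for i in raw.index[mask.to_numpy()]]
```

Calling `find_bad_files(source_files, order_list, dtype)` would list the offending paths, and `find_bad_lines` could then be run on each of them for the column named in the error message.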
Thank you in advance!