
I am bulk inserting data into crateDB using

df.to_sql(tableId, 'crate://xxxxxxx:4200', if_exists='append', index=False, chunksize=20000)

The data contains a few faulty rows where one column of type 'numeric' contains a value of type 'object' / 'string'.

This results in a typecasting error while inserting.

When I set the chunksize parameter of to_sql() to 1, an exception is raised and I can catch the issue.

When chunksize is greater than 1, to_sql() just continues and I cannot verify that the data was inserted properly.

What am I missing, and what would be a good approach to fix this problem?
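One workaround is to validate the numeric column before calling to_sql(), so faulty rows can be separated out instead of poisoning a whole chunk. A minimal sketch (the column name "price" and the sample values are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"price": [1.5, "oops", 2.0, "n/a"]})

# Coerce to numeric; values that cannot be cast become NaN.
coerced = pd.to_numeric(df["price"], errors="coerce")
bad_mask = coerced.isna() & df["price"].notna()

faulty = df[bad_mask]                                  # rows to log / fix by hand
clean = df[~bad_mask].assign(price=coerced[~bad_mask]) # safe to bulk insert

# clean.to_sql(tableId, engine, if_exists='append',
#              index=False, chunksize=20000)
```

This keeps the fast bulk path for the clean rows and confines the error handling to the small `faulty` frame.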

some_programmer
Jabb

2 Answers


For bulk inserts you need to inspect the result of the query for error handling. It includes an array of rowcounts. A -2 indicates that an error occurred.

For certain errors, likely including typecasting errors, all rows will be rejected (i.e. all rowcounts are -2), in which case you will need to fall back to single inserts.

See https://crate.io/docs/crate/reference/en/4.6/interfaces/http.html#bulk-errors for additional details.
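A sketch of that approach, assuming you bypass to_sql() and talk to CrateDB's HTTP endpoint directly (host, table, and column names here are placeholders):

```python
import json
import urllib.request

def failed_row_indices(results):
    """Indices of bulk operations whose rowcount is -2 (error)."""
    return [i for i, r in enumerate(results) if r.get("rowcount") == -2]

def bulk_insert(host, stmt, bulk_args):
    """POST a bulk insert to CrateDB's /_sql endpoint and return the
    per-operation results array from the response."""
    payload = json.dumps({"stmt": stmt, "bulk_args": bulk_args}).encode()
    req = urllib.request.Request(
        f"http://{host}:4200/_sql",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("results", [])

# usage (not executed here):
# results = bulk_insert("xxxxxxx", "INSERT INTO t (x) VALUES (?)", [[1], [2]])
# bad = failed_row_indices(results)  # fall back to single inserts for these
```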

jayeff
  • Thanks. This applies to the HTTP REST API. When using to_sql(), None is always returned, so there is nothing to inspect, unfortunately. – Jabb Sep 01 '21 at 11:35

The problem most likely is that pandas doesn't return anything from to_sql(); see also https://github.com/pandas-dev/pandas/issues/23998 and https://github.com/pandas-dev/pandas/issues/42645 . Since the bulk insert in CrateDB doesn't fail completely but only skips the faulty rows, no exception is raised; instead an array with one value per row is returned. That return value, however, is never processed by pandas.

One could change the SQLAlchemy implementation of CrateDB, but this would in turn break any implementation that actually uses the results array.
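One hook that doesn't require changing the dialect is to_sql()'s `method` parameter, which pandas documents as a callable with signature (pd_table, conn, keys, data_iter). A custom method can run the inserts itself and check each result instead of discarding it. A hedged sketch (`insert_and_check` and `build_insert` are my own names, not a pandas or CrateDB API):

```python
def build_insert(table_name, keys):
    """Build a parameterised single-row INSERT for the given columns."""
    cols = ", ".join(keys)
    params = ", ".join(f":{k}" for k in keys)
    return f"INSERT INTO {table_name} ({cols}) VALUES ({params})"

def insert_and_check(pd_table, conn, keys, data_iter):
    """Custom `method` for DataFrame.to_sql: insert row by row and
    fail loudly if a row is rejected."""
    from sqlalchemy import text  # deferred so the sketch stays importable
    sql = text(build_insert(pd_table.name, keys))
    for row in data_iter:
        result = conn.execute(sql, dict(zip(keys, row)))
        if result.rowcount != 1:
            raise ValueError(f"row rejected: {row!r}")

# usage: df.to_sql(tableId, engine, if_exists='append', index=False,
#                  method=insert_and_check)
```

Note this trades the speed of the bulk path for per-row error reporting, so it makes most sense as a fallback for chunks that failed a bulk attempt.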

proddata