1

The error that I get is "ValueError: month must be in 1..12"

I know that there is "weird" data in that column and I want pandas to treat it as text or ignore the rows with errors and populate the df with the remainder.

(Not a big deal but would be good to avoid having to code an exception for this column in this table when hundreds of others work just fine)

query = 'SELECT troubledDateCol FROM table'
data = pd.read_sql_query(query, cnxn, dtype={'troubledDateCol': pd.StringDtype})

How can I make pandas.read_sql_query ignore the datetime datatype of a column in the source database?

I have tried all sorts to work around this:

  • parse_dates={"troubledDateCol": {"errors": "ignore", "format": "various%formats%that&might&work"}}
  • parse_dates={"troubledDateCol": {"errors": "coerce", "format": "various%formats%that&might%work"}}
  • parse_dates={"troubledDateCol": {"errors": "coerce", "date_parser": myCustomParser"}}
    • Debugging and putting a breakpoint here shows that the error occurs before the custom parser is invoked
  • CASTing and CONVERTing troubledDateCol to VARCHAR in the query
    • This doesn't work because the database is a pernickety, proprietary, flat file DB developed by Sage accounts, based on Pervasive SQL and these fail on execution of the query before pandas tried to load the data
  • Trying to use SQLalchemy and pyOdbc
    • I wasn't able to successfully connect using these despite extensive efforts and found a fair amount of other people having had the same issues
RobD
  • 1,695
  • 2
  • 24
  • 49
  • 1
    It may be easier to just skip parsing dates in the read_sql step and parse the dates once you have the data? – rayad Feb 24 '23 at 18:27
  • Absolutely @rayad this is exactly what I'm trying to do, I want to make read_sql_query "stop trying to parse the dates" because I know that it can't parse them, because the data is dodgy – RobD Feb 24 '23 at 18:33
  • 2
    In that case couldn't you just pass parse_dates=none? Or maybe parse_dates=false maybe? – rayad Feb 25 '23 at 11:37
  • 1
    Hi, what happens with `pd.read_sql_query(query, cnxn, dtype={'troubledDateCol': pd.StringDtype}, parse_dates={'troubledDateCol': {'errors': 'ignore'}})`? – Laurent Feb 25 '23 at 15:51
  • Both parse_dates=False and @Laurent suggestion result in: month must be in 1..12 It seems the database / driver reports the column as DateTime and Pandas insists on treating it as DateTime Might a bug in the driver or something. Could be the data is causing Pandas to fail for "reason x" but the error handling reports it as "error y: month must be in 1..12" because it is an unexpected failure. I'm not able to "see" the problem data, Excel also fails trying to read it, but it fails on 100 rows and reads the rest, this is what I would like my code to do, ignore the errors... but it won't – RobD Mar 01 '23 at 13:30

0 Answers0