I am trying to read a CSV file using pandas and get a warning I do not understand:

Lib\site-packages\dateutil\parser\_parser.py:1207: UnknownTimezoneWarning: tzname B identified but not understood.  Pass `tzinfos` argument in order to correctly return a timezone-aware datetime.  In a future version, this will raise an exception.
  warnings.warn("tzname {tzname} identified but not understood.  "

I do nothing special, just pd.read_csv with parse_dates=True. I see no B that looks like a timezone anywhere in my data. What does the warning mean?

A minimal reproducible example is the following:

import io
import pandas as pd
pd.read_csv(io.StringIO('x\n1A2B'), index_col=0, parse_dates=True)

Why does pandas think 1A2B is a datetime?!

To solve this, I tried adding dtype={'x': str} to force the column to be read as strings, but I still get the warning.

FObersteiner
Michel de Ruiter
  • Convenience is not always clever. You might want to consider not parsing dates during read_csv, or at least specify which columns to parse. Keyword `parse_dates` alternatively takes a list or dict to specify that ([docs](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html)) – FObersteiner Jan 11 '23 at 10:30

1 Answer

It turns out 1A2B is being interpreted as "1 AM on day 2 of the current month, timezone B". By default (that is, when no date_parser= is passed), read_csv uses dateutil to detect datetime values:

import dateutil.parser
dateutil.parser.parse('1A2B')

Apart from emitting the warning, this returns (as of today):

datetime.datetime(2023, 1, 2, 1, 0)

And indeed, B is not a timezone name that dateutil understands by default.
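For contrast, a strict parser with an explicit format does not guess at missing fields and simply rejects a value like 1A2B. The following is only a minimal sketch using the standard library; the helper name parse_strict and the format string are assumptions made for illustration, not something pandas or dateutil provide.

```python
from datetime import datetime


def parse_strict(text, fmt="%Y-%m-%d %H:%M"):
    """Return a datetime if text matches fmt exactly, otherwise None."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        # No lenient guessing: anything that does not match fmt is rejected.
        return None


print(parse_strict("1A2B"))
print(parse_strict("2023-01-02 01:00"))
```

The point of the design is that the format is stated up front, so an identifier that merely looks a little like a time is never filled in with today's date.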

Why adding dtype doesn't help remains to be investigated.

I did find a simple hack that works:

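import dateutil.parser

# Replacement for dateutil.parser.parser.parse: it ignores `default`, so missing
# fields are not filled in from today's date, and it skips timezone handling.
def dateparse(self, timestr, default=None, ignoretz=False, tzinfos=None, **kwargs):
    # Returns the raw result of the private _parse() method, not a datetime.
    return self._parse(timestr, **kwargs)
dateutil.parser.parser.parse = dateparse  # Monkey patch; hack!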

This prevents the current day, month, and year from being used as defaults, so the value is no longer accepted as a datetime, as expected.
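A less invasive option, following the comment on the question, may be to tell read_csv exactly which columns hold dates instead of passing parse_dates=True. The sketch below assumes a hypothetical extra column named when that contains real dates; the sample data is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data: an ID-like index column "x" and a date column "when".
csv_text = "x,when\n1A2B,2023-01-11\n"

# Only "when" is parsed as a date; the index column "x" is left as text.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=["when"])

index_value = df.index[0]
when_is_datetime = pd.api.types.is_datetime64_any_dtype(df["when"])
print(index_value, when_is_datetime)
```

Because the index is never handed to the date parser here, 1A2B stays a plain string and dateutil is not patched.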

Michel de Ruiter