  • Python 3.7.0
  • pandas 1.1.5
  • numpy 1.21.4
  • Babel 2.9.1

Problem

I have a dataset that is first evaluated, then extended with calculated values, and finally grouped with pd.Grouper, which yields a DataFrame with a pd.MultiIndex. The result looks something like this:

                           delta (secs)
identifier   start_dt          
FOO          '2021-11-30'  10738.823
BAR          '2021-11-30'    116.000

And the index looks like this:

MultiIndex([('FOO', '2021-11-30'),
            ('BAR', '2021-11-30')],
           names=['identifier', 'start_dt'])

For better readability, I want to use Babel to localize the datetime output in the index levels according to a given locale.

When I try to apply Babel's format_datetime() function to the DatetimeIndex like so:

mi_frame.index = mi_frame.index.set_levels(
    format_datetime(
        mi_frame.index,
        format=time_format,
        tzinfo=get_timezone('Europe/Berlin'),
        locale="de_DE"),
    level=1)

I get the error AttributeError: 'DatetimeIndex' object has no attribute 'replace'. Other methods such as .apply() returned the same error.

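The root cause is that Babel's format_datetime() expects a single native datetime object and, judging by the error, calls .replace() on whatever it receives. The date level of the MultiIndex is a DatetimeIndex, a container of timestamps that has no such method; it only offers its own string formatting method, .strftime(time_format), which does not accept a Babel locale.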
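The difference between the two kinds of objects can be sketched as follows. This is a minimal illustration rather than part of the original code: the names single, label_de, dates and labels are made up for the example, and the format strings were chosen only because their output does not depend on the locale data.

```python
from datetime import datetime

import pandas as pd
from babel.dates import format_datetime

# A single native datetime is the kind of object format_datetime() accepts.
single = datetime(2021, 11, 30)
label_de = format_datetime(single, format="yyyy-MM-dd", locale="de_DE")

# A DatetimeIndex holds many timestamps; its own formatter is .strftime(),
# which takes a C-style format string and no Babel locale.
dates = pd.DatetimeIndex(["2021-11-30"])
labels = dates.strftime("%Y-%m-%d")

print(label_de)
print(list(labels))
```

Passing dates itself to format_datetime() is what triggers the AttributeError above, so each element has to be formatted individually.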

Henry Ecker
bhthllj

1 Answer


Thanks to this answer by @jezrael on converting a DatetimeIndex to native Python datetime objects, the values can be passed to Babel's functions, which expect a native Python datetime object.

So the solution to my problem is not very elegant, but it works:

  • Convert the DatetimeIndex to a NumPy array of datetime objects
  • Cast it to a list so that a list comprehension can be applied
  • Use Babel's format_datetime() to convert the datetime objects into the desired string representations
  • Re-assign the list to the MultiIndex at the original level (here: level=1) using pandas' MultiIndex.set_levels()

from babel.dates import format_datetime, get_timezone

# Level 1 holds the unique dates; convert them to native datetime objects.
dt_array = list(mi_frame.index.levels[1].to_pydatetime())

# Format each datetime with Babel ("MMM" is the abbreviated month name).
dt_array = [format_datetime(idx, format="MMM", tzinfo=get_timezone('Europe/Berlin'), locale="de_DE")
            for idx in dt_array]
# Replace the values of level 1 with the formatted strings.
mi_frame.index = mi_frame.index.set_levels(dt_array, level=1)

This yields (after some additional arithmetic):

                             delta  percentage
identifier   start_dt                       
FOO          Nov.        10738.823         0.4
BAR          Nov.          116.000         0.0

The "Nov." label results from the "MMM" format string (abbreviated month name), as described in the official Babel documentation.
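For anyone who wants to try the approach end to end, here is a self-contained sketch. It rebuilds a small frame shaped like the one in the question; the helper name localize_level and its parameters are hypothetical and only wrap the steps described above. The format "dd.MM.yyyy" is used instead of "MMM" so that the output does not depend on the locale's month abbreviations.

```python
import pandas as pd
from babel.dates import format_datetime, get_timezone

# Sample data shaped like the frame in the question (values are illustrative).
index = pd.MultiIndex.from_arrays(
    [["FOO", "BAR"], pd.to_datetime(["2021-11-30", "2021-11-30"])],
    names=["identifier", "start_dt"],
)
mi_frame = pd.DataFrame({"delta (secs)": [10738.823, 116.0]}, index=index)


def localize_level(frame, level, fmt, tz_name, locale):
    """Return a copy of frame whose datetime index level is Babel-formatted.

    Hypothetical helper: it wraps the steps from the answer, nothing more.
    """
    tz = get_timezone(tz_name)
    # The level stores each distinct date once; convert to native datetimes.
    dates = frame.index.levels[level].to_pydatetime()
    labels = [format_datetime(d, format=fmt, tzinfo=tz, locale=locale) for d in dates]
    out = frame.copy()
    out.index = frame.index.set_levels(labels, level=level)
    return out


result = localize_level(mi_frame, 1, "dd.MM.yyyy", "Europe/Berlin", "de_DE")
print(result)
```

One caveat, as far as I can tell: set_levels() checks that the new level values are unique, so a coarse format such as "MMM" may raise a ValueError when the level contains several distinct dates from the same month.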

bhthllj