-3

I am struggling trying convert a dataframe containing multiple datetime columns into a version with a daterange.

A smaller example would look somewhat like this:

So from:

   key  date_upgrade1 date_upgrade2
0  key1 2020-07-31 2020-08-31
1  key2 2020-05-31 NaT

to:

    month       key upgrade1 upgrade2
0   2020-06-01  key1    0   0
1   2020-07-01  key1    0   0
2   2020-08-01  key1    1   0
3   2020-09-01  key1    1   1
4   2020-06-01  key2    1   0
5   2020-07-01  key2    1   0
6   2020-08-01  key2    1   0
7   2020-09-01  key2    1   0

where upgrade1 and upgrade 2 are "0" if the upgrade has not yet been done and "1" if the upgrade was already performed.

Actually, I am quite lost at the moment. I managed to create the empty date_range dataframe. But I am not sure how to duplicate it for multiple keys and change all values after the upgrade date to 1.

Any help is appreciated.

Goliath
  • 1
  • 4
  • 1
    Why does `key2` extend to September when the most current date in the original frame is May 31, which would become June 1? Also, why is `upgrade1` true for key2 for dates July through September? – It_is_Chris Oct 29 '21 at 13:25
  • Good question, I really did not clearly say what "0" and "1" in the second DataFrame actually means. The second DataFrame tells me if an upgrade was already performed. So those upgrades are single time events. – Goliath Oct 29 '21 at 13:33

1 Answers1

0

Answered the question myself in the meantime:

df = [
        df1.assign(month=t)
        for t in pd.date_range(first_date, now, freq='MS')
    ]

df = pd.concat(df)
df.reset_index(drop=True, inplace=True)

df["upgrade1"] = df.month.gt(df.date_upgrade1).astype('int')
df["upgrade2"] = df.month.gt(df.date_upgrade2).astype('int')

with df1 defined as the upper DataFrame.

Goliath
  • 1
  • 4