I have a large DataFrame with two columns, start_date and finish_date, holding dates as strings, for example "2018-06-01". I want to create a third column containing the list of months between the two dates. So for start_date "2018-06-01" and finish_date "2018-08-01", I expect ["2018-06-01", "2018-07-01", "2018-08-01"] in the third column. The day doesn't matter to me, so it can be dropped. I have found many ways to do this for plain strings, but none for a pandas DataFrame.

  • This may help you find all the months. In the code from that answer, replace `freq='D'` with `freq='M'` (month end) or `freq='MS'` (month start). https://stackoverflow.com/questions/63730078/create-date-range-list-with-pandas – Pooria_T Sep 04 '20 at 14:12

1 Answer

pandas provides `DataFrame.apply`, which, when called with `axis=1`, runs a function on every row of a DataFrame.

We can use dateutil's `rrule` to generate every month between the start and end dates, then apply that logic to each row to build the new column.

import pandas as pd
import datetime
from dateutil.rrule import rrule, MONTHLY

# Build a sample DataFrame for this example; use the one you already have instead.
# If your columns hold strings, convert them first with pd.to_datetime.
data = {'start': datetime.datetime.strptime("10-10-2020", "%d-%m-%Y"), 'end': datetime.datetime.strptime("10-12-2020", "%d-%m-%Y")}
df = pd.DataFrame(data, index=[0])

#df
#       start        end
#0 2020-10-10 2020-12-10

# For each row, find every month between the start and end dates.
# The result is a list of "MM/YYYY" strings stored in a new column.

df['months'] = df.apply(lambda x: [date.strftime("%m/%Y") for date in rrule(MONTHLY, dtstart=x.start, until=x.end)], axis=1)

#df
#       start        end                       months
#0 2020-10-10 2020-12-10  [10/2020, 11/2020, 12/2020]
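
If the columns hold strings, as in the question, a pandas-only variant may also work. The following is a minimal sketch using `pd.period_range`, not a definitive implementation: the sample data and the `months_between` helper are made up for illustration, while the column names `start_date` and `finish_date` come from the question. Monthly periods ignore the day of the month, so the day never affects which months are listed.

```python
import pandas as pd

# Hypothetical sample data with string dates, shaped like the question describes.
df = pd.DataFrame({
    "start_date": ["2018-06-01", "2018-11-15"],
    "finish_date": ["2018-08-01", "2019-01-03"],
})


def months_between(start, finish):
    """Return "YYYY-MM" labels for every calendar month from start to finish, inclusive."""
    # freq="M" here means monthly periods, so the day of the month is ignored.
    periods = pd.period_range(start=start, end=finish, freq="M")
    return periods.strftime("%Y-%m").tolist()


# Build the new column row by row; zip avoids the overhead of DataFrame.apply.
df["months"] = [
    months_between(start, finish)
    for start, finish in zip(df["start_date"], df["finish_date"])
]

print(df)
```

Change the `strftime` pattern (for example to `"%Y-%m-01"`) if you want the labels to look like full dates.
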
PacketLoss