I have a large DataFrame with two columns, start_date and finish_date, holding dates as strings, e.g. "2018-06-01". I want to create a third column with the list of months between the two dates. So for start_date "2018-06-01" and finish_date "2018-08-01", I expect ["2018-06-01", "2018-07-01", "2018-08-01"] in the third column. The day doesn't matter to me, so it can be dropped. I have found many ways to do this for plain strings, but none for a pandas DataFrame.
1 Answer
Pandas has a method called apply which, with axis=1, lets you apply logic to every row of a DataFrame. We can use dateutil's rrule to get all months between the start and end dates, then apply that logic to every row of your DataFrame and store the result as a new column.
import datetime
import pandas as pd
from dateutil.rrule import rrule, MONTHLY

# DataFrame creation, just for the example; use the one you already have.
data = {'start': datetime.datetime.strptime("10-10-2020", "%d-%m-%Y"), 'end': datetime.datetime.strptime("10-12-2020", "%d-%m-%Y")}
df = pd.DataFrame(data, index=[0])
# df
#        start        end
# 0 2020-10-10 2020-12-10

# Find all months between the start and end date for every row. The result is a list.
# dtstart is moved to the 1st of its month so the day of month never causes a month to be skipped.
df['months'] = df.apply(lambda x: [date.strftime("%m/%Y") for date in rrule(MONTHLY, dtstart=x['start'].replace(day=1), until=x['end'])], axis=1)
# df
#        start        end                       months
# 0 2020-10-10 2020-12-10  [10/2020, 11/2020, 12/2020]
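Since the question has the dates as strings and expects "YYYY-MM-DD" output, here is a minimal sketch of the same idea using only pandas. It assumes the column names start_date and finish_date from the question; the helper month_starts and the sample rows are made up for illustration. Both ends are snapped to the first of their month, and pd.date_range with freq="MS" (month start) generates the months in between.

```python
import pandas as pd

# Sample data (hypothetical), using the column names from the question.
df = pd.DataFrame({
    "start_date": ["2018-06-01", "2018-11-15"],
    "finish_date": ["2018-08-01", "2019-01-03"],
})

def month_starts(start, finish):
    """Return the first day of every month from start to finish, as strings."""
    # Snap both ends to the 1st of their month so the day is ignored.
    first = pd.Timestamp(start).replace(day=1)
    last = pd.Timestamp(finish).replace(day=1)
    # freq="MS" yields month-start dates; both endpoints are included.
    return [d.strftime("%Y-%m-%d") for d in pd.date_range(first, last, freq="MS")]

# Build the new column row by row from the two string columns.
df["months"] = [month_starts(s, f) for s, f in zip(df["start_date"], df["finish_date"])]
```

If finish_date is earlier than start_date, this sketch produces an empty list for that row rather than raising an error.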

PacketLoss