Working with Pandas, I have a performance issue on one step. It works on a small amount of data, but I can't get an answer in a reasonable time on a larger amount.
I have a dataframe like this:
ColA ColB ColC start end
1 1 1 2020-01-01 2021-01-01
There are 715K rows like this one, with the 5 columns you see; the dates are different for each row.
I want to change the granularity of the dataframe so that there are as many rows as there are dates in each interval.
Something like this:
ColA ColB ColC Date
1 1 1 2020-01-01
1 1 1 2020-01-02
[...]
1 1 1 2020-12-31
1 1 1 2021-01-01
As I estimate the intervals contain about 100 dates on average, I should end up with something like 71.5M rows.
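To check that estimate, the exact output row count should be the total number of days across all intervals, endpoints included. Assuming start and end are already datetime columns, I think something like this would compute it:

n_rows = ((df1['end'] - df1['start']).dt.days + 1).sum()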
I tried this:
import pandas as p
from datetime import timedelta

df2 = p.DataFrame(columns=['ColA', 'ColB', 'ColC', 'DATE'])
for index, row in df1.iterrows():
    ColA = row['ColA']
    ColB = row['ColB']
    ColC = row['ColC']
    start_date = p.to_datetime(row['start'])
    end_date = p.to_datetime(row['end'])
    delta = end_date - start_date
    # one row per day in the interval, endpoints included
    for i in range(delta.days + 1):
        day = start_date + timedelta(days=i)
        new_row = {'ColA': ColA, 'ColB': ColB, 'ColC': ColC, 'DATE': day}
        df2 = df2.append(new_row, ignore_index=True)
but it has been running for hours without results :(
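I suspect the row-by-row append is the bottleneck. I also wondered whether a vectorized approach along these lines would be faster (a rough sketch using p.date_range and explode, with my p alias for pandas; I haven't been able to test it at this scale):

import pandas as p

# sketch: build one list of dates per row, then explode to one row per date
df2 = df1.copy()
df2['DATE'] = [p.date_range(s, e).tolist()   # endpoints included
               for s, e in zip(df2['start'], df2['end'])]
df2 = (df2.drop(columns=['start', 'end'])
          .explode('DATE')
          .reset_index(drop=True))

The idea would be to build all the per-row date lists at once and let explode do the flattening, instead of appending one row at a time.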
Do you know how I can do better? Thanks for your answers.