-2

First time posting, newbie to python.

I have a data frame consisting of 3 columns: ['ID', 'date', 'profit_forecast']

'ID': is product ID

'date': start date

'profit_forecast': a list containing 367 items, each item is a profit forecast for date+n

I am looking to create a new data frame that maps each item in profit_forecast to the ID and corresponding date+n for its position in the list.

Not sure how to start.

Thanks in advance!

Mark
  • 934
  • 1
  • 10
  • 25
Kyle
  • 53
  • 5
  • can you post an example of the df – Derek Eden Aug 28 '19 at 00:25
  • first thoughts are to loop over the list and make a dictionary for each date+n based on the count of the index going through the list and the corresponding forecast..then use that dictionary to fill in the rows of the new df using the start date for the key in the dict – Derek Eden Aug 28 '19 at 00:26
  • 1
    It is not clear without the df and the code you tried. – J.K Aug 28 '19 at 02:18

1 Answers1

0

If I understand you correctly, the following example data captures the essence of your question:

df = pd.DataFrame({'ID': [1, 2, 3], 
                   'date': pd.date_range('2019-01-01', freq='YS', periods=3), 
                   'profit_forecast': [[1, 2, 3], [4, 5], [6, 7, 8, 9]]})
df
   ID       date profit_forecast
0   1 2019-01-01       [1, 2, 3]
1   2 2020-01-01          [4, 5]
2   3 2021-01-01    [6, 7, 8, 9]

One solution is to make sure you've upgraded to pandas 0.25, and then to explode the profit_forecast column:

res = df.explode('profit_forecast')
res
   ID       date profit_forecast
0   1 2019-01-01               1
0   1 2019-01-01               2
0   1 2019-01-01               3
1   2 2020-01-01               4
1   2 2020-01-01               5
2   3 2021-01-01               6
2   3 2021-01-01               7
2   3 2021-01-01               8
2   3 2021-01-01               9

At this point, your question is not clear enough on how you need to increment the dates of each ID. If by "date + n" you mean to add one day to each consecutive date within each ID, then something like this should work:

res['date'] = res['date'] + pd.to_timedelta(res.groupby('ID').cumcount(), 'D')
res
   ID       date profit_forecast
0   1 2019-01-01               1
0   1 2019-01-02               2
0   1 2019-01-03               3
1   2 2020-01-01               4
1   2 2020-01-02               5
2   3 2021-01-01               6
2   3 2021-01-02               7
2   3 2021-01-03               8
2   3 2021-01-04               9
Peter Leimbigler
  • 10,775
  • 1
  • 23
  • 37