0

I have a dataframe with multiple columns, however in this case only three are important: Date (datetime), Country and Value. The dataframe contains time-series data which spans over multipe years. For simplicity, assume all numbers in column 'Value' has an initial value of 2.

df

Date         Country         Value
2020-01-02   UK              2
2020-01-02   USA             2
2020-01-02   AU              2
2020-01-03   UK              2
2020-01-03   USA             2
2020-01-03   AU              2
...

What i want to achieve is to multiply with a factor that changes every month, from a starting value to an ending value. The monthly factor change is calculated as the difference between ending factor and start factor, divided by the number of months.

start_factor = 1.1
end_factor = 1.5
start_date = '2020-02-01'
end_date = '2020-06-01'

I want to multiply the 'Value' column with the correct factor that given month.

Result I want:

Date         Country         Value
2020-01-01   UK              2
2020-01-01   USA             2
...
2020-02-01   UK              2.2
2020-02-01   USA             2.2
...
2020-03-01   UK              2.4
2020-03-01   USA             2.4
...
2020-04-01   UK              2.6
2020-04-01   USA             2.6
...
2020-05-01   UK              2.8
2020-05-01   USA             2.8
...
2020-06-01   UK              3
2020-06-01   USA             3
...
2020-07-01   UK              3
2020-07-01   USA             3
...
2020-08-01   UK              3
2020-08-01   USA             3

What is the best way of achieving this? I've tried making a date range using pd.date_range(start = start_date, end = end_date, freq='M'), looping over it and changing factors with df.loc but I'm not sure its the best way of doing it. Also: how would I do it if I just wanted the factors in a new column?

kibz
  • 1
  • 1

1 Answers1

0

I'm not sure, did I get the question, but here's my suggestion:

import pandas as pd

def create_add_series(start, end, factor):
    start_year, start_month = [int(x) for x in start.split('-')]
    index = pd.date_range(start, end, freq='MS')
    values = [((x.year - start_year) * 12 + x.month - start_month) * factor for x in index]
    return pd.Series(values, index)

series = create_add_series('2020-01', '2020-06', 1.1)

df.Values = [df.loc[x, 'Values'] + series.loc[df.loc[x, 'Date']] for x in df.index]
Yoskutik
  • 1,859
  • 2
  • 17
  • 43
  • This multiplies the factor by the number of months. I want the factor to gradually increase from `start_factor` to `end_factor` over the span of the amount of months. – kibz Sep 10 '20 at 08:09