-1

I need to create a function which take an input as dictionary and update column values in the dataframe. My data looks following

Date Col_1 Col_2 Col_3 Col_4 Col_5
01/01/2021 10 20 10 20 10
02/01/2021 10 20 10 20 10
03/01/2021 10 20 10 20 10
04/01/2021 10 20 10 20 10
05/01/2021 10 20 10 20 10
06/01/2021 10 20 10 20 10
07/01/2021 10 20 10 20 10
08/01/2021 10 20 10 20 10
09/01/2021 10 20 10 20 10
10/01/2021 10 20 10 20 10
11/01/2021 10 20 10 20 10
12/01/2021 10 20 10 20 10

Now, if pass monthly level update in percentage for 'Col_1' and 'Col_2', say

{Date: ['01/01/2021','02/01/2021','03/01/2021','04/01/2021','05/01/2021','06/01/2021',
        '07/01/2021','08/01/2021','09/01/2021','10/01/2021','11/01/2021','12/01/2021',],
 'Col_1': [20,20,20,20,30,30,40,40,20,20,20,20],
 'Col_2': [0,0,0,0,0,0,0,0,0,0,10,10]}

After performing this my desired out looks like this for monthly change

Date Col_1 Col_2 Col_3 Col_4 Col_5
01/01/2021 12 20 10 20 10
02/01/2021 12 20 10 20 10
03/01/2021 12 20 10 20 10
04/01/2021 12 20 10 20 10
05/01/2021 13 20 10 20 10
06/01/2021 13 20 10 20 10
07/01/2021 14 20 10 20 10
08/01/2021 14 20 10 20 10
09/01/2021 12 20 10 20 10
10/01/2021 12 20 10 20 10
11/01/2021 12 24 10 20 10
12/01/2021 12 24 10 20 10

Similarly, I want to update my data on Quarterly and Yearly Level too. I am able to do the Yearly update, here is my code. Please help me with MONTHY AND QUARTERLY updation based on the inputs.

Thank you!!

dic = {'col_1':10,'col_2':-5)
year = 2021
def update_df(dic,df,year):
    df = df[df['date'].dt.year == year]
    df = (df+df.select_dtypes(include = 'number').mul(pd.Series(dic)/100)).combine_first(df)[df.columns]
    return df

I am trying like this

def update_df(dic,df,year,choice):     
    if choice == annual:         
        df = df[df['date'].dt.year == year]         
        df = (df+df.select_dtypes(include =                 
'number').mul(pd.Series(dic)/100)).combine_first(df)[df.columns]        
    elif choice == quarterly :          
        df["quarter"] = df.date.dt.quarter           
        df = (df+df.select_dtypes(include =                   
        'number').mul(pd.Series(dic)/100)).combine_first(df)[df.columns]
    else choice == monthly : 
        df["month"] = df.date.dt.month           
        df = (df+df.select_dtypes(include =                   
        'number').mul(pd.Series(dic)/100)).combine_first(df)[df.columns]
    return df
Amit Kumar
  • 154
  • 12
  • 1
    This is not a code-writing or tutoring service. We can help solve specific, technical problems, not open-ended requests for code or advice. Please edit your question to show what you have tried so far, and what specific problem you need help with. Please see: [Why is Can someone help me? not an actual question?](https://meta.stackoverflow.com/questions/284236/why-is-can-someone-help-me-not-an-actual-question) for more details. – itprorh66 Feb 11 '22 at 20:52
  • I did tried for annual update in the existing pandas dataframe and it works I wrote it upfront. I just wanna do it for monthly(12 percentage values to be changed in the specified column with timestamp) and quarterly level(4 percentage values to be changed in the specified column with timestamp). p.s. I didn't ask for any tutoring, I need some help only. Have good a day :) – Amit Kumar Feb 11 '22 at 20:55
  • Where is your coding attempt, what specific issue are you having. Please provide a complete relevant [Minimal Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example "Minimal Reproducible Example") for your question – itprorh66 Feb 11 '22 at 20:57
  • Please post your code in your question. Edit the question and add the code. – itprorh66 Feb 11 '22 at 21:02
  • @itprorh66 does it works for you now? – Amit Kumar Feb 11 '22 at 21:13
  • Okay code is now available, what results do you get and what results do you want? Please follow MRE instructions. Please edit your question to show a minimal reproducible set consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. – itprorh66 Feb 11 '22 at 21:59
  • Take your daily update as an example, **#1** be aware that pandas allows you to multiply a dataframe with another index-by-index. so you may first set `Date` as the index for each of the two dataframes using `set_index`, **#2** you can do column-wise index-by-index multiplication like this `df1['Col_1'] * df2['Col_2']` . If this works for your daily update, change your index from a daily date into a monthly/quarterly/yearly date using the `.dt.xxxx` attributes which you are already using. Update your question with your attempt and error if needed. – Raymond Kwok Feb 12 '22 at 02:45
  • #1 is my monthly update here not daily. #2 I have dictionary to update based on the month, quarter, and yearly. Suppose I have to increase col_2 by -5% for a quarter 1 it should update only col_1 values in quarter 1 rest will remain the same. – Amit Kumar Feb 12 '22 at 03:33
  • Hmm. I came here to help. Its helpful, if you can provide the input code so its easier to duplicate. Its also easier if you show a few columns and a few rows. I see so much data, followed by a code snippet which I am not certain how you are using. I give up because I don't have time to figure out the question much less the answer. – netskink Feb 14 '22 at 21:16

2 Answers2

1

There certainly maybe a cleaner approach but the following will work and provide a single function to do either yearly, quarterly or monthly updates as follows:

import pandas as pd
from collections import namedtuple

# Control tuple defining the date parameters for changing dataframe
DateControl = namedtuple('DateControl', ['Year', 'Quarter', 'Month'])


def updateFrame(df:pd.DataFrame, pcnt_val: float, **args) -> pd.DataFrame:
    # Function to update a specified year, quarter of Month by pcnt_val amount
    dtecol = args.pop('DTECOL', None)
    colList = args.pop('Columns', [])
    control = DateControl(args.pop('Year', None),
                          args.pop('Quarter', None),
                          args.pop('Month', None)
                         )
    
    def EvalDate(ds: pd.Series, row: int, selection: DateControl) -> bool:
        # Evaluate the truth of a date based on control arguments
        yr = False
        qtr = False
        mnth = False
        if selection.Year is None:
            yr = True
        else:
            if ds[row].year == selection.Year:
                yr = True
        if selection.Quarter is None:
            qtr = True
        else:
            if ds[row].quarter == selection.Quarter:
                qtr = True
        if selection.Month is None:
            mnth = True
        else:
            if ds[row].month == selection.Month:
                mnth = True
        return yr and qtr and mnth
    
    # Use control to update all cols named in colList
    mask = list(EvalDate(df[dtecol], x, control) for x in range(len(df[dtecol])))
    mod = list((1.0 + pcnt_val) if x else 1.0 for x in mask)
    print(mask)
    print(mod)
    for c in colList:
         df[c] = list(df.iloc[x][c] * mod[x] for x in range(len(df[c])))     
    return df    

The updateFrame function takes two positional arguments:
. df - the dataframe to be updated
. pcnt_val - the percentage to be added to the current value

The function also takes some keyword variables to include:

  • DTECOL - which is the Name of the df column containing dates
  • Columns - a list of column headings within the Df to be changed
  • Year - a year value or None if all years are to be changed
  • Quarter - a specific quarter integer 1 thru 4 inclusive or None
  • Month - a specific month to be changed or None

Applying this function to your dataframe df as follows:

dg = updateFrame(df, .25, DTECOL='Date', Columns=['Col_1', 'Col_2'], Year=2021, Quarter=3)  

Yields:

    Date    Col_1   Col_2   Col_3   Col_4   Col_5
0   2021-01-01  10.0    20.0    10  20  10
1   2021-02-01  10.0    20.0    10  20  10
2   2021-03-01  10.0    20.0    10  20  10
3   2021-04-01  10.0    20.0    10  20  10
4   2021-05-01  10.0    20.0    10  20  10
5   2021-06-01  10.0    20.0    10  20  10
6   2021-07-01  12.5    25.0    10  20  10
7   2021-08-01  12.5    25.0    10  20  10
8   2021-09-01  12.5    25.0    10  20  10
9   2021-10-01  10.0    20.0    10  20  10
10  2021-11-01  10.0    20.0    10  20  10
11  2021-12-01  10.0    20.0    10  20  10

Given that you would like to provide updates for all 4 quarters in one call this is how I would do it: add new function:

def updateByQuarter(df:pd.DataFrame, changes: list, **args) -> pd.DataFrame:
    #  Given a quarterly change list of the form tuple(qtrid, chgval) Update the dataframe
    for chg in changes:
        args['Quarter'] = chg[0]
        df updateFrame(df, chg[1], **args)
    return df    

Then create a list of changes by quarter

# List of tuples defining the quarter and percent change
qtrChg = [(1, 0.02),(2, 0.035),(3, -0.018),(4, 0.125)]  

The use:

df = updateByQuarter(df, [(1, 0.02), (2, 0.04), (3, -0.02), (4, 0.15)], DTECOL='Date', Columns=['Col_1', 'Col_2'])  

This yields:

         Date  Col_1  Col_2  Col_3  Col_4  Col_5
0  2021-01-01   10.2   20.4     10     20     10
1  2021-02-01   10.2   20.4     10     20     10
2  2021-03-01   10.2   20.4     10     20     10
3  2021-04-01   10.4   20.8     10     20     10
4  2021-05-01   10.4   20.8     10     20     10
5  2021-06-01   10.4   20.8     10     20     10
6  2021-07-01    9.8   19.6     10     20     10
7  2021-08-01    9.8   19.6     10     20     10
8  2021-09-01    9.8   19.6     10     20     10
9  2021-10-01   11.5   23.0     10     20     10
10 2021-11-01   11.5   23.0     10     20     10
11 2021-12-01   11.5   23.0     10     20     10
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
itprorh66
  • 3,110
  • 4
  • 9
  • 21
  • If its monthly than it can take % increase/decrease for maximum of 12 months, similarly for quarters 4 and 1 for monthly. How I can pass this, I didn't understand the code well – Amit Kumar Feb 14 '22 at 10:22
  • To create a negative percent change pass a negative value as ```pcnt_val``` – itprorh66 Feb 14 '22 at 14:05
  • What part don't you understand? – itprorh66 Feb 14 '22 at 14:05
  • I want to pass 4 pcnt_val for each quarter (say q1:2%, q2,:4%, q3:5%,q4:-1). I am getting how to do that. I am new in python and your code seems too complex for me. I tried with this and I am confused lot now – Amit Kumar Feb 14 '22 at 14:24
  • You would run the function 4 different times as follows: ```updateFrame(df, 0.02, 'Date', Columns=['Col_1'], Quarter=1)```, ```updateFrame(df, 0.04, 'Date', Columns=['Col_1'], Quarter=2)```, ```updateFrame(df, 0.035, 'Date', Columns=['Col_1'], Quarter=3)```, and ```updateFrame(df, -0.01, 'Date', Columns=['Col_1'], Quarter=4)``` – itprorh66 Feb 14 '22 at 21:00
  • I wanna make at one go for months, year and quarter – Amit Kumar Feb 15 '22 at 06:41
  • What do you mean? If you want a combination, wrap the function in one of your own that utilizes this function to perform the individual operations. – itprorh66 Feb 15 '22 at 14:08
  • I wanna pass like this if choice == quarter: qtr_wise_per_change = [1,3,5,4] like this – Amit Kumar Feb 16 '22 at 05:01
  • you deleted the last change you made? - @itproh66 – Amit Kumar Feb 18 '22 at 05:57
  • I didn't delete anything, I added to my original answer – itprorh66 Feb 18 '22 at 15:47
0

**pandas captures 4 general time related concepts:

Date times: A specific date and time with timezone support. Similar to datetime.datetime from the standard library.

Time deltas: An absolute time duration. Similar to datetime.timedelta from the standard library.

Time spans: A span of time defined by a point in time and its associated frequency.

Date offsets: A relative time duration that respects calendar arithmetic. Similar to dateutil.relativedelta.relativedelta from the dateutil package.**

  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Feb 14 '22 at 19:06