-1

I have been trying a code using multiple if statement but this process seems to take more execution time if the number of observations are many. Basically the code checks the value in two columns (rate and rate_tenor) and if condition matches in both the column, then that needs to be replaced by certain value (for e.g. if rate = 'RATE1' and rate_tenor = 1 MONTH, then replace the rate value at that index with '1 MO RATE1'). I have written the code using lots of "if" statement but I am looking for more efficient way that executes faster. There can be many values in rate column, but only those needs to be replaced that have value either RATE1 or RATE2 and particular tenor value(1, 3, 6 or 12 month).

This is the code I wrote:

if 'column1' in columns_in_dataframe:
    tenor = df['column1']
    rate = df['column2']
    # Iterate through each value
    for idx, val in enumerate(rate):

        if val == 'RATE1':
            if tenor[idx] == '1 MONTH':
                tape.loc[idx, 'column2'] = '1 MO RATE1'
            elif tenor[idx] == '3 MONTH':
                tape.loc[idx, 'column2'] = '3 MO RATE1'
            elif tenor[idx] == '6 MONTH':
                tape.loc[idx, 'column2'] = '6 MO RATE1'
            elif tenor[idx] == '12 MONTH':
                tape.loc[idx, 'column2'] = '12 MO RATE1'

        if val == 'RATE2':
            if tenor[idx] == '1 MONTH']:
                tape.loc[idx, 'column2'] = '1 MO RATE2'
            elif tenor[idx] == '3 MONTH']:
                tape.loc[idx, 'column2'] = '3 MO RATE2'
            elif tenor[idx] == '6 MONTH']:
                tape.loc[idx, 'column2'] = '6 MO RATE2'
            elif tenor[idx] == '12 MONTH']:
                tape.loc[idx, 'column2'] = '12 MO RATE2'
Titu
  • 1
  • 3

2 Answers2

1

Are there other possible values in rate? should it always be written in column2? You could maybe find a formula to replace those ifs:

tape.loc[idx, 'column2'] = tenor[idx][:-3] + " " + val

Try adding a description of your df in the question

Tranbi
  • 11,407
  • 6
  • 16
  • 33
0

In your specific case where you want to basically combine two columns, it is best to do this using column-wise operations when possible because this is much faster and you don't need a loop anymore. This is possible in your case, because you basically want to add the columns together, but shorten the tenor string: EDIT: I added conditionals, so that the operation only applies to the rows you want.

conditions = (((df["column1"]=="1 MONTH")|
       (df["column1"]=="3 MONTH")|
       (df["column1"]=="6 MONTH")|
       (df["column1"]=="12 MONTH"))&
       ((df["column2"]=="RATE 1")|
      ( df["column2"]=="RATE 2")))
# New column is the tenor column, without the NTH, a space and the rate column
df.loc[conditions,"column2"] = df.loc[conditions,"column1"].str.replace("NTH","")+ " " + df.loc[conditions,"column2"]

Also, I don't think it is the if-else statements that make your code slow, but the fact that you are looping over pandas objects. Pandas operations should be vectorized if possible. If the simple method above cannot be applied, you can vectorize your code this way:

def rate_tenor(tenor,val):
    if val == 'RATE1':
        if tenor == '1 MONTH':
            return '1 MO RATE1'
        elif tenor == '3 MONTH':
            return '3 MO RATE1'
        elif tenor == '6 MONTH':
            return '6 MO RATE1'
        elif tenor == '12 MONTH':
            return '12 MO RATE1'

    if val == 'RATE2':
        if tenor == '1 MONTH':
            return '1 MO RATE2'
        elif tenor == '3 MONTH':
            return '3 MO RATE2'
        elif tenor == '6 MONTH':
            return '6 MO RATE2'
        elif tenor == '12 MONTH':
            return '12 MO RATE2'
        if tenor == '1 MONTH']:
            return '1 MO RATE2'
        elif tenor == '3 MONTH']:
            return '3 MO RATE2'
        elif tenor == '6 MONTH']:
            return '6 MO RATE2'
        elif tenor == '12 MONTH']:
            return '12 MO RATE2'
# Use apply for speedup
df['column2'] = df.apply(lambda x: rate_tenor(x.column1, x.column2), axis=1)
  • But I don't need to change the entire column, only those needs to be changed where my condition met for both column. – Titu Jul 30 '21 at 12:05
  • Ah, I did not see those conditions. I edited my answer to include them, using .loc to apply the conditions. This should be faster. If it is still to simple a solution to fit your needs, you can speed up your code by putting it in a function and calling .apply, see my second example. – Alfred Rodenboog Jul 30 '21 at 12:36