How to create columns in Python Pandas depending on multiple conditions to calculate previous cost, cumulative cost and cost of transaction?

Question

Let´s say I´ve the given dataframe:

import pandas as pd

import numpy as np

# Create the dataframe:
data = {'date'              : ['2022-01-01', '2022-01-02', '2022-01-02', '2022-01-03', '2022-01-03', '2022-01-03', '2022-01-03', '2022-01-03', '2022-01-04'],
        'ticker'            : ['TSLA','TSLA', 'TSLA', 'TSLA', 'TSLA', 'TSLA', 'TSLA', 'AAPL', 'AAPL'],
        'type'              : ['buy', 'buy', 'split', 'sell', 'buy', 'sell', 'buy', 'buy', 'buy'],
        'units'             : [2, 2, 3, 2, 2, 2, 2, 2.5, 4],
        'price'             : [50.5, 50.5, 0, 52, 35, 52, 35, 50.5, 50.5],
        'fees'              : [4, 4, 0, 2.5, 3, 2.5, 3, 1.5, 2],
}

df = pd.DataFrame(data)
df
Out[3]: 
         date ticker   type  units  price  fees
0  2022-01-01   TSLA    buy    2.0   50.5   4.0
1  2022-01-02   TSLA    buy    2.0   50.5   4.0
2  2022-01-02   TSLA  split    3.0    0.0   0.0
3  2022-01-03   TSLA   sell    2.0   52.0   2.5
4  2022-01-03   TSLA    buy    2.0   35.0   3.0
5  2022-01-03   TSLA   sell    2.0   52.0   2.5
6  2022-01-03   TSLA    buy    2.0   35.0   3.0
7  2022-01-03   AAPL    buy    2.5   50.5   1.5
8  2022-01-04   AAPL    buy    4.0   50.5   2.0

Whereas I calculate the following columns:

# Create the column transaction value

def calc_transaction_value(type_transaction, price_unit, transacted_units, fees):
    """
    Calculate the transaction value from three columns in a dataframe depending on the value of type.
    """
    if type_transaction == 'buy':
        return price_unit * transacted_units + fees # i.e. the transacted cost
    if type_transaction == 'sell':
        return price_unit * transacted_units - fees # i.e. the gross income of capital
    else:
        return np.nan # If other return filler (i.e return NaN)

df['transacted_value'] = df.apply(lambda x: calc_transaction_value(x['type'], x['price'], x['units'], x['fees']), axis=1).fillna(0)

df
Out[4]: 
         date ticker   type  units  price  fees  transacted_value
0  2022-01-01   TSLA    buy    2.0   50.5   4.0            105.00
1  2022-01-02   TSLA    buy    2.0   50.5   4.0            105.00
2  2022-01-02   TSLA  split    3.0    0.0   0.0              0.00
3  2022-01-03   TSLA   sell    2.0   52.0   2.5            101.50
4  2022-01-03   TSLA    buy    2.0   35.0   3.0             73.00
5  2022-01-03   TSLA   sell    2.0   52.0   2.5            101.50
6  2022-01-03   TSLA    buy    2.0   35.0   3.0             73.00
7  2022-01-03   AAPL    buy    2.5   50.5   1.5            127.75
8  2022-01-04   AAPL    buy    4.0   50.5   2.0            204.00

And:

# create the flow of units, depends on transaction type (buy or sell)
df["flow_units"] = df.apply(lambda x: -x["units"] if x["type"] == "sell" else x["units"], axis=1)

# Create the cml_units and prev_units column
df = df.groupby("ticker").apply(lambda grp: grp.assign(cml_units=grp["flow_units"].cumsum().abs(),
                                                           prev_units=grp["flow_units"].shift(1).cumsum().abs().fillna(0)))
df
Out[5]: 
         date ticker   type  units  price  fees  transacted_value  flow_units  \
0  2022-01-01   TSLA    buy    2.0   50.5   4.0            105.00         2.0   
1  2022-01-02   TSLA    buy    2.0   50.5   4.0            105.00         2.0   
2  2022-01-02   TSLA  split    3.0    0.0   0.0              0.00         3.0   
3  2022-01-03   TSLA   sell    2.0   52.0   2.5            101.50        -2.0   
4  2022-01-03   TSLA    buy    2.0   35.0   3.0             73.00         2.0   
5  2022-01-03   TSLA   sell    2.0   52.0   2.5            101.50        -2.0   
6  2022-01-03   TSLA    buy    2.0   35.0   3.0             73.00         2.0   
7  2022-01-03   AAPL    buy    2.5   50.5   1.5            127.75         2.5   
8  2022-01-04   AAPL    buy    4.0   50.5   2.0            204.00         4.0   

   cml_units  prev_units  
0        2.0         0.0  
1        4.0         2.0  
2        7.0         4.0  
3        5.0         7.0  
4        7.0         5.0  
5        5.0         7.0  
6        7.0         5.0  
7        2.5         0.0  
8        6.5         2.5

Where the computation of columns for cml_units and prev_units is inspired by @krassowski in thread.

But here I would like to create the columns prev_costs, cml_costs and cost_transaction. The logic is as follows:

I want to for each row, get the ticker from column ticker, and:

if transaction type value is equal to buy in column type then compute the operation prev_costs + transacted_value and populate in column cml_costs.
- Where prev_costs for that row is cml_costs but shifted one row up (if grouped by ticker).
if transaction type value is equal to sell then compute the operation prev_costs - cost_transaction and populate in cml_costs.
- Where prev_costs for that row is cml_costs but shifted one row up (if grouped by ticker).
- And where cost_transaction is the result of operation (for that row): (units/cml_units) * prev_costs
else return prev_costs and populate in cml_costs.

I expect the resulting dataframe as:

data_2 = {'prev_costs'           : [0, 105, 210, 210, 150, 223, 159.29, 0, 127.75],
          'cml_costs'            : [105, 210, 210, 150, 223, 159.29, 232.29, 127.75, 331.75],
          'cost_transaction'     : [0, 0, 0, 60, 0, 63.71, 0, 0, 0],
}

df2 = pd.DataFrame(data_2)

df_expected = pd.concat([df, df2], axis=1, join='inner')

df_expected
Out[6]: 
         date ticker   type  units  price  fees  transacted_value  flow_units  \
0  2022-01-01   TSLA    buy    2.0   50.5   4.0            105.00         2.0   
1  2022-01-02   TSLA    buy    2.0   50.5   4.0            105.00         2.0   
2  2022-01-02   TSLA  split    3.0    0.0   0.0              0.00         3.0   
3  2022-01-03   TSLA   sell    2.0   52.0   2.5            101.50        -2.0   
4  2022-01-03   TSLA    buy    2.0   35.0   3.0             73.00         2.0   
5  2022-01-03   TSLA   sell    2.0   52.0   2.5            101.50        -2.0   
6  2022-01-03   TSLA    buy    2.0   35.0   3.0             73.00         2.0   
7  2022-01-03   AAPL    buy    2.5   50.5   1.5            127.75         2.5   
8  2022-01-04   AAPL    buy    4.0   50.5   2.0            204.00         4.0   

   cml_units  prev_units  prev_costs  cml_costs  cost_transaction  
0        2.0         0.0        0.00     105.00              0.00  
1        4.0         2.0      105.00     210.00              0.00  
2        7.0         4.0      210.00     210.00              0.00  
3        5.0         7.0      210.00     150.00             60.00  
4        7.0         5.0      150.00     223.00              0.00  
5        5.0         7.0      223.00     159.29             63.71  
6        7.0         5.0      159.29     232.29              0.00  
7        2.5         0.0        0.00     127.75              0.00  
8        6.5         2.5      127.75     331.75              0.00

I have made this work in google sheets, but can´t make this work in pandas and in desperate need for help. :)

Thanks in advance!

How to create columns in Python Pandas depending on multiple conditions to calculate previous cost, cumulative cost and cost of transaction?

0 Answers0