I'm relatively new to Pandas and after searching here for solutions I have not been able to find anything that translates well to my problem.
I have a DataFrame like this (df_transactions):
col_type col_ticker col_transacted_value
'buy' 'TSLA' '100'
'buy' 'TSLA' '100'
'buy' 'TSLA' '100'
'sell' 'TSLA' '300'
... ... ...
And I would like to create two new columns, with the cumulative previous costs and cumulative costs.
To get the previous cost I have used:
df_transactions[col_prev_cost] = df_transactions.groupby(col_ticker)[col_transacted_value].shift().fillna(0)
but then realized this only finds the previous occurrence of that ticker and populate a new column with it´s transacted_costs.
My idea is too group by ticker and depending on the value of col_type populate a new column 'col_prev_costs', and with that column create the new column 'cml_cost'. Or if you see a simpler way - do share!
I want it to operate like this:
col_type col_ticker col_transacted_value New_col_prev_cost New_col_cml_cost
'buy' 'TSLA' '100' '0' '100'
'buy' 'TSLA' '100' '100' '200'
'buy' 'TSLA' '100' '200' '300'
'sell' 'TSLA' '300' '300' '0'
... ... ... ... ...
I'm open to all suggestions!
Thanks in advance! / Jakob
UPDATE: My first question was somewhat flawed so I will try to explain it again.
I have now discovered that I missed the correct handling of transactions with sell type, as this is dependent on even previous holdings of the ticker.
Let´s say I have the given dataframe:
import pandas as pd
import numpy as np
# Create the dataframe:
data = {'date' : ['2022-01-01', '2022-01-02', '2022-01-02', '2022-01-03', '2022-01-03', '2022-01-03', '2022-01-03', '2022-01-03', '2022-01-04'],
'ticker' : ['TSLA','TSLA', 'TSLA', 'TSLA', 'TSLA', 'TSLA', 'TSLA', 'AAPL', 'AAPL'],
'type' : ['buy', 'buy', 'split', 'sell', 'buy', 'sell', 'buy', 'buy', 'buy'],
'units' : [2, 2, 3, 2, 2, 2, 2, 2.5, 4],
'price' : [50.5, 50.5, 0, 52, 35, 52, 35, 50.5, 50.5],
'fees' : [4, 4, 0, 2.5, 3, 2.5, 3, 1.5, 2],
}
df = pd.DataFrame(data)
df
Out[2]:
date ticker type units price fees
0 2022-01-01 TSLA buy 2.0 50.5 4.0
1 2022-01-02 TSLA buy 2.0 50.5 4.0
2 2022-01-02 TSLA split 3.0 0.0 0.0
3 2022-01-03 TSLA sell 2.0 52.0 2.5
4 2022-01-03 TSLA buy 2.0 35.0 3.0
5 2022-01-03 TSLA sell 2.0 52.0 2.5
6 2022-01-03 TSLA buy 2.0 35.0 3.0
7 2022-01-03 AAPL buy 2.5 50.5 1.5
8 2022-01-04 AAPL buy 4.0 50.5 2.0
Whereas I calculate the following columns:
# Create the column transaction value
def calc_transaction_value(type_transaction, price_unit, transacted_units, fees):
"""
Calculate the transaction value from three columns in a dataframe depending on the value of type.
"""
if type_transaction == 'buy':
return price_unit * transacted_units + fees # i.e. the transacted cost
if type_transaction == 'sell':
return price_unit * transacted_units - fees # i.e. the gross income of capital
else:
return np.nan # If other return filler return NA (i.e. not available)
df['transacted_value'] = df.apply(lambda x: calc_transaction_value(x['type'], x['price'], x['units'], x['fees']), axis=1).fillna(0)
df
Out[3]:
date ticker type units price fees transacted_value
0 2022-01-01 TSLA buy 2.0 50.5 4.0 105.00
1 2022-01-02 TSLA buy 2.0 50.5 4.0 105.00
2 2022-01-02 TSLA split 3.0 0.0 0.0 0.00
3 2022-01-03 TSLA sell 2.0 52.0 2.5 101.50
4 2022-01-03 TSLA buy 2.0 35.0 3.0 73.00
5 2022-01-03 TSLA sell 2.0 52.0 2.5 101.50
6 2022-01-03 TSLA buy 2.0 35.0 3.0 73.00
7 2022-01-03 AAPL buy 2.5 50.5 1.5 127.75
8 2022-01-04 AAPL buy 4.0 50.5 2.0 204.00
# create the flow of units, depends on transaction type (buy or sell)
df["flow_units"] = df.apply(lambda x: -x["units"] if x["type"] == "sell" else x["units"], axis=1)
# Create the cml_units and prev_units column
df = df.groupby("ticker").apply(lambda grp: grp.assign(cml_units=grp["flow_units"].cumsum().abs(),
prev_units=grp["flow_units"].shift(1).cumsum().abs().fillna(0)))
df
Out[4]:
date ticker type units price fees transacted_value flow_units \
0 2022-01-01 TSLA buy 2.0 50.5 4.0 105.00 2.0
1 2022-01-02 TSLA buy 2.0 50.5 4.0 105.00 2.0
2 2022-01-02 TSLA split 3.0 0.0 0.0 0.00 3.0
3 2022-01-03 TSLA sell 2.0 52.0 2.5 101.50 -2.0
4 2022-01-03 TSLA buy 2.0 35.0 3.0 73.00 2.0
5 2022-01-03 TSLA sell 2.0 52.0 2.5 101.50 -2.0
6 2022-01-03 TSLA buy 2.0 35.0 3.0 73.00 2.0
7 2022-01-03 AAPL buy 2.5 50.5 1.5 127.75 2.5
8 2022-01-04 AAPL buy 4.0 50.5 2.0 204.00 4.0
cml_units prev_units
0 2.0 0.0
1 4.0 2.0
2 7.0 4.0
3 5.0 7.0
4 7.0 5.0
5 5.0 7.0
6 7.0 5.0
7 2.5 0.0
8 6.5 2.5
Where this is from the help I got from the previous comment @SRawson.
But here I would like to create the columns prev_costs
, cml_costs
and cost_transaction
. The logic is as follows:
I want to for each row, get the ticker from column "ticker"
, and:
- if transaction type value is equal to
"buy"
in column"type"
then compute the operation"prev_costs + transacted_value"
and populate in column"cml_costs"
.- Where
"prev_costs"
for that row is"cml_costs"
but shifted one row up (if grouped by ticker).
- Where
- if transaction type value is equal to
"sell"
then compute the operation"prev_costs - cost_transaction"
and populate in"cml_costs"
.- Where
"prev_costs"
for that row is"cml_costs"
but shifted one row up (if grouped by ticker). - And where
"cost_transaction"
is the result of operation (for that row):(units/cml_units) * prev_costs
- Where
- else return
"prev_costs"
and populate in"cml_costs"
.
I expect the dataframe:
data_2 = {'prev_costs' : [0, 105, 210, 210, 150, 223, 159.29, 0, 127.75],
'cml_costs' : [105, 210, 210, 150, 223, 159.29, 232.29, 127.75, 331.75],
'cost_transaction' : [0, 0, 0, 60, 0, 63.71, 0, 0, 0],
}
df2 = pd.DataFrame(data_2)
df_expected = pd.concat([df, df2], axis=1, join='inner')
df_expected
Out[5]:
date ticker type units price fees transacted_value flow_units \
0 2022-01-01 TSLA buy 2.0 50.5 4.0 105.00 2.0
1 2022-01-02 TSLA buy 2.0 50.5 4.0 105.00 2.0
2 2022-01-02 TSLA split 3.0 0.0 0.0 0.00 3.0
3 2022-01-03 TSLA sell 2.0 52.0 2.5 101.50 -2.0
4 2022-01-03 TSLA buy 2.0 35.0 3.0 73.00 2.0
5 2022-01-03 TSLA sell 2.0 52.0 2.5 101.50 -2.0
6 2022-01-03 TSLA buy 2.0 35.0 3.0 73.00 2.0
7 2022-01-03 AAPL buy 2.5 50.5 1.5 127.75 2.5
8 2022-01-04 AAPL buy 4.0 50.5 2.0 204.00 4.0
cml_units prev_units prev_costs cml_costs cost_transaction
0 2.0 0.0 0.00 105.00 0.00
1 4.0 2.0 105.00 210.00 0.00
2 7.0 4.0 210.00 210.00 0.00
3 5.0 7.0 210.00 150.00 60.00
4 7.0 5.0 150.00 223.00 0.00
5 5.0 7.0 223.00 159.29 63.71
6 7.0 5.0 159.29 232.29 0.00
7 2.5 0.0 0.00 127.75 0.00
8 6.5 2.5 127.75 331.75 0.00
I was not sure if it is okey to "update" a question in such a major way, but hopes it is. Please comment if this is wrong from my part.
I have made this work in google sheets, but as I said in the first question I'm quite new to pandas and in desperate need for help again.
Thanks in advance (again)!