I have the following dataframe of BTC prices, one row per minute from 2018-01-15 17:01:00 to 2020-10-31 09:59:00. As you can see, this is 1,468,379 rows of data, so my code needs to be optimized; otherwise computations can take a very long time.
dfcondensed = df[["Open","Close","Buy", "Sell"]]
dfcondensed
Timestamp Open Close Buy Sell
2018-01-15 17:01:00 14174.00 14185.25 14185.11 NaN
2018-01-15 17:02:00 14185.11 14185.15 NaN NaN
2018-01-15 17:03:00 14185.25 14157.32 NaN NaN
2018-01-15 17:04:00 14177.52 14184.71 NaN NaN
2018-01-15 17:05:00 14185.03 14185.14 NaN NaN
... ... ... ... ...
2020-10-31 09:55:00 13885.00 13908.36 NaN NaN
2020-10-31 09:56:00 13905.38 13915.81 NaN NaN
2020-10-31 09:57:00 13909.02 13936.00 NaN NaN
2020-10-31 09:58:00 13936.00 13920.78 NaN NaN
2020-10-31 09:59:00 13924.56 13907.85 NaN NaN
1468379 rows × 4 columns
The algorithm that I'm trying to run is this:
PnL = []
for i in range(dfcondensed.shape[0]):
    if not dfcondensed['Buy'].isnull().values[i]:
        for j in range(dfcondensed.shape[0] - i):
            if not dfcondensed['Sell'].isnull().values[i + j]:
                PnL.append(((dfcondensed["Open"].iloc[i+j+1] - dfcondensed["Open"].iloc[i+1])
                            / dfcondensed["Open"].iloc[i+1]) * 100)
                break
Basically, to make it clear, what I'm trying to do is assess the profit/loss of buying and selling at the points marked in the Buy/Sell columns. In the first row the strategy says buy at 14185.11, which was the open price at 2018-01-15 17:02:00. The algorithm should then look for when the strategy next tells it to sell and mark that down, then look for the next time it's told to buy and mark that down, then the next sell, and so on. By the end there are over 7,000 trades, and I want the profit per trade so I can do some analysis and improve my strategy.
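To make the intended pairing concrete, here is a toy example on synthetic prices (not my real data): each Buy signal is matched with the next Sell signal, and the trade's return is the percentage change between the Open prices one minute after each signal, as in my loop above.

```python
import pandas as pd
import numpy as np

# Synthetic dataframe illustrating the intended Buy/Sell pairing.
toy = pd.DataFrame({
    "Open":  [100.0, 101.0, 102.0, 103.0, 104.0, 105.0],
    "Close": [101.0, 102.0, 103.0, 104.0, 105.0, 106.0],
    "Buy":   [101.0, np.nan, np.nan, np.nan, np.nan, np.nan],
    "Sell":  [np.nan, np.nan, np.nan, 104.0, np.nan, np.nan],
})

# Match each Buy with the next later Sell and record the trade's % return.
buy_idx = toy.index[toy["Buy"].notna()]
sell_idx = toy.index[toy["Sell"].notna()]
pnl = []
for b in buy_idx:
    later_sells = sell_idx[sell_idx > b]
    if len(later_sells) == 0:
        break                          # buy signal with no matching sell
    s = later_sells[0]
    entry = toy["Open"].iloc[b + 1]    # open one minute after the buy signal
    exit_ = toy["Open"].iloc[s + 1]    # open one minute after the sell signal
    pnl.append((exit_ - entry) / entry * 100)

print(pnl)  # one trade: enter at 101, exit at 104 -> ~2.97% gain
```

This is just to show the pairing I want; on the full 1.4M-row dataframe it would need to be vectorized rather than looped like this.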
Using the above code to build the PnL list runs for a very long time, and I gave up waiting for it to finish. How can I speed up the algorithm?