
I have the following dataframe of BTC prices for each minute from 2018-01-15 17:01:00 to 2020-10-31 09:59:00. As you can see, this is 1,468,379 rows of data, so my code needs to be optimized, otherwise computations can take a long time.

dfcondensed = df[["Open","Close","Buy", "Sell"]]
dfcondensed 

Timestamp            Open      Close     Buy       Sell
2018-01-15 17:01:00  14174.00  14185.25  14185.11  NaN
2018-01-15 17:02:00  14185.11  14185.15  NaN       NaN
2018-01-15 17:03:00  14185.25  14157.32  NaN       NaN
2018-01-15 17:04:00  14177.52  14184.71  NaN       NaN
2018-01-15 17:05:00  14185.03  14185.14  NaN       NaN
...                  ...       ...       ...       ...
2020-10-31 09:55:00  13885.00  13908.36  NaN       NaN
2020-10-31 09:56:00  13905.38  13915.81  NaN       NaN
2020-10-31 09:57:00  13909.02  13936.00  NaN       NaN
2020-10-31 09:58:00  13936.00  13920.78  NaN       NaN
2020-10-31 09:59:00  13924.56  13907.85  NaN       NaN
1468379 rows × 4 columns

The algorithm that I'm trying to run is this:

PnL = []
for i in range(dfcondensed.shape[0]):
    if str(dfcondensed['Buy'].isnull().values[i]) == "False":
        for j in range(dfcondensed.shape[0]-i):
            if str(dfcondensed['Sell'].isnull().values[i+j]) == "False":
                PnL.append( ((dfcondensed["Open"].iloc[i+j+1] - dfcondensed["Open"].iloc[i+1]) / dfcondensed["Open"].iloc[i+1]) * 100 )
                break

Basically, to make it clear, what I'm trying to do is assess the profit/loss of buying/selling at the points in the Buy/Sell columns. In the first row the strategy being tested says buy at 14185.11, which was the open price at 2018-01-15 17:02:00. The algorithm should then look for when the strategy tells it to sell and mark that down, then look for the next time it's told to buy and mark that down, then look for the next sell, and so on. By the end there were over 7,000 different trades, and I want to see the profit per trade so I can do some analysis and improve my strategy.
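
For example (the sell value here is made up, since every sell in the preview above happens to be NaN): if a buy executes at an open of 14185.11 and the matching sell executes at a later open of 14326.00, that trade's PnL is ((14326.00 - 14185.11) / 14185.11) * 100 ≈ 0.99%.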

Using the above code to get a PnL list seems to run for a long time and I gave up waiting for it. How can I speed up the algorithm?

  • Are rows with `NaN` in `Buy` and `Sell` then superfluous? If so, can you filter them out and radically reduce your data? – JonSG Jul 01 '21 at 16:00
  • yeah, interesting idea, so I guess I could do something like dfcondensed = dfcondensed.dropna(thresh = 3) and then run this, but then I'd have to take the close of each row rather than the open of the next row, since the next row won't be the next minute but just the next time it makes a trade. – MoonBoi9001 Jul 01 '21 at 16:31
  • PnL.append( ((dfcondensed["Sell"].iloc[i+j] - dfcondensed["Buy"].iloc[i]) / dfcondensed["Buy"].iloc[i]) * 100 ) This is a better solution than proposed in my previous comment – MoonBoi9001 Jul 01 '21 at 16:36
  • This was taking so long because you are iterating over your DataFrame rows, for every row in the DataFrame. It looks like an O(n^2) algorithm, and with 1,468,379 rows you're reading (1468379*1468379) rows, which is enormous. – JimmyNJ Jul 02 '21 at 13:35
  • Thanks Jimmy, interesting explanation – MoonBoi9001 Jul 02 '21 at 17:11

1 Answer


I found a way to speed up my loop: filter the NaN placeholders out of the Buy and Sell columns first with list comprehensions, then pair the remaining signals directly:

buylist  = df["Buy"]
selllist = df["Sell"]

# Drop the NaN entries so only the actual buy/sell signals remain.
buylist  = [x for x in buylist if str(x) != 'nan']
selllist = [x for x in selllist if str(x) != 'nan']

profit = []

# Pair each buy with the next sell; zip also guards against the data
# ending on a buy that has no matching sell yet.
for buy, sell in zip(buylist, selllist):
    profit.append((sell - buy) / buy * 100)
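
For a frame this size it may be worth going fully vectorized. Here's a sketch (assuming, as the loop above does, that buy and sell signals strictly alternate starting with a buy) that replaces the Python-level loop with NumPy array operations:

import numpy as np

# Keep only the rows that actually carry a signal.
buys  = df["Buy"].dropna().to_numpy()
sells = df["Sell"].dropna().to_numpy()

# Truncate to matched pairs in case the data ends on an open buy.
n = min(len(buys), len(sells))

# Percentage profit per trade, computed in one array operation.
profit = (sells[:n] - buys[:n]) / buys[:n] * 100

profit then comes back as a NumPy array with one entry per trade, which is also convenient for the follow-up analysis (mean, histogram, etc.).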