Python - aggregating financial transactions with closest time in window

Question

Suppose I collected (in a list) all trades that occurred within a certain period of time (say first 5mins after 11AM) for n stocks (I'll make n=2 for simplicity and adapt later). Say we have firm AAA and firm BBB (if it helps, liststocks=['AAA', 'BBB']). The list will look somehow like:

    trades=[['AAA', '2011-01-03', '11:03:51', 21.5],['BBB', '2011-01-03','11:03:57', 31.5],
['AAA', '2011-01-03', '11:04:20', 21.55],
['BBB', '2011-01-03','11:04:19', 32.01], ['BBB', '2011-01-03','11:04:52', 31.7]]

i.e., 2 trades for stock AAA and 3 trades for stock BBB. Picking the last trade of each stock causes a lack of synchronicity problem. The idea is to pick the last trade of each stock and find the earliest (['AAA', '2011-01-03', '11:04:20', 21.55]). Then pick transactions of all other stocks with time as close as possible to '11:04:20', which would cause us to choose ['BBB', '2011-01-03','11:04:19', 32.01]. The output should be a list like:

    C=[['AAA', '2011-01-03', '11:04:20', 21.55],['BBB', '2011-01-03','11:04:19', 32.01]]

Thanks a lot!

Alex, last liquid is the stock with fewer trades. in my case, I already selected the last trade of AAA in parenthesis. — Eduardo Hinterholz, Jun 08 '15 at 02:41
Hey @Eduardo, looks like you're still kind of new to SO. Two quick things. 1.) Your question doesn't have a clear question as to what you're having issues with. My guess is that you want a way to get the intended output, but you never really say that anywhere. 2.) You don't show any attempt of yourself trying to solve the problem. It kind of seems that you have a problem that you want someone else to solve. Users of SO like helping out but we don't like doing someone's work for them, which is how the question comes off. This should help you out http://stackoverflow.com/help/how-to-ask. — Austin A, Jun 08 '15 at 02:42
What assumptions about the data can we make? Is the list of each firm's trades sequential everytime? If it isn't, you're search algoritm's time complexity will apprach O(nm) [n=# firms, m=avg number of trades]. This won't be good for you if you're working with large numbers! — Austin A, Jun 08 '15 at 02:48
let's put it in steps: 1) The list is sorted by time at each given day 2) Pick the time of the last transaction of the stock with fewer trades in that window of time; Append the whole transaction in a new list (say C ) 3) Look at the transactions of other stocks that traded close to this hour; append in C — Eduardo Hinterholz, Jun 08 '15 at 02:53
As for attempts, I don't think it would be of any help if I put the fails...it would add useless lines at the question — Eduardo Hinterholz, Jun 08 '15 at 02:58

Maltysen · Answer 1 · 2015-06-08T03:07:43.327

Its not that hard if you use sorted with the key parameter.

Here is the code if you don't want to read, I'll explain after:

from datetime import datetime

trades=[['AAA', '2011-01-03', '11:03:51', 21.5],['BBB', '2011-01-03','11:03:57', 31.5],
['AAA', '2011-01-03', '11:04:20', 21.55],
['BBB', '2011-01-03','11:04:19', 32.01], ['BBB', '2011-01-03','11:04:52', 31.7]]

trades=[[i[0], datetime.strptime(i[1]+" "+i[2], "%Y-%m-%d %H:%M:%S"), i[3]] for i in trades]

most_liquid, *others, least_liquid = sorted(set(i[0] for i in trades), key=trades.count)

A=sorted((i for i in trades if i[0]==least_liquid), key=lambda n: n[1])[-1]
B=sorted((i for i in trades if i[0]==most_liquid), key=lambda n: abs(n[1]-A[1]))[0]

What this does is it first converts each trade from using a string representation of the time to a datetime object. It does this with the datetime.strptime class method. Then it calculates the liquidity of the stocks by sorting trades. The *others generalizes to n stocks. Then it just filters the less_liquid trades and then sorts them by the time parameter. Then it filters by name being more_liquid and sorts by the absolute difference between it and the A trade.

So the object you want are A and B. They won't be exactly what you specified, since they will have datetime instead of strings, but that should be easy to fix with the datetime.strftime function.

to generalise for `>2` stocks: `sortedTrades = sorted(set(i[0] for i in trades), key=trades.count) ; most_liquid, least_liquid = sortedTrades[0], sortedTrades[-1]` — Alexander McFarlane, Jun 08 '15 at 03:04
It seems that running sorted(set(i[0] for i in trades), key=trades.count) several times gives different answers each time. The key argument is not related to the set described, which is causing the problem — Eduardo Hinterholz, Jun 08 '15 at 04:37

score 0 · Answer 2 · answered Jun 08 '15 at 05:36

A solution for 2 stocks is

    from datetime import *
    trades=[['AAA', '2011-01-03', '11:03:51', 21.5],['BBB', '2011-01-03','11:03:57', 31.5],
['AAA', '2011-01-03', '11:04:20', 21.55],
['BBB', '2011-01-03','11:04:19', 32.01], ['BBB', '2011-01-03','11:04:52', 31.7]]

stocknames = ['AAA','BBB']
A=[]
lastofeach=[]
for stock in stocknames:
    for t in trades:
        if t[0]==stock:
            A.append(t)
    A.sort(key=lambda e:(e[1], e[2]))
    lastofeach.append(A[-1])
    A[:]=[]
lastofeach.sort(key=lambda e:e[2])  

lastofeach=[[i[0], datetime.strptime(i[1]+" "+i[2], "%Y-%m-%d %H:%M:%S"), i[3]] for i in lastofeach]
trades=[[i[0], datetime.strptime(i[1]+" "+i[2], "%Y-%m-%d %H:%M:%S"), i[3]] for i in trades]


A=lastofeach[0]
B=(sorted((i for i in trades if i[0]!=A[0]), key=lambda n: abs(n[1]-A[1]))[0])
C=[A,B]
print (C)

(for the exact same answer just apply datetime.strftime(A[1],"%Y-%m-%d %H:%M:%S"), split it and save.

A solution for n>2 would be very much appreciated. Any big O optimizations are welcome too.

Thanks

Python - aggregating financial transactions with closest time in window

2 Answers2