
I am a newbie to the site and have limited scripting skills, but I can pick my way through scripts without a problem. I would like to write a script to monitor the FIX messages coming through a number of log files in real time, segregated by account and symbol. The rate needs to be calculated on a per-minute basis; at the moment I am not sure whether that means a minute-by-minute calculation or a rolling 60-second window. I haven't written anything yet. I am just looking to see if this is possible, and whether anyone can give me some pointers as to the best scripting language to employ. Thanks

  • It would make life easier for anyone trying to help you if you posted a sample of what your log files look like. – user1666959 Aug 13 '13 at 03:01
  • Sorry, this message was sent at the end of the day yesterday. Please find below an example, I have changed the data to hide anything sensitive: – lonegringo Aug 13 '13 at 12:23
  • Out_Vec__PWKBVSP-LE2__0 [ 601] : timestamp=2013-08-12-13:00:01.235605858 :: latency=1323.3460000000 :: 8=FIX.4.4|9=0253|35=D|34=0000601|52=20130812-13:00:01.235|49=SENDER|56=RECEIVER|57=SOR|50=TRADER|128=SPSE|11=ORDERID1|453=3|448=16|447=D|452=7|448=DMA1|447=D|452=54|448=ABC|447=D|452=36|1=ACCOUNT123|55=LPSB3|54=1|60=20130812-13:00:00.000|38=6400|40=2|44=17.8700|15=BRL|59=0|10=010| :: aux_len=0, – lonegringo Aug 13 '13 at 12:26
  • So basically, I want to count the number of occurrences of "35=D" (New Order), separated by the value of the account in tag 1 (i.e. ACCOUNT123) and then by the value of the symbol in tag 55. The result should read something like: "Account ACCOUNT123, symbol LPSB3, orders 67, between 12:59:01 and 13:00:00" – lonegringo Aug 13 '13 at 12:30
  • Is this on one line? Are there any other lines (i.e. do I have to filter out other lines)? Can there be any other line with 35=D in it? If you can post, say, 20 samples spanning a few minutes, it would be easier to make sure you get the correct answer. Otherwise it will just be an untested awk script. – user1666959 Aug 13 '13 at 13:01
  • This is one line, yes. 35=D will appear many times throughout the log, but always on separate lines - you cannot have 35=D appearing twice in the same line. It is in fact the number of 35=Ds I want to count, segregated by value in tag 1 and tag 55. Examples to follow... – lonegringo Aug 13 '13 at 13:27
  • Out_Vec__PWKBVSP-LE2__0 [ 867] : timestamp=2013-08-12-13:01:53.402792572 :: latency=82.2370000000 :: 8=FIX.4.4|9=0252|35=D|34=0000867|52=20130812-13:01:53.402|49=SENDER|56=RECEIVER|57=SOR|50=TRADER|128=SPSE|11=ORDER1|453=3|448=16|447=D|452=7|448=DMA1|447=D|452=54|448=BROKERABC|447=D|452=36|1=ACCOUNT123|55=PSSA3|54=2|60=20130812-13:01:53.000|38=600|40=2|44=27.4300|15=BRL|59=0|10=248| :: aux_len=0, – lonegringo Aug 13 '13 at 13:32
  • Out_Vec__PWKBVSP-LE2__0 [ 869] : timestamp=2013-08-12-13:01:54.282318317 :: latency=85.6960000000 :: 8=FIX.4.4|9=0252|35=D|34=0000869|52=20130812-13:01:54.282|49=SENDER|56=RECEIVER|57=SOR|50=TRADER|128=SPSE|11=ORDER2|453=3|448=16|447=D|452=7|448=DMA1|447=D|452=54|448=BROKERABC|447=D|452=36|1=ACCOUNT123|55=PSSA3|54=2|60=20130812-13:01:54.000|38=600|40=2|44=27.4300|15=BRL|59=0|10=003| :: aux_len=0, – lonegringo Aug 13 '13 at 13:33
  • Out_Vec__PWKBVSP-LE2__0 [ 872] : timestamp=2013-08-12-13:01:54.845017165 :: latency=80.4550000000 :: 8=FIX.4.4|9=0253|35=D|34=0000872|52=20130812-13:01:54.845|49=SENDER|56=RECEIVER|57=SOR|50=TRADER|128=SPSE|11=ORDER3|453=3|448=16|447=D|452=7|448=DMA1|447=D|452=54|448=BROKERABC|447=D|452=36|1=ACCOUNT123|55=CPFE3|54=1|60=20130812-13:01:54.000|38=5200|40=2|44=21.3800|15=BRL|59=0|10=026| :: aux_len=0, – lonegringo Aug 13 '13 at 13:34
  • Out_Vec__PWKBVSP-LE2__0 [ 875] : timestamp=2013-08-12-13:01:55.902374101 :: latency=271.3250000000 :: 8=FIX.4.4|9=0252|35=D|34=0000875|52=20130812-13:01:55.902|49=SENDER|56=RECEIVER|57=SOR|50=TRADER|128=SPSE|11=ORDER5|453=3|448=16|447=D|452=7|448=DMA1|447=D|452=54|448=BROKERABC|447=D|452=36|1=ACCOUNT123|55=PSSA3|54=2|60=20130812-13:01:55.000|38=600|40=2|44=27.4300|15=BRL|59=0|10=006| :: aux_len=0, – lonegringo Aug 13 '13 at 13:34
  • Out_Vec__PWKBVSP-LE2__0 [ 881] : timestamp=2013-08-12-13:01:57.125787806 :: latency=82.3420000000 :: 8=FIX.4.4|9=0253|35=D|34=0000881|52=20130812-13:01:57.125|49=SENDER|56=RECEIVER|57=SOR|50=TRADER|128=SPSE|11=ORDER7|453=3|448=16|447=D|452=7|448=DMA1|447=D|452=54|448=BROKERABC|447=D|452=36|1=ACCOUNT123|55=GETI4|54=1|60=20130812-13:01:57.000|38=5400|40=2|44=20.7200|15=BRL|59=0|10=040| :: aux_len=0, – lonegringo Aug 13 '13 at 13:35
  • Does 1=accno always precede 55=whatever, or can they be the other way around? – user1666959 Aug 13 '13 at 13:38
  • So, assuming these were the only messages sent between 13:01:01 and 13:02:00, the result would be something like this: "Account ACCOUNT123, symbol PSSA3, 3 orders; Account ACCOUNT123, symbol CPFE3, 1 order; Account ACCOUNT123, symbol GETI4, 1 order". The next step would be to format it nicely, even with a traffic-light system to flag high numbers if possible. – lonegringo Aug 13 '13 at 13:39
  • the messages are always in the same format (tags in the same sequence). 1 will always be before 55 in the sequence. – lonegringo Aug 13 '13 at 13:46

1 Answer


Here is a brute-force solution in gawk. If there is a 35=D on the line, we use regexes to split out the interesting parts: the timestamp (without the seconds, so entries fall into per-minute equivalence classes) and the two tags, and we dump them into a 'multidimensional' array, meaning we use them as indices of the array. Once we have gone through all the messages, we scan the array, in no particular order, and dump the counters. It is terribly ugly: the three 'match' calls could be folded into one, and the output could be sorted, but that is trivial in the shell with 'sort'.

#!/usr/bin/gawk -f
# Note: the three-argument form of match() is a gawk extension, so this
# needs gawk rather than a plain POSIX awk such as mawk.

# Sample input line:
#Out_Vec__PWKBVSP-LE2__0 [ 601] : timestamp=2013-08-12-13:00:01.235605858 :: latency=1323.3460000000 :: 8=FIX.4.4|9=0253|35=D|34=0000601|52=20130812-13:00:01.235|49=SENDER|56=RECEIVER|57=SOR|50=TRADER|128=SPSE|11=ORDERID1|453=3|448=16|447=D|452=7|448=DMA1|447=D|452=54|448=ABC|447=D|452=36|1=ACCOUNT123|55=LPSB3|54=1|60=20130812-13:00:00.000|38=6400|40=2|44=17.8700|15=BRL|59=0|10=010| :: aux_len=0,

/35=D/ {
    # Capture tag 1 (account) and tag 55 (symbol), each delimited by '|'.
    n = match($0, /.*\|1=([^\|]+)\|.*/, tmp1);
    n = match($0, /.*\|55=([^\|]+)\|.*/, tmp2);
    # Capture the timestamp as year, month, day, hour, minute; the seconds
    # are dropped so each line lands in its minute bucket.
    n = match($0, /[^:]+: timestamp=([[:digit:]]+)-([[:digit:]]+)-([[:digit:]]+)-([[:digit:]]+):([[:digit:]]+).*/, ts);
#    print tmp1[1], tmp2[1], ts[1], ts[2], ts[3], ts[4], ts[5];
    # Count orders per (account, symbol, minute).
    aggr[tmp1[1], tmp2[1], ts[1], ts[2], ts[3], ts[4], ts[5]]++;
}

END {
    for (i in aggr)
        print i, aggr[i];
}

For the samples I get (the index parts are joined by awk's non-printing SUBSEP character, which is why they appear run together):

ACCOUNT123PSSA3201308121301 3
ACCOUNT123CPFE3201308121301 1
ACCOUNT123LPSB3201308121300 1
ACCOUNT123GETI4201308121301 1

which could be further processed.
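As one sketch of that further processing, the snippet below redoes the counting in plain POSIX awk (avoiding the gawk-only three-argument match()) and prints each bucket in the sentence format requested in the comments. The file name sample.log and the shortened log lines are assumptions made up for illustration; in practice you would point it at the real logs.

```shell
# Illustrative only: sample.log and the abbreviated lines are made up here.
cat > sample.log <<'EOF'
Out_Vec__PWKBVSP-LE2__0 [ 867] : timestamp=2013-08-12-13:01:53.402792572 :: latency=82.2 :: 8=FIX.4.4|35=D|1=ACCOUNT123|55=PSSA3|10=248|
Out_Vec__PWKBVSP-LE2__0 [ 869] : timestamp=2013-08-12-13:01:54.282318317 :: latency=85.7 :: 8=FIX.4.4|35=D|1=ACCOUNT123|55=PSSA3|10=003|
Out_Vec__PWKBVSP-LE2__0 [ 872] : timestamp=2013-08-12-13:01:54.845017165 :: latency=80.4 :: 8=FIX.4.4|35=D|1=ACCOUNT123|55=CPFE3|10=026|
EOF
awk -F'|' '/35=D/ {
    acct = ""; sym = ""
    for (i = 1; i <= NF; i++) {              # walk the |-separated FIX fields
        if      ($i ~ /^1=/)  acct = substr($i, 3)
        else if ($i ~ /^55=/) sym  = substr($i, 4)
    }
    match($0, /timestamp=[0-9-]+:[0-9]+/)    # date plus hh:mm, seconds dropped
    minute = substr($0, RSTART + 10, RLENGTH - 10)
    aggr[acct SUBSEP sym SUBSEP minute]++    # count per (account, symbol, minute)
}
END {
    for (k in aggr) {
        split(k, p, SUBSEP)
        printf "Account %s, symbol %s, orders %d, minute %s\n", p[1], p[2], aggr[k], p[3]
    }
}' sample.log | sort > report.txt
cat report.txt
```

Joining account, symbol and minute with SUBSEP into one index mirrors the 'multidimensional' array of the answer; split() simply takes them apart again for printing.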

user1666959
  • Blimey, thanks. Let me see if I can pick my way through this. – lonegringo Aug 13 '13 at 16:46
  • This is awesome; I have modded it to show local time and formatted it as I want. The log file being read is of course written to in real time. How would you suggest running this so it updates in real time with the log? So far I have embedded it within another script and taken the last 1000 lines of the actual log file as input (I only really need the last minute's worth of data, but in times of high message throughput it could well reach 1000 lines/minute). – lonegringo Aug 13 '13 at 18:45
  • You could split the log file on an hourly basis with this script: pipe your original source into it, do the aggregation, and dump the original line as well. Or you could keep a file containing a line number N (how many lines you have already processed), run this script from cron, and skip the first N lines with the shell's 'tail' command. – user1666959 Aug 14 '13 at 00:16
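The "skip the first N lines" idea from the last comment can be sketched as a small cron-driven shell script. Everything here is an assumption for illustration: the file names fix.log and fixcount.offset, the stand-in log lines, and grep -c standing in for the real awk counting script.

```shell
#!/bin/sh
# Sketch of the offset approach; file names and the grep -c stand-in
# (in place of the real counting script) are all illustrative.
LOG=fix.log
STATE=fixcount.offset
# Stand-in data: one already-processed line and one new order line.
printf '%s\n' 'old line, counted on a previous run' \
              '... 35=D|1=ACCOUNT123|55=LPSB3| ...' > "$LOG"
echo 1 > "$STATE"                         # pretend one line was processed before
N=$(cat "$STATE" 2>/dev/null || echo 0)   # lines already processed
TOTAL=$(wc -l < "$LOG")                   # lines present now
# Feed only lines N+1..TOTAL to the counting step, then record the new offset.
tail -n +"$((N + 1))" "$LOG" | grep -c '35=D'
echo "$TOTAL" > "$STATE"
```

On each cron run only the lines appended since the previous run are counted; the offset file carries the position between runs, so nothing is counted twice.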