
I populate a disk database from large CSV files using TorQ's .loader.loadallfiles in a cumulative fashion and it works great. However, I now also need to append data coming from a streaming source, and I'm not sure of the best way to go about it.

I know how to update or append data to the in-memory database. However, I do not know what API there is to consistently bring the delta updates to the disk database previously populated with .loader.loadallfiles.

For example, I call .loader.loadallfiles as follows:

rawdatadir:hsym `$("" sv (getenv[`KDBRAWDATA]; "fwdcurve"));
.loader.loadallfiles[`headers`types`separator`tablename`dbdir`partitioncol`partitiontype!(`date`ccypair`ftype;"ZSS";enlist ",";`fwdcurve;target;`date;`month); rawdatadir];
  • when you say "append data from a streaming source", is this coming through e.g. a tickerplant? – Jonathon McMurray Jan 18 '18 at 10:59
  • Hi Jonathon, thank you for asking. No, I will use `qJava` to append data coming from an in-house streaming source to the in-memory database, and then at some point I need to have the disk database updated. – SkyWalker Jan 18 '18 at 11:02
  • The typical flow of data into a KDB system would involve it going through the tickerplant, rather than appending directly to an in-memory db. In this setup, you could use the full TorQ stack to maintain RDB (intraday in-memory db) and manage EOD write down to HDB (extending your db loaded from CSVs). Does this sound suitable? – Jonathon McMurray Jan 18 '18 at 11:10
  • Is there a simple example? But if I understand correctly, the RDB data will eventually make it to the HDB, right? So we are back to the OP ... – SkyWalker Jan 18 '18 at 12:29
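
For a rough picture of the flow Jonathon describes, here is a minimal RDB-style sketch in q. Everything in it is an assumption for illustration, not TorQ code or code from the question: the schema, the HDB path, and a tickerplant that calls upd with each published batch and .u.end at end of day.

/ minimal RDB sketch; schema, paths and tickerplant protocol are illustrative
fwdcurve:flip `date`ccypair`ftype!(`datetime$();`symbol$();`symbol$());
upd:insert;                                  / tickerplant calls upd[`fwdcurve;data]
.u.end:{[dt]                                 / tickerplant calls this at end of day
  dir:hsym `$"/path/to/hdb/",string[`month$dt],"/fwdcurve/";
  dir upsert .Q.en[`:/path/to/hdb] fwdcurve; / append the day's rows to the month partition
  delete from `fwdcurve;};                   / clear the in-memory table for the next day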

1 Answer


The best approach, as Jonathon commented, is to maintain an RDB for storing the data from your streaming source. When kdb+ saves data to disk it writes entire columns in one go, so given 1000 records with 5 columns it is better to write 5 lists of 1000 entries each in one operation than to write 5 single-entry columns 1000 times.

To illustrate the difference, suppose I have two on-disk lists, x and y. Upserting 10000 elements at once is very fast:

q)\t `:x upsert 10000#1
0

Upserting them one at a time is much slower:

q)\t:10000 `:y upsert 1
126

It might be worth looking into using the full TorQ framework. It's designed specifically for this kind of situation: it has RDB and HDB functionality, and it can be found here: http://aquaqanalytics.github.io/TorQ/

If you wish to append data the way you describe, there currently isn't a dedicated API for it. What you can do is modify the RDB or WDB to append to the database. Using .loader.writedatapartition followed by a call to .loader.finish should be helpful here.
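
To make that concrete, the append itself can also be done with plain kdb+ primitives, which is roughly what the loader's writedown does. A minimal sketch, assuming a month-partitioned HDB at the hypothetical path /path/to/hdb and the fwdcurve table from the question:

hdb:`:/path/to/hdb;                       / the dbdir passed to .loader.loadallfiles
/ upserting a table to a splayed directory appends rows to the existing column
/ files; symbol columns must first be enumerated against the HDB's sym file
`:/path/to/hdb/2018.01/fwdcurve/ upsert .Q.en[hdb] fwdcurve;
/ if the appended partition stays grouped on the sort column, the parted
/ attribute can be reapplied on disk
@[`:/path/to/hdb/2018.01/fwdcurve/;`ccypair;`p#];

Because .Q.en enumerates against the same sym file used by the CSV loads, rows appended this way slot into the existing partitions alongside the .loader.loadallfiles data.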

  • Hi Ciaran, thank you for the answer! However, I would really appreciate it if you could add to your answer how to (at some point) dump the RDB into the HDB that is currently being populated with `.loader.loadallfiles`. This would answer my OP ... is there a dedicated API that covers this use case? – SkyWalker Jan 18 '18 at 12:41
  • Hi Giovanni, I hope this updated answer is helpful to you. – CiaranAquaq Jan 19 '18 at 15:50