0

I have a bit of an issue deciding how to approach a problem- we are using monetDB in combination with a JAVA jdbc program. I would like some guidance if I am able to solve my problem using SQL queries, if I need to write script, or possibly doing my calculations in JAVA.

We have approximately 3 months of data (~90 tables) each with the date in the table name, each table contains time series trade data simplified like so.

TableMarch1
Time Symbol Vol
9:30 X      100
9:31 Y      200
9:31 X      100
...
TableMarch2
Time Symbol Vol
9:30 X      500
9:31 Y      300
9:31 Y      100
...

My task is to analyze the activity during different periods of the day in 15 minute intervals (i.e. (9:30-9:45, 9:45-10:00). The tables may contain many symbols but we would be analyzing one at a time.

Ideally I would like to aggregate the total volume for each symbol for each time period on each date, but it would also be acceptable to combine the data and calculate the total later:

Result table1(930-945)
Date  Time       Sym Total Volume
March 1 9:30-9:45  X   200
March 1 9:30-9:45  Y   300
March 2 9:30-9:45  X   500
...
Result table 2(945-1000)
March 1 9:45-10:00 X   1000
March 1 9:45-10:00 Y   500

So that eventually i can do statistical calculations for each interval for a single symbol. I am unsure if there is an efficient way to pull the data from all of the dated tables and create new views/tables based on the time slices. I am able to brute force the problem just for testing and query all of these tables on the fly via JDBC- but it is an extremely expensive operation to query data from 90 tables 1 at a time and divide ALL of the data into 15 minute buckets and then do the statistical calculations from there, so I feel like there must be a way to set up a new structure in the database that will streamline getting this data- but I am unsure of a way to do this without manually handling pulling data from each table 1 at a time.

Can i pull data from multiple tables using a single SQL statement somehow to create a view or new table with my desired results (Basically any technique that avoids writing a new query for EVERY SINGLE case and date)? Is there a way to write a case statement that will insert to a different table based on the case? Is there any way to configure monetdb to run these queries more efficiently? Or Do I need to write a script to handle a single table at a time and just pull everything into new tables?

Gadesxion
  • 391
  • 2
  • 6
  • 18
  • 1
    I've never worked with MonetDB, but it seems to me that regardless of whether the underlying storage and retrieval mechanism is column-oriented (like MonetDB) or row-oriented (like the more traditional DBMSs), any analysis across dates is going to be a huge nuisance if your data is fragmented in separate tables for each date. I'm having trouble imagining how consolidating the data into one big table with a date *value* for each "record" (or "observation", or whatever term you choose) could fail to make your life easier. – Gord Thompson Mar 28 '16 at 19:43
  • Gord, I had considered that, however the tables are far too large, there are about 30 columns and ~35 million or ~6GB of records per day . I have combined the tables into weekly tables and reached almost ~200m records, querying a table that large is unfortunately far too costly in terms of memory when querying data from multiple dates even on our hefty 32 core server with 256gb of memory. The eventual goal is to combine the historical analysis with a real time feed of the same data so reducing the latency and cost is the priority even if it involves a more complex solution. – Gadesxion Mar 29 '16 at 19:03

1 Answers1

0

To hide the complexity of your data structure, you could use a combination of SQL views and unions to provide a unified view of your data over time.

The SQL Group By predicate would then allow you to aggregate your data by time slices.

Given your example, you would have a view ViewMarch defined as :

CREATE TABLE TableMarch1(Time timestamp, Symbol integer, value integer);
CREATE TABLE TableMarch2(Time timestamp, Symbol integer, value integer);

CREATE VIEW ViewMarch AS 
    SELECT Time, Symbol, Value FROM TableMarch1 
    UNION 
    SELECT Time, Symbol, Value from TableMarch2;

You could then compute your average symbol value per hour, for example, using a query like :

SELECT LEFT(Time, 13) AS Period, Symbol, AVG(Value) 
    FROM ViewMarch 
    GROUP BY Period, Symbol;

But watch out for the performance cost. I don't know how MonetDB will optimize queries over unions.

Nicolas Riousset
  • 3,447
  • 1
  • 22
  • 25