I'm about to start using MonetDB soon, but switching over from MySQL carries a high fixed cost. The main appeal is compiled in-database R.
My question: how does MonetDB's memory footprint evolve with WHERE and GROUP BY clauses?
Consider the following case:

    SELECT firm, yearmonth, R_funct_calculate_something(x, y, z, d, e)
    FROM monetdb_db.db_table1
    WHERE yearmonth BETWEEN '1999-01-01' AND '2010-01-01'
    GROUP BY firm, yearmonth
It seems like MonetDB OUGHT to read data equivalent to the size of...
(1) [size(x) + size(y) + size(z) + size(d) + size(e) + size(firm) + size(yearmonth)] * group_size
where group_size is the number of rows belonging to an individual (firm, yearmonth) group. I guess the number of groups in this case is bounded at 11 years * 12 months per firm.
To me it's obvious that we'll read in only the referenced columns (the column dimension), but what happens along the row dimension seems less obvious.
(2) Another possibility is that, instead of one group at a time, it reads THE WHOLE TABLE into memory.
(3) Another possibility is that it reads neither one group at a time nor the whole table, but only the portion of the table that matches the WHERE clause.
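For concreteness, here is a back-of-envelope sketch of what each hypothesis implies for bytes read, assuming a purely columnar scan. Every number in it (table size, matching rows, firm count, per-column byte widths) is a made-up placeholder, not a MonetDB measurement:

```python
# Hypothetical per-column widths in bytes for the columns the query touches.
COL_BYTES = {"x": 8, "y": 8, "z": 8, "d": 8, "e": 8, "firm": 4, "yearmonth": 4}
ROW_BYTES = sum(COL_BYTES.values())  # bytes per row across referenced columns only

total_rows    = 500_000_000               # rows in db_table1 (assumed)
selected_rows = 120_000_000               # rows matching the WHERE range (assumed)
n_groups      = 1_000 * 11 * 12           # firms * months, bounded by 11y * 12m
group_rows    = selected_rows // n_groups # avg rows per (firm, yearmonth) group

est_per_group  = ROW_BYTES * group_rows    # hypothesis (1): one group at a time
est_whole_tbl  = ROW_BYTES * total_rows    # hypothesis (2): whole table
est_where_only = ROW_BYTES * selected_rows # hypothesis (3): WHERE slice only

for name, b in [("(1) per group ", est_per_group),
                ("(2) whole table", est_whole_tbl),
                ("(3) WHERE slice", est_where_only)]:
    print(f"{name}: {b / 2**30:.2f} GiB")
```

Under these made-up numbers the gap between (1) and (2) is several orders of magnitude, which is why the answer matters for larger-than-memory tables.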
Which is it? If it's (2), then there's no point in my switching for very, very long datasets, because reading the whole table into memory defeats the point of larger-than-memory data. But I imagine the brilliant minds at MonetDB are doing the smartest thing they can.
Thanks!