
I'm about to start using MonetDB soon, but switching over from MySQL carries a high fixed cost. The main appeal is compiled in-database R.

Question is: how does MonetDB's memory footprint evolve with WHERE and GROUP BY?

Consider the following case:

```sql
SELECT firm, yearmonth, R_funct_calculate_something(x, y, z, d, e)
FROM monetdb_db.db_table1
WHERE yearmonth BETWEEN '1999-01-01' AND '2010-01-01'
GROUP BY firm, yearmonth
```

It seems like MonetDB OUGHT to read data equivalent to the size of...

(1) [size(x)+size(y)+size(z)+size(d)+size(e)+size(firm)+size(yearmonth)] * group_size

where group_size is the size of an individual (firm, yearmonth) group. I guess in this case it's bounded by 11 years × 12 months of data.

To me it's obvious that we'll read in only the needed columns, but the row dimension seems less obvious.

(2) Another possibility is that, instead of group_size, it reads THE WHOLE TABLE into memory.

(3) Another possibility is that, instead of group_size or the whole table, it reads only the portion of the table that matches the WHERE clause.
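As a back-of-the-envelope check on (1), the number of groups inside the WHERE slice can be counted directly. This is a sketch reusing the table and column names from the query above; nothing in it is MonetDB-specific:

```sql
-- Count the distinct (firm, yearmonth) groups inside the date slice.
-- Under hypotheses (1)/(3), the data read should be roughly
--   (per-row bytes of firm, yearmonth, x, y, z, d, e) * (rows matching the WHERE)
SELECT COUNT(*) AS group_count
FROM (
    SELECT DISTINCT firm, yearmonth
    FROM monetdb_db.db_table1
    WHERE yearmonth BETWEEN '1999-01-01' AND '2010-01-01'
) AS g;
```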

Which is it? If it's (2), then there's no point in switching for very long datasets, because reading the whole table into memory defeats the purpose of larger-than-memory data. But I imagine the brilliant minds at MonetDB are doing the smartest thing they can.

Thanks!

grad student
  • You can do a `TRACE` or `EXPLAIN` in front of your query to see the columns used and their cardinality / operator execution time. In general, only columns that are used by queries are touched, and in this case the portions of firm,yearmonth,x,y,z,d and e that correspond to the slice of yearmonth will be read. What's the amount of groups here? – Hannes Mühleisen Oct 12 '15 at 05:55
  • Thanks for your reply. There are - say - 8000 to 10000 firms. I was *hoping* the answer would be it reads N groups at a time where N is the number of cores MonetDB decides to use, and each group is 121 rows_per_group * column_size – grad student Oct 12 '15 at 13:56
  • Would MonetDB read the whole dataset into memory or only read groups? I can imagine with a billion rows reading a billion rows would be infeasible on a group by. – grad student Oct 16 '15 at 18:54
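The TRACE/EXPLAIN suggestion from the first comment looks like this in practice (a sketch; the query is the one from the question):

```sql
-- EXPLAIN prints the plan MonetDB generates for the query, which shows
-- which columns are actually touched; prefixing with TRACE instead
-- additionally reports per-operator execution times and tuple counts.
EXPLAIN
SELECT firm, yearmonth, R_funct_calculate_something(x, y, z, d, e)
FROM monetdb_db.db_table1
WHERE yearmonth BETWEEN '1999-01-01' AND '2010-01-01'
GROUP BY firm, yearmonth;
```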

0 Answers