I am using monetdb on a 16GB Macbook Pro with OSX 10.10.4 Yosemite.
I execute queries with SQLWorkbenchJ (configured with a minimum of 2048M RAM).
I find the performance overall erratic:
- performance is acceptable / good with small size tables (<100K rows)
- abysmal with tables with many rows: a query with a join of two tables (8670 rows and 242K rows) and a simple sum took 1H 20m!!
My 16GB of memory notwithstanding, in one run I never saw MSERVER5
using more than 35MB of RAM, 450MB in another. On the other hand the time is consumed swapping data onto disk (according to Activity Monitor over 160GB of data!).
There are a number of performance-related issues that I would like to understand better:
- I have the impression that MonetDB struggles with understanding how much RAM to use / is available in OSX. How can I "force" MonetDB to use more RAM?
- I use MonetDB through R. The MonetDB.R driver converts all the character fields into CLOB. I wonder if CLOBs create memory allocation issues?
- I find difficult to explain the many GBs of writes (as mentioned >150GB!!) even for index creation or temporary results. On the other hand when I create the DB and load the tables overall the DB is <50MB. Should I create an artificial integer key and set it as
index
? - I join 2 tables on a timestamp field (e.g. "2015/01/01 01:00") that again is seen as a text CLOB by MonetDb / MonetDb.R. Should I just convert it to integer before saving it to MonetDb?
- I have configured each table with a primary key, using a field of type integer. MonetDB (as a typical columnar database) doesn't need the user to specify an index. Is there any other way to improve performance?
Any recommendation is welcome.
For clarity the two tables I join have the following layout:
Calendar # classic calendar table with one entry per our in a year = 8760 rows
Fields: datetime, date, month, weekbyhour, monthbyday, yearbyweek, yearbymonth # all fields are CLOBs as mentioned
Activity # around 200K rows
Fields: company, department, subdepartment, function, subfunction, activityname, activityunits, datetime, duration # all CLOBs except activityunits; datetime refers to when the activity has occurred
I have tied various types of join syntax, but an example would (`*` used for brevity)
select * from Activity as a, Calendar as b where a.datetime=b.datetime