2

here the environment: We programmed an application for a customer that is based on the embedded h2 database, it was upgraded to the latest version before doing the tests. The database consists of 29 tables and 26 views. Of the 26 views only 8 are really "used" in java, mapped view hibernate into pojos. The other views are merely doing background calculation for the others, like aggregating some values and then grouping by some column. A lot of calculation is done in those views. We decided against calculating in java since you can easily check the database tables with your favorite tool (e.g. h2 console) the see if there are any errors in your calculations. Due to that fact there are a lot of "CASE WHEN ... END" statements in those views since hibernate always returned whole rows with NULL values in all columns once a single column in that row was NULL. We were never able to put our finger on that problem as well ... Nevertheless, due to the fact we also had divisions in the calculations we needed to check for NULL, 0 and 0.0 anyway. The views are "stacked" since some intermediate values are used someplace else sometimes. but there is always a "stack" of 7 views "below" one final view that also is based on another view that uses a "stack" of 6 views as well. Some of the views are the same some are not.

Now, here comes the problem: When inserting a couple (like 20) of records into the database in the "interesting" tables one view delivers data (4 aggregated rows) in approx. 400ms. That is ok for us. Scaling up the data to approx 500-2000 records that special view (delivering approx. 25 aggregated rows) takes over an hour (1h) to deliver data. The machine is either a Linux with 8GBytes of RAM (-Xmx2G and -Xms1G) CPU 2,66GHz (Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz) or a Windows XP with 4GBytes of RAM (-Xmx1G -Xms512m) CPU unknown but probably a single/dual core @ 2GHz.

My analysis so far: I traced the applications memory usage, that does not seem to be the primary problem. Looking at the stack trace during a long running query revealed that I had a stack depth of (sometimes) up to 100 levels below (!) my entry point into hibernates getEntityManager().createQuery(getCriteriaQuery()).getResultList(). The obvious "time-consumer" is org.h2.table.TableFilter/Table/TableView.getBestPlanItem and org.h2.table.Plan.calculateCost as well as org.h2.index.ViewIndex.getCost . I checked all joins in all views for missing indexes, one found, added, no success.

My Tests: I transferred all data and the schema into a PostgreSQL (8.1) on the same Linux machine (vanilla not tweaked) and ran the tests there (before doing any vaccuum or reindex !) and the result is overwhelming: approx. 6 sec. for the same view with the same data that took about 1h on h2.

Now I do not really want to switch my database but that would be the ultimate option unless anyone has a good idea ...

Remark: On thing I found out is the following: When checking the views in the information_schema of h2, I can see that he is doing quite some work analyzing the views themselves. All views in my sql script a between 20 and 120 rows (approx.). The "compiled" views in the information schema range from 2KBytes to 3MBytes (that is Megabytes) the one from above is nearly 400k ... Maybe this is also a piece of the problem ...

OK, that's all folks. I am graceful for any help. I am willing to switch databases since we are using hibernate and CriteriaQuery all over the place so the only work will be switching the jdbc connector, changing some code in the views (already done but has to be checked twice before production) and the install a PostgreSQL or a MSDE at the customers Desktop PCs (irk) that would lead to other unwanted errors that might happen because MS Updates might leave the MSDE broken or the databases won't start due to whatever reasons ...

Regards, Holger

holle2de
  • 46
  • 3

1 Answers1

1

Maybe the queries / views are simply too complex for H2 to optimize them, but it's hard to say without knowing the details (the code to reproduce the problem). The optimizer of PostgreSQL is better than the H2 optimizer. Possibly you need to create additional indexes. To analyze this, I suggest to read the documentation about performance optimizations and indexes.

Thomas Mueller
  • 48,905
  • 14
  • 116
  • 132