
Just some background first: we are using an H2 file database which is already around 15 GB.

Our application stack consists of:

  • Windows Clients
  • Jetty Webserver
  • H2 File Database

Every time data needs to be updated on the client side, the user gets a ZIP file containing XML files. Each XML file is either imported into the DB, or, if it carries a "delete" flag, the corresponding entry is deleted from the DB. Every import of a ZIP file has a data version. The import is done manually in Java: the XML files are deserialized to POJOs and mapped to our domain entities.

With this, we are also able to do a full import of all data into the database (which just takes ages: about 8 hours).

To our issue:

The table where our problem occurs has around 290,000 rows.

The structure is shown in a screenshot (not included here).

We have a named query:

    @NamedNativeQuery(name="getRawTecdocWithMaxVersionAndGivenLocale", 
            query = "select tdo.tecdoc_guid as guid, tdo.tecdoc_locale as locale , tdo.tecdoc_version as version, tdo.data as data "
                    + " from TECDOC_OBJECTS tdo "
                    + " left outer join TECDOC_OBJECTS tdo1 "
                    + " on (tdo.tecdoc_guid = tdo1.tecdoc_guid and tdo.tecdoc_locale = tdo1.tecdoc_locale and tdo.tecdoc_version < tdo1.tecdoc_version) "
                    + " where tdo1.id is null " 
                        +  " and tdo.tecdoc_guid in ( ?1 ) "
                        +  " and tdo.tecdoc_locale = ?2 ",
            resultSetMapping = "rawTecdocs")
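The left self-join above is the classic anti-join pattern for picking, per `(tecdoc_guid, tecdoc_locale)`, the row with the highest `tecdoc_version`. For comparison, a sketch of a logically equivalent `NOT EXISTS` formulation, which some planners optimize differently (table and column names taken from the query above; whether H2 actually picks a better plan for it is not guaranteed):

```sql
-- "latest version per (guid, locale)" expressed as NOT EXISTS
-- instead of a LEFT JOIN ... IS NULL anti-join
SELECT tdo.tecdoc_guid    AS guid,
       tdo.tecdoc_locale  AS locale,
       tdo.tecdoc_version AS version,
       tdo.data           AS data
FROM TECDOC_OBJECTS tdo
WHERE tdo.tecdoc_guid IN ( ?1 )
  AND tdo.tecdoc_locale = ?2
  AND NOT EXISTS (
      SELECT 1
      FROM TECDOC_OBJECTS tdo1
      WHERE tdo1.tecdoc_guid    = tdo.tecdoc_guid
        AND tdo1.tecdoc_locale  = tdo.tecdoc_locale
        AND tdo1.tecdoc_version > tdo.tecdoc_version
  )
```

Comparing the `EXPLAIN` output of both forms before and after an import can show whether only the plan, not the query, is what degrades.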

This query gets quite slow (around 1 second) after a data update (ZIP file import). The actual query, with the given GUIDs, did not change after the data update.

We have indexes on the selected columns.

Where it gets strange

If we fill our database with a full update (all 15 GB of data imported through XML), the query seems to be "fast" (20-50 ms) again.

Maybe someone has a hint for me/us to overcome this issue?

kism3t
    Do you have the query plans for both situations? Are they identical? – Sonata Apr 19 '18 at 08:11
    Have you tried running `ANALYZE` on the db after the updates? If the statistics get thrown off by the data being updated, the planner might choose a worse plan than necessary. If the query is fast again after `ANALYZE`, the root cause is the DB statistics. – Kayaman Apr 19 '18 at 08:24
  • The query is always the same. I could not do `ANALYZE` as I get the `Error: Syntax error in SQL statement "ANALYZE TABLE TECDOC_OBJECTS SAMPLE_SIZE[*] 0"; expected "integer"; SQL statement:` -> https://stackoverflow.com/questions/44794115/analyze-table-syntax-error – kism3t Apr 19 '18 at 09:42
  • Does it work for `ANALYZE TABLE TECDOC_OBJECTS` without specifying the sample size? It's unlikely the whole analyze command is broken. – Kayaman Apr 19 '18 at 10:40
  • `Syntax error in SQL statement "ANALYZE TABLE TECDOC_OBJECTS[*]"; expected "integer";` same error here unfortunately. – kism3t Apr 19 '18 at 10:59
    What about plain `ANALYZE`? It will take a while, but if the query runs right after that (and I suspect it does), it's still the statistics. Then you just need to upgrade your H2 and do an analyze on the table after update. – Kayaman Apr 19 '18 at 11:19
  • I ran `EXPLAIN ANALYZE` for the named query and the execution plan was different. After the data update the plan showed that it scanned the whole table and did not use the index. I ran `ANALYZE` for the whole DB (took around 40 seconds) and `EXPLAIN ANALYZE` now showed only a couple of rows being searched. From now on I will execute the `ANALYZE` command after each data update. Thanks! – kism3t Apr 20 '18 at 08:57
  • Somehow, my problem is still there. `ANALYZE` "breaks" a fast DB on client laptops. I posted a question covering just the H2 part here: https://stackoverflow.com/questions/50060278/how-h2-chooses-right-wrong-index-in-join – kism3t Apr 30 '18 at 07:02
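Per the comment thread, the fix that helped on some machines was rebuilding optimizer statistics after each import. On the H2 version in use here, the `ANALYZE TABLE <name>` form raises a syntax error, but plain `ANALYZE` works; a sketch (the sample size is H2's documented default, shown explicitly for illustration):

```sql
-- rebuild selectivity statistics for all tables after a data update;
-- plain ANALYZE works on older H2 versions where
-- "ANALYZE TABLE <name>" fails with a syntax error
ANALYZE SAMPLE_SIZE 10000;
```

This can be issued over the same JDBC connection the import uses, as its final step.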

2 Answers


Just my two cents: a very personal opinion.

We are using a H2 File Database which is already around 15 GB.

I love H2, yes I do.

Having said that, I personally think every database has its niche, and maybe 15 GB is a little bit over H2's market segment. When you get to the 1 GB mark in H2 you should consider switching to another database. If you like free databases you could start looking seriously at PostgreSQL and MariaDB.

Again, I love H2, but I think you'll start to have more and more performance issues with this level of data.

H2's SQL optimizer is obscure, to say the least, and its plans are difficult to read. It's also not easy to make it change its mind (i.e., to make it switch plans).

The Impaler
  • Thanks for your answer; I'm hitting the limits of H2 more and more often. Currently it is not that easy for us to switch the DB in the short run, but I will definitely take it into consideration for the future. I also posted a plain-H2 question regarding this: https://stackoverflow.com/questions/50060278/how-h2-chooses-right-wrong-index-in-join – kism3t Apr 30 '18 at 07:04

I answered this in my linked question, where I had asked an explicitly H2-specific question.

In the end I deleted some combined (composite) indexes, and now performance is fast again.

As for `ANALYZE`: on some clients it solved the problem, but on others it made it (or other parts of the application) worse.

There is also the `USE INDEX` option, but it is only available from version 1.4.194 on, and upgrading made some other queries very slow or even impossible to execute due to insufficient memory.
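For reference, a sketch of how the hint would be applied to the query in the question on H2 >= 1.4.194 (the index name `IDX_TECDOC_GUID_LOCALE` is hypothetical; substitute the actual index name from `INFORMATION_SCHEMA.INDEXES`):

```sql
-- force H2 to consider only the named index for this table scan;
-- IDX_TECDOC_GUID_LOCALE is a made-up example name
SELECT tdo.tecdoc_guid, tdo.tecdoc_locale, tdo.tecdoc_version, tdo.data
FROM TECDOC_OBJECTS tdo USE INDEX (IDX_TECDOC_GUID_LOCALE)
WHERE tdo.tecdoc_guid IN ( ?1 )
  AND tdo.tecdoc_locale = ?2
```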

kism3t