
Just some background first: we are using an H2 file database which is already around 15 GB.

Our application stack consists of:

  • Windows Clients
  • Jetty Webserver
  • H2 File Database

Every time data needs to be updated on the client side, the user gets a ZIP file containing XML files. Each XML file is either imported into the DB, or, if it carries a "delete" flag, the corresponding entry is deleted from the DB. Every import of a ZIP file has a data version. The import is done manually in Java: the XML files are deserialized to POJOs and mapped to our domain entities.

With this, we are also able to do a full import of all data into the database (which just takes ages: about 8 hours).

To our issue:

The table where our problem occurs has around 290,000 rows.

The structure is shown in a screenshot (not included here).

We have a named query:

    @NamedNativeQuery(name="getRawTecdocWithMaxVersionAndGivenLocale", 
            query = "select tdo.tecdoc_guid as guid, tdo.tecdoc_locale as locale , tdo.tecdoc_version as version, tdo.data as data "
                    + " from TECDOC_OBJECTS tdo "
                    + " left outer join TECDOC_OBJECTS tdo1 "
                    + " on (tdo.tecdoc_guid = tdo1.tecdoc_guid and tdo.tecdoc_locale = tdo1.tecdoc_locale and tdo.tecdoc_version < tdo1.tecdoc_version) "
                    + " where tdo1.id is null " 
                        +  " and tdo.tecdoc_guid in ( ?1 ) "
                        +  " and tdo.tecdoc_locale = ?2 ",
            resultSetMapping = "rawTecdocs")
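The left self-join above is the classic anti-join pattern for picking, per `(tecdoc_guid, tecdoc_locale)`, the row with the highest `tecdoc_version`. For comparison, a sketch of a logically equivalent `NOT EXISTS` formulation, which some planners optimize differently (table and column names taken from the query above; whether H2 actually picks a better plan for it is not guaranteed):

```sql
-- "latest version per (guid, locale)" expressed as NOT EXISTS
-- instead of a LEFT JOIN ... IS NULL anti-join
SELECT tdo.tecdoc_guid    AS guid,
       tdo.tecdoc_locale  AS locale,
       tdo.tecdoc_version AS version,
       tdo.data           AS data
FROM TECDOC_OBJECTS tdo
WHERE tdo.tecdoc_guid IN ( ?1 )
  AND tdo.tecdoc_locale = ?2
  AND NOT EXISTS (
      SELECT 1
      FROM TECDOC_OBJECTS tdo1
      WHERE tdo1.tecdoc_guid    = tdo.tecdoc_guid
        AND tdo1.tecdoc_locale  = tdo.tecdoc_locale
        AND tdo1.tecdoc_version > tdo.tecdoc_version
  )
```

Comparing the `EXPLAIN` output of both forms before and after an import can show whether only the plan, not the query, is what degrades.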

This query gets quite slow (around 1 second) after a data update (ZIP file import). The actual query, with the given GUIDs, did not change after the data update.

We have indexes on the selected columns.

Where it gets strange

If we fill our database with a full update (all 15 GB of data imported through XML), the query seems to be "fast" (20-50 ms) again.

Maybe someone has a hint for me/us to overcome this issue?

kism3t
    Do you have the query plans for both situations? Are they identical? – Sonata Apr 19 '18 at 08:11
    Have you tried running `ANALYZE` on the db after the updates? If the statistics get thrown off by the data being updated, the planner might choose a worse plan than necessary. If the query is fast again after `ANALYZE`, the root cause is the DB statistics. – Kayaman Apr 19 '18 at 08:24
  • The query is always the same. I could not do `ANALYZE` as I get the `Error: Syntax error in SQL statement "ANALYZE TABLE TECDOC_OBJECTS SAMPLE_SIZE[*] 0"; expected "integer"; SQL statement:` -> https://stackoverflow.com/questions/44794115/analyze-table-syntax-error – kism3t Apr 19 '18 at 09:42
  • Does it work for `ANALYZE TABLE TECDOC_OBJECTS` without specifying the sample size? It's unlikely the whole analyze command is broken. – Kayaman Apr 19 '18 at 10:40
  • `Syntax error in SQL statement "ANALYZE TABLE TECDOC_OBJECTS[*]"; expected "integer";` same error here unfortunately. – kism3t Apr 19 '18 at 10:59
    What about plain `ANALYZE`? It will take a while, but if the query runs right after that (and I suspect it does), it's still the statistics. Then you just need to upgrade your H2 and do an analyze on the table after update. – Kayaman Apr 19 '18 at 11:19
  • I ran `EXPLAIN ANALYZE` for the named query and the execution plan was different. After the data update the plan showed that it scanned the whole table and did not use the index. I ran `ANALYZE` for the whole DB (took around 40 seconds) and `EXPLAIN ANALYZE` now showed only a couple of rows being searched. From now on I will execute the `ANALYZE` command after each data update. Thanks! – kism3t Apr 20 '18 at 08:57
  • Somehow, my problem is still there. `ANALYZE` "breaks" a fast DB on client laptops. I posted a question covering just the H2 part here: https://stackoverflow.com/questions/50060278/how-h2-chooses-right-wrong-index-in-join – kism3t Apr 30 '18 at 07:02
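Per the comment thread, the fix that helped on some machines was rebuilding optimizer statistics after each import. On the H2 version in use here, the `ANALYZE TABLE <name>` form raises a syntax error, but plain `ANALYZE` works; a sketch (the sample size is H2's documented default, shown explicitly for illustration):

```sql
-- rebuild selectivity statistics for all tables after a data update;
-- plain ANALYZE works on older H2 versions where
-- "ANALYZE TABLE <name>" fails with a syntax error
ANALYZE SAMPLE_SIZE 10000;
```

This can be issued over the same JDBC connection the import uses, as its final step.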

2 Answers


Just my two cents: a very personal opinion.

We are using a H2 File Database which is already around 15 GB.

I love H2, yes I do.

Having said that, I personally think every database has its niche, and maybe 15 GB is a little bit over H2's market segment. When you get to the 1 GB mark in H2 you should consider switching to another database. If you like free databases you could start looking seriously at PostgreSQL and MariaDB.

Again, I love H2, but I think you'll start to have more and more performance issues with this level of data.

H2's SQL optimizer is obscure, to say the least, and its plans are difficult to read. It's also not easy to make it change its mind (i.e., to make it switch plans).

The Impaler
  • Thanks for your answer; I'm hitting the limits of H2 more and more often. Currently it is not that easy for us to switch the DB in the short run, but I will definitely take it into consideration for the future. I also posted a plain-H2 question regarding this: https://stackoverflow.com/questions/50060278/how-h2-chooses-right-wrong-index-in-join – kism3t Apr 30 '18 at 07:04

I answered this in my linked question, where I had asked an explicitly H2-specific question.

In the end I deleted some combined (composite) indexes, and now performance is fast again.

As for `ANALYZE`: on some clients it solved the problem, but on others it made it (or other parts of the application) worse.

There is also the `USE INDEX` option, but it is only available from version 1.4.194 on, and upgrading made some other queries very slow or even impossible to execute due to insufficient memory.
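For reference, a sketch of how the hint would be applied to the query in the question on H2 >= 1.4.194 (the index name `IDX_TECDOC_GUID_LOCALE` is hypothetical; substitute the actual index name from `INFORMATION_SCHEMA.INDEXES`):

```sql
-- force H2 to consider only the named index for this table scan;
-- IDX_TECDOC_GUID_LOCALE is a made-up example name
SELECT tdo.tecdoc_guid, tdo.tecdoc_locale, tdo.tecdoc_version, tdo.data
FROM TECDOC_OBJECTS tdo USE INDEX (IDX_TECDOC_GUID_LOCALE)
WHERE tdo.tecdoc_guid IN ( ?1 )
  AND tdo.tecdoc_locale = ?2
```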

kism3t