
I am experiencing a progressive slowdown of the insert rate into a PostgreSQL database after the first few thousand inserts, and I haven't been able to find an explanation for why this occurs. Perhaps somebody can help explain this:

The problem is as follows:

Parse a JSON array into SQL INSERT statements for two tables connected by a foreign key, all in a single transaction, where on error only the erroneous entry is rolled back (so that there's no orphaned data in the related table).

The slight caveat is that these INSERTs are supposed to be generic (I'm building a tool for dynamically loading user-supplied data into a system).

So my solution is to begin a transaction at the start of a file and create a savepoint for every entry. If there's an error with that entry, the transaction is rolled back to that savepoint (and the savepoint is released); if there's no error, the savepoint is released and the import continues.
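In JDBC terms, a minimal sketch of this loop might look like the following (`Entry` and `insertEntry()` are hypothetical placeholders for the parsed JSON record and the code that runs the two INSERTs for it; this is an illustration of the approach, not the actual importer):

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Savepoint;
import java.util.List;

// Minimal sketch of the savepoint-per-entry approach described above.
public class SavepointPerEntryImport {

    public static void importFile(Connection conn, List<Entry> entries) throws SQLException {
        conn.setAutoCommit(false);                 // one transaction for the whole file
        int succeeded = 0, failed = 0;

        for (Entry entry : entries) {
            Savepoint sp = conn.setSavepoint();    // SAVEPOINT before each entry
            try {
                insertEntry(conn, entry);          // INSERT into parent and child table
                conn.releaseSavepoint(sp);         // entry is fine: RELEASE SAVEPOINT
                succeeded++;
            } catch (SQLException e) {
                conn.rollback(sp);                 // undo only this entry's inserts
                conn.releaseSavepoint(sp);
                failed++;
            }
        }
        conn.commit();                             // single COMMIT at the end of the file
        System.out.printf("succeeded %d failed %d%n", succeeded, failed);
    }

    // Placeholders, not part of the original question.
    static class Entry { /* parsed JSON fields */ }

    static void insertEntry(Connection conn, Entry entry) throws SQLException {
        // run the parent INSERT, read the generated key, run the child INSERT
    }
}
```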

Now, this works reasonably well until there are tens or hundreds of thousands of records to insert. The first few thousand go really well, inserting between 300 and 400 per second, but then the rate starts to progressively slow down.

Done 200, rate 200/s, succeeded 200 failed 0
Done 300, rate 300/s, succeeded 300 failed 0
Done 400, rate 400/s, succeeded 400 failed 0
Done 500, rate 250/s, succeeded 500 failed 0
Done 600, rate 300/s, succeeded 599 failed 1
Done 700, rate 233/s, succeeded 699 failed 1
Done 800, rate 266/s, succeeded 799 failed 1
Done 900, rate 300/s, succeeded 899 failed 1
Done 1000, rate 250/s, succeeded 999 failed 1
Done 1100, rate 275/s, succeeded 1099 failed 1
...
Done 5200, rate 185/s, succeeded 5195 failed 5
Done 5300, rate 182/s, succeeded 5295 failed 5
Done 5400, rate 186/s, succeeded 5395 failed 5
Done 5500, rate 183/s, succeeded 5495 failed 5
...
Done 31000, rate 58/s, succeeded 30953 failed 47
Done 31100, rate 58/s, succeeded 31053 failed 47
Done 31200, rate 57/s, succeeded 31153 failed 47

So after 30,000 inserts it has slowed down to only 1/5 of the rate it started with. The tables are very simple, with a few VARCHARs, a few numbers, a primary key and a foreign key. There are no functions, triggers or anything else.

I am wondering if there is something in JDBC that holds onto resources that are no longer needed, which could be causing the issue. Certainly, if it starts at 300/sec, then the code, network and DB server are capable of sustaining at least that.

I know batching would improve it dramatically, but for the use case that I described here, it wouldn't work.

Iv4n
  • I guess that the FK is slowing things down (checking whether the value exists). 1) Do you have an index on the FK column? 2) Did you consider dropping the FK and recreating it after the load, or making it deferred? – Lukasz Szozda May 25 '18 at 07:01
  • The foreign key cannot make inserts progressively slower. I'd check for I/O problems. Perhaps increasing `max_wal_size` and `checkpoint_completion_target` can help if I/O is the problem. You could use [`auto_explain`](https://www.postgresql.org/docs/current/static/auto-explain.html) to get `EXPLAIN (ANALYZE, BUFFERS)` output for the fast and slow queries, maybe you can spot the problem. Are these simple inserts? If not, maybe a periodical `ANALYZE` can help. – Laurenz Albe May 25 '18 at 07:18
  • Dropping all foreign keys proportionally increases the speed, but then it starts at 500 inserts per second and slows down to 150 by the time 30,000 records are inserted. – Iv4n May 25 '18 at 07:20
  • I tried playing with `max_wal_size` and set `checkpoint_completion_target` to 0.9, but it hasn't changed things much. At 50 objects per second, that's 100 inserts (one for the parent table and one for the child table), so a single query execution time is 10 ms; that's NOT a slow query, I imagine. – Iv4n May 25 '18 at 07:22

1 Answer


Even though you "release a savepoint", the database keeps memory structures until the end of the transaction. Do you actually commit the rows?

  1. You might want to use the batch API and set savepoints before batch statements. For instance: use batches of 100, and if a batch fails, retry its rows one by one, or retry with a batch of 50. That enables the batch API, reduces the number of savepoints required, still lets you skip invalid rows, and so on.
  2. You might want to commit the transaction from time to time to avoid high memory consumption on the backend side (a rough sketch combining both ideas follows below).
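
For illustration, here is a hedged sketch of how the two suggestions could be combined with plain JDBC. `Entry`, `bindEntry()`, the batch size and the commit interval are illustrative assumptions, and the two-table insert from the question is collapsed into a single prepared statement for brevity:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Savepoint;
import java.util.List;

// Sketch: batched inserts guarded by per-batch savepoints, one-by-one retry
// on batch failure, and periodic commits to bound backend memory usage.
public class BatchedImport {

    static final int BATCH_SIZE = 100;       // rows guarded by one savepoint
    static final int COMMIT_EVERY = 10_000;  // rows per committed transaction

    public static void importAll(Connection conn, List<Entry> entries, String insertSql)
            throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(insertSql)) {
            int sinceCommit = 0;
            for (int i = 0; i < entries.size(); i += BATCH_SIZE) {
                List<Entry> chunk = entries.subList(i, Math.min(i + BATCH_SIZE, entries.size()));
                Savepoint sp = conn.setSavepoint();      // one savepoint per batch
                try {
                    for (Entry e : chunk) {
                        bindEntry(ps, e);                // set the parameters for this row
                        ps.addBatch();
                    }
                    ps.executeBatch();
                    conn.releaseSavepoint(sp);
                } catch (SQLException batchFailed) {
                    ps.clearBatch();
                    conn.rollback(sp);                   // drop the whole batch...
                    insertOneByOne(conn, ps, chunk);     // ...then retry it row by row
                }
                sinceCommit += chunk.size();
                if (sinceCommit >= COMMIT_EVERY) {       // periodic commit
                    conn.commit();
                    sinceCommit = 0;
                }
            }
            conn.commit();
        }
    }

    static void insertOneByOne(Connection conn, PreparedStatement ps, List<Entry> chunk)
            throws SQLException {
        for (Entry e : chunk) {
            Savepoint sp = conn.setSavepoint();
            try {
                bindEntry(ps, e);
                ps.executeUpdate();
                conn.releaseSavepoint(sp);
            } catch (SQLException badRow) {
                conn.rollback(sp);                       // skip just this row
                conn.releaseSavepoint(sp);
            }
        }
    }

    // Placeholders, not part of the original answer.
    static class Entry { /* parsed JSON fields */ }

    static void bindEntry(PreparedStatement ps, Entry e) throws SQLException {
        // ps.setString(1, ...); ps.setLong(2, ...); etc.
    }
}
```

The per-batch savepoint limits how much work has to be redone when a bad row shows up, while the periodic commit caps what the backend has to keep around for a single transaction.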

If the above does not help, go ahead and profile the database process (e.g. with `perf`) to see what is causing the bottleneck.

Vladimir Sitnikov