I am experiencing a progressive slowdown of the insert rate into a PostgreSQL database after the first few thousand inserts, and I haven't been able to find an explanation for why this happens. Perhaps somebody can help explain it:
The problem is as follows:
Parse a JSON array into SQL INSERT statements targeting two tables connected by a foreign key, all in a single transaction, where on error only the erroneous entry is rolled back (so that there's no orphaned data in the connected table).
The slight caveat is that these INSERTs are supposed to be generic (I'm building a tool for dynamically loading user-supplied data into a system).
So my solution is to begin a transaction at the start of a file and create a savepoint for every entry. If there's an error with that entry, the transaction is rolled back to that savepoint (and the savepoint is released); if there's no error, the savepoint is released and the import continues. Roughly, the loop looks like the sketch below.
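This is a minimal sketch of that pattern, not my real code; the table, columns, and the shape of the entries are placeholders:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Savepoint;
import java.util.List;

public class SavepointImport {

    public static void importEntries(Connection conn, List<String[]> entries) throws SQLException {
        conn.setAutoCommit(false); // one transaction for the whole file

        // Hypothetical parent-table insert; the real statements are built dynamically.
        String sql = "INSERT INTO parent_table (name, value) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String[] entry : entries) {
                Savepoint sp = conn.setSavepoint();   // one savepoint per entry
                try {
                    ps.setString(1, entry[0]);
                    ps.setString(2, entry[1]);
                    ps.executeUpdate();
                    // ... insert into the child table referencing the parent here ...
                    conn.releaseSavepoint(sp);        // success: release and move on
                } catch (SQLException e) {
                    conn.rollback(sp);                // failure: undo only this entry
                    conn.releaseSavepoint(sp);        // then release the savepoint
                }
            }
        }
        conn.commit(); // commit everything that wasn't rolled back
    }
}
```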
Now, this works reasonably well until there are tens or hundreds of thousands of records to insert. The first few thousand go really well, inserting between 300 and 400 rows per second, but then the rate starts to slow down progressively.
Done 200, rate 200/s, succeeded 200 failed 0
Done 300, rate 300/s, succeeded 300 failed 0
Done 400, rate 400/s, succeeded 400 failed 0
Done 500, rate 250/s, succeeded 500 failed 0
Done 600, rate 300/s, succeeded 599 failed 1
Done 700, rate 233/s, succeeded 699 failed 1
Done 800, rate 266/s, succeeded 799 failed 1
Done 900, rate 300/s, succeeded 899 failed 1
Done 1000, rate 250/s, succeeded 999 failed 1
Done 1100, rate 275/s, succeeded 1099 failed 1
...
Done 5200, rate 185/s, succeeded 5195 failed 5
Done 5300, rate 182/s, succeeded 5295 failed 5
Done 5400, rate 186/s, succeeded 5395 failed 5
Done 5500, rate 183/s, succeeded 5495 failed 5
...
Done 31000, rate 58/s, succeeded 30953 failed 47
Done 31100, rate 58/s, succeeded 31053 failed 47
Done 31200, rate 57/s, succeeded 31153 failed 47
So after 30,000 inserts the rate has dropped to about a fifth of what it was at the start. The tables are very simple: a few VARCHAR columns, a few numeric columns, a primary key and a foreign key. There are no functions, triggers or anything else on them.
I am wondering whether something in JDBC is holding onto resources that are no longer needed, which could be causing the issue. Certainly, if it starts at 300/sec, then the code, network and DB server are capable of sustaining at least that rate.
I know batching would improve it dramatically, but for the use case that I described here, it wouldn't work.