0

I have a case with a TSDB Hypertable looking approximately like this:

CREATE TABLE data (
  pool_id INTEGER NOT NULL,
  ts TIMESTAMP NOT NULL,
  noise_err DECIMAL,
  noise_val DECIMAL,
  signal_err DECIMAL,
  signal_val DECIMAL,
  high_val DECIMAL,
  low_val DECIMAL,

  CONSTRAINT data_pid_fk FOREIGN KEY (pool_id) REFERENCES pools (id) ON DELETE CASCADE
);

CREATE UNIQUE INDEX data_pts_idx ON data (pool_id, ts);
SELECT create_hypertable('data', 'ts', 'pool_id', 100);

There are ~1000 pools, data contains >1 year of minute records for every pool, and quite a few analytical queries working with the last 3 to 5 days of data. New data is coming with arbitrary delay: 10ms to 30s depending on the pool.

Now the problem: I need to run analytical queries as fast as possible after the new record has been received, hence I can't insert in batches, and I need to speed up single row insertions.

I've run timescaledb-tune, then turned off synchronous commits (synchronous_commit = off), played with unlogged table mode, and tried to disable the auto vacuum, which didn't help much. The best insert time I get is ~37ms and degrading when concurrent inserts start to 110ms.

What else except removing indexes/constraints can I do to speed up single row inserts?

Daniel
  • 4,272
  • 8
  • 35
  • 48
  • Space partitioning can reduce performance. What is the reason for using the space partitioning with 100 partitions? – k_rus Nov 13 '20 at 07:17
  • @a_horse_with_no_name I'm using PG version 12. – Daniel Nov 13 '20 at 08:29
  • @k_rus the point was to keep as much data in the RAM as possible, so the last week of all pools is always in the RAM. – Daniel Nov 13 '20 at 08:29
  • 1
    Use standard system tools to see where the bottleneck is. `top`, `sar`, `vmstat`. Also, sample the wait_event field of pg_stat_activity. – jjanes Nov 14 '20 at 00:33

2 Answers2

1

First, why use timescaledb for this table in the first place? What are you getting from it that is worth this slowdown?

Second, you have 5200 partitions per year worth of data. That is approaching an unmanageable number of partitions.

jjanes
  • 37,812
  • 5
  • 27
  • 34
  • The project is already using PGSQL as the primary DB and when the need to store time-series data emerged it was a no-brainer to use TSDB. W/o it single writes were taking up to 1 second with concurrent writes. The point of so many partitions was to keep the last week of each pool in the RAM. – Daniel Nov 13 '20 at 08:45
  • 1 s to do a single row insert is grotesque. Do you have a few hundred indexes you haven't told us about? – jjanes Nov 14 '20 at 00:28
  • @Daniel "The point of so many partitions was to keep the last week of each pool in the RAM" How does partitioning on pool_id help with that? – jjanes Nov 14 '20 at 00:29
  • no, the table is shown as is. It was starting when I've had 10+ parallel inserts – Daniel Nov 14 '20 at 15:49
  • my idea was that PG would hold as much as it can in RAM, so I've chopped data that much. – Daniel Nov 14 '20 at 15:50
  • @Daniel But the "last week" is the last week, regardless of what other dimensions you also chop it along. – jjanes Nov 14 '20 at 17:21
1

I question the requirement for analytical queries that need to see the latest split second of data.

Anyway, the way to speed up single row inserts is:

  • Set synchronous_commit to off.

    But be aware that that means data loss of around half a second of committed transactions in the event of a crash! If that is unacceptable, play with commit_siblings and commit_delay; that will also reduce the number of WAL flushes.

  • Use prepared statements. With single row inserts, planning time will be significant.

  • Don't use unlogged tables unless you don't mind if you lose the data after a crash.

  • Don't disable autovacuum.

  • Increase max_wal_size to get no more checkpoints than is healthy.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • As far as I understand, unlogged tables are not supported in timescaleDB (see https://github.com/timescale/timescaledb/issues/836). Do you know a way to enable it ? – J.M. Nov 07 '21 at 10:19
  • @J.M. I don't know. I don't use TimescaleDB. – Laurenz Albe Nov 08 '21 at 06:48