
I have installed PostgreSQL 12 with TimescaleDB.

I have two tables in the database:

  1. Sensor_Data
  2. Sensor_Data_TS (TimescaleDB hypertable)

Both tables have the same columns and the same data, and both are indexed on sensor_id and time.

Now, when I run queries to insert into or select from these two tables to compare performance, the query times come out almost the same. I inserted around 500 million records to check the difference, but sometimes the plain Sensor_Data table even returns data faster than the hypertable.

Queries run to create the hypertable:

CREATE TABLE Sensor_Data_TS (LIKE Sensor_Data INCLUDING DEFAULTS INCLUDING CONSTRAINTS EXCLUDING INDEXES);

SELECT create_hypertable('Sensor_Data_TS', 'time', chunk_time_interval => 86400000);

CREATE INDEX ON Sensor_Data_TS (sensor_id, time DESC);

Used "timescaledb-tune" command to configure properties for TimescaleDB.

I want to know if I am making any mistake in configuring the hypertable. How can I achieve the fast performance that TimescaleDB claims on their website?

Hitanshu
  • Without knowing what queries you are running as your test, there is no way to know if you should expect an improvement in the first place. TimescaleDB is not magic, it does not make *everything* faster. – jjanes Aug 17 '21 at 15:35
  • I believe timestamps in Postgres use microseconds. What you have there, `chunk_time_interval => 86400000`, is not correct I think; didn't you mean `INTERVAL '1 day'` (or 86400000000)? (A corrected call is sketched below, after these comments.) – Attila Toth Aug 17 '21 at 15:53
  • What kind of test have you done? Would you mind sharing the code or how you're sending the data load? what is the parallelization level? Please take a look at [tsbs](https://github.com/timescale/tsbs) as it's easy to insert in batches and change the concurrency settings. – jonatasdp Aug 17 '21 at 18:34
  • @jjanes sqlQuery = "INSERT INTO public.sensor_data(sensor_id, time, value) VALUES (?, ?, ?)"; sqlQuery = "INSERT INTO public.sensor_data_ts(sensor_id, time, value) VALUES (?, ?, ?)"; I wrote a Java program to execute these queries in batches of 10,000 records. Both queries took around 3000 ms to insert the data. The time column values are spread at about 20 million records per day, and the chunk size is 1 day. I have inserted around 500 million records but still see no significant improvement. I am still confused about whether to use Timescale in our application or not. – Hitanshu Aug 18 '21 at 05:22
  • @jonatasdp Some properties set in postgresql.conf after running `timescaledb-tune`: shared_buffers = 967073kB, effective_io_concurrency = 200, max_worker_processes = 20, max_parallel_workers_per_gather = 2, max_parallel_workers = 4 (max_parallel_maintenance_workers, parallel_leader_participation, old_snapshot_threshold, and backend_flush_after are left commented out). Server hardware: 2 × Intel Xeon E5-2680 v4 @ 2.40GHz, 4 GB RAM. – Hitanshu Aug 18 '21 at 05:31
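
Following up on the chunk_time_interval comment above: here is a minimal sketch of a corrected create_hypertable call, assuming the time column is a TIMESTAMPTZ (it would need to run against a freshly created table). For timestamp columns, TimescaleDB interprets an integer chunk_time_interval in microseconds, so 86400000 comes out to roughly 86 seconds per chunk rather than one day; passing an explicit INTERVAL avoids the unit ambiguity.

-- Sketch, assuming "time" is TIMESTAMPTZ: an integer chunk_time_interval
-- is read as microseconds, so spell the interval out explicitly instead.
SELECT create_hypertable('Sensor_Data_TS', 'time', chunk_time_interval => INTERVAL '1 day');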

1 Answer


What is the insert rate that you achieved?

How many sensor_ids are you emulating?

I'm running similar tests right now using tsbs, and it can probably give you some insights without you needing to recreate all the scenarios yourself.

If you query the data, you should see a big difference, because the planner can exclude chunks that fall outside the query's time range and scan the remaining chunks in parallel.
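
For example, here is a sketch of the kind of query where that difference shows up; the sensor id 42 and the 7-day window are placeholder values, and the columns follow the schema from the question.

-- Hypothetical aggregation over one sensor and a recent time range; on the
-- hypertable, chunks outside the range are excluded and the rest can be
-- scanned in parallel:
EXPLAIN (ANALYZE, BUFFERS)
SELECT time_bucket('1 hour', time) AS bucket, avg(value)
FROM Sensor_Data_TS
WHERE sensor_id = 42
  AND time >= now() - INTERVAL '7 days'
GROUP BY bucket
ORDER BY bucket;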

You can also compress your data and segment the compression by sensor_id, which makes querying by time and sensor_id really fast.
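
A minimal sketch of that setup, assuming TimescaleDB 2.x; the 7-day threshold is just an example value.

-- Enable native compression, segmented by sensor_id so queries filtering
-- on sensor_id only decompress matching segments:
ALTER TABLE Sensor_Data_TS SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id',
    timescaledb.compress_orderby = 'time DESC'
);

-- Automatically compress chunks older than 7 days:
SELECT add_compression_policy('Sensor_Data_TS', INTERVAL '7 days');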

I'd also start trying different (smaller) chunk intervals, as 20M records per chunk is a very large amount of data.
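
A sketch of shrinking the interval on the existing hypertable; the 6-hour value is only an illustration, and the change applies to newly created chunks.

-- Use a smaller interval for chunks created from now on:
SELECT set_chunk_time_interval('Sensor_Data_TS', INTERVAL '6 hours');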

jonatasdp