
I am using libpq to connect to a Postgres server from C++ code. The Postgres server version is 12.10.

My table schema is defined below

       Column        |   Type   | Collation | Nullable |  Default   | Storage  | Stats target | Description 
---------------------+----------+-----------+----------+------------+----------+--------------+-------------
 event_id            | bigint   |           | not null |            | plain    |              | 
 event_sec           | integer  |           | not null |            | plain    |              | 
 event_usec          | integer  |           | not null |            | plain    |              | 
 event_op            | smallint |           | not null |            | plain    |              | 
 rd                  | bigint   |           | not null |            | plain    |              | 
 addr                | bigint   |           | not null |            | plain    |              | 
 masklen             | bigint   |           | not null |            | plain    |              | 
 path_id             | bigint   |           |          |            | plain    |              | 
 attribs_tbl_last_id | bigint   |           | not null |            | plain    |              | 
 attribs_tbl_next_id | bigint   |           | not null |            | plain    |              | 
 bgp_id              | bigint   |           | not null |            | plain    |              | 
 last_lbl_stk        | bytea    |           | not null |            | extended |              | 
 next_lbl_stk        | bytea    |           | not null |            | extended |              | 
 last_state          | smallint |           |          |            | plain    |              | 
 next_state          | smallint |           |          |            | plain    |              | 
 pkey                | integer  |           | not null | 1654449420 | plain    |              | 
Partition key: LIST (pkey)
Indexes:
    "event_pkey" PRIMARY KEY, btree (event_id, pkey)
    "event_event_sec_event_usec_idx" btree (event_sec, event_usec)
Partitions: event_spl_1651768781 FOR VALUES IN (1651768781),
            event_spl_1652029140 FOR VALUES IN (1652029140),
            event_spl_1652633760 FOR VALUES IN (1652633760),
            event_spl_1653372439 FOR VALUES IN (1653372439),
            event_spl_1653786420 FOR VALUES IN (1653786420),
            event_spl_1654449420 FOR VALUES IN (1654449420)

When I execute the following query it takes 1-2 milliseconds. The time is provided as a parameter to the function executing this query; it contains epoch seconds and microseconds.

SELECT event_id FROM event WHERE (event_sec > time.seconds) OR ((event_sec = time.seconds) AND (event_usec >= time.useconds)) ORDER BY event_sec, event_usec LIMIT 1

This query is executed every 30 seconds on the same client connection (which stays open for weeks). The process runs for weeks, but sometimes the same query starts taking more than 10 minutes.

If I restart the process, it recreates the connection with the server and execution time falls back to 1-2 milliseconds. The issue is intermittent: sometimes it triggers after a week of running, sometimes after 2-3 weeks.

We add a new partition to the table every Sunday and write new data into the new partition.

  • Did you check the log files ? That should give you some insights about what is going on when the query gets stuck. Is there any relation with the creation of a new table partition? (locking issues) – Frank Heikens Jun 06 '22 at 10:00
  • The partition is added on Sunday. The issue doesn't trigger every time on Sunday; sometimes it's Monday or Tuesday. Also, the Postgres server shows 100% CPU usage for the process handling the client connection. – Mayank Kandari Jun 06 '22 at 10:19
  • What process is causing this 100% CPU usage? And what other processes are active at that moment as well? – Frank Heikens Jun 06 '22 at 10:24
  • On the client connection we know the remote process id that handles the connection on the server side; for that process, CPU usage is 100%. I ruled out locking issues, as on the same connection the query time stays at 10 minutes every time once it hits 10 minutes for the first time. – Mayank Kandari Jun 06 '22 at 10:38
  • set up [auto_explain](https://www.postgresql.org/docs/current/auto-explain.html) to catch the slow query in the act, then show the plan for it being slow. – jjanes Jun 06 '22 at 18:29
  • Does the connection do anything else but that one query? Does it run in autocommit mode outside of any explicit transaction? – jjanes Jun 06 '22 at 18:31
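Following up on the auto_explain suggestion: a minimal server-side setup might look like the fragment below. The setting names come from the auto_explain documentation; the thresholds are placeholders to adjust for your workload.

    # postgresql.conf (takes effect for new connections after a reload)
    session_preload_libraries = 'auto_explain'
    auto_explain.log_min_duration = '5s'   # log plans only for queries slower than this
    auto_explain.log_analyze = on          # include actual row counts and timings

With this in place, the plan of the query is written to the server log whenever it exceeds the threshold, so the slow occurrence can be caught in the act without instrumenting the client.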

1 Answer


I don't know why the performance is inconsistent, there are many possibilities we can't distinguish with the info provided. Like, does the plan change when the performance changes, or does the same plan just perform worse?

But your query is not written to take maximal advantage of the index. In my hands it can use the index for ordering, but it still needs to read, and individually skip over, rows that fail the WHERE clause until it finds the first one that passes. And due to partitioning, I think it is even worse than that: it has to do this read-and-skip in each partition until it finds the first passing row there.
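You can see this per-partition behavior by comparing plans while the query is fast and while it is slow. A sketch, substituting real epoch values for the parameters:

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT event_id FROM event
    WHERE (event_sec > 1654449420)
       OR ((event_sec = 1654449420) AND (event_usec >= 0))
    ORDER BY event_sec, event_usec LIMIT 1;

The "Rows Removed by Filter" and buffer counts under each partition's index scan show how much skipping is happening, and whether the plan itself changed between the fast and slow states.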

You could rewrite it to do a tuple comparison, which can use the index to determine both the order, and where to start:

SELECT event_id FROM event 
WHERE (event_sec, event_usec) >= (:seconds, :useconds) 
ORDER BY event_sec, event_usec LIMIT 1;  

Now this might also degrade, or might not, or maybe will degrade but still be so fast that it doesn't matter.

jjanes