0

I am running a postgres query that takes more than two hours. This query is executed using hibernate in a java program. After about 1.5 hours the query stops showing up in the server status in pg_admin.

Since, the query disappeared from the list of active queries on the database, I am expecting a success or a timeout exception. But, I get none.(No exception) and my thread in stuck in the wait state. I know the query has not finished because it was supposed to do some inserts in a table and I cannot find the expected rows in the table.

I am using pgbouncer for the connection pooling and the query_timeout is disabled. Had it been a hibernate timeout I should have got an exception. OS parameters on the DB machine and Client machine(Machine running java program)

  • tcp_keepalive_time is 7200 (seconds)
  • tcp_keepalive_intvl = 75
  • tcp_keepalive_probes = 9 (number of probes) Both the machines run RHEL operating system. I am unable to put my finger on the issue.
Jijo Mathew
  • 322
  • 2
  • 15
  • Side question: is it _really_ necessary to have such big a query? Can't you split it up somehow? Having such long-running queries and transactions normally isn't advisable. – Thomas Feb 03 '16 at 12:46
  • It is a query to delete from a table using the indices. Using Explain, I have confirmed that the indices are being used. The table is huge, which asks for a large number of deletes. That is why so much time is required. It is not about the quality of the query but the disappearance of the query from the list of active queries without any response which I am concerned about. – Jijo Mathew Feb 05 '16 at 05:54
  • Well, sometimes solving the immediate problem isn't the overal right solution. Having more smaller queries might actually be better (maybe even faster). Isn't there a way to issue multiple deletes for subsets of what needs to be deleted? – Thomas Feb 05 '16 at 09:00

2 Answers2

2

I am sure you would have already looked at the following resources:

  1. PostgreSQL Timeout Docs
  2. PgBouncer timeout (you already mention).
  3. Hibernate timeout parameters, if any.

Once that is done, (just like triaging permission issues during a new installation, ) I recommend that you try the following SQL, from different scenarios (given below) and ascertain what is actually causing this timeout:

SELECT pg_sleep(7200);
  1. Login to the server (via psql) and see whether this SQL times-out.
  2. Login to the PgBouncer (again via psql) and see whether PgBouncer times out.
  3. Execute this SQL via Hibernate (via PgBouncer), and see whether there is a timeout.

This should allow you to clearly isolate the cause for this.

Robins Tharakan
  • 2,209
  • 19
  • 17
2

I found that the issue was caused due to the TCP connection getting dropped and the client still hanging waiting for the response.

I altered the following parameters at OS level:-

  • /proc/sys/net/ipv4/tcp_keepalive_time = 2700

Default value was 7200. This causes a keep alive check at every 2700 seconds instead of 7200 seconds.

Jijo Mathew
  • 322
  • 2
  • 15