4

Environment:

  • Delphi 2009 client applications (and one Java), running on Windows 2003 server
  • connecting to InterBase 7.5.1 (another Windows 2003 Server) over dbExpress

The Delphi applications log the time to open the TSQLConnection using the AfterConnect event handler of the TSQLConnection object.

In random intervals, the connect need a three-minutes "extra time". I first suspected it could be a problem with the SQL query, but more detailed logging today showed that it is the SQLConnection.Connect which hangs.

I am not sure if this could be a problem with network, the InterBase server, or the Delphi / dbExpress layer.

Has anybody experienced a similar three-minutes "hang"?

p.s. the Java application does not log connect time so I can not say wheter it is affected by this problem.


This phenomenon appeared in the log files since we started with logging in 2012, but the rate has sharply increased last month. The only environment change has been the addition of new Windows servers (for RDP services, Mail, and Fax) so it could be a network-related problem.

mjn
  • 36,362
  • 28
  • 176
  • 378
  • I wouldn't think that DNS would take 3 minutes, but have you tried using an IP Address rather than DNS name? – Jerry Dodge Sep 02 '13 at 13:03
  • @JerryDodge good idea, but the DNS works reliably with the InterBase server name for all other Delphi apps running on computers in the network. Only one computer is affected, and it uses the same DNS. I will check the DNS configuration. – mjn Sep 02 '13 at 13:48
  • @JerryDodge update: now we had the same 3 minute delay logged for a connection from a Delphi app in a Citrix server session to the database. This makes the database server - or its network connection - a "candidate". – mjn Sep 03 '13 at 09:00
  • Make sure everyone connects with the same name. Having some clients use the server name whilst others use localhost or a local connect string can cause similar issues. – Craig Stuntz Sep 04 '13 at 16:37
  • @CraigStuntz the server name and database path is stored in a central configuration file for all Delphi applications. Only one application is Java based. I assume that the InterClient JDBC driver does not modify the connect string "behind the scenes". – mjn Sep 04 '13 at 17:59

3 Answers3

0

Aside of a possible network problem, the cause of the delay can be that, from time to time, your query triggers a garbage collection in one of the table(s) that it is querying.

I don't know in detail Interbase 7.5 internals, but in my experience (with Firebird), this usually happens when a select is made on a table from which many records have been deleted/updated recently.

This doc at IBExpert.net explains it:

A garbage collection is only performed during a database sweep, database backup or when a SELECT query is made on a table (and not by INSERT, ALTER or DELETE). Whenever Firebird/InterBase® touches a row, such as during a SELECT operation, the versioning engine sweeps out any versions of the row where the transaction number is older than the Oldest Interesting Transaction (OIT). This helps to keep the version history small and manageable and also keeps performance reasonable.

A periodic sweep or backup made at low usage hours, can increase performance and minimize the risk of being hitted by an inconvenient garbage collection. See Sweep interval and automated housekeeping (page 6-20) and Facilitating garbage collection (page 11-19) at the Interbase 7.5 Operations Guide for more info on this.

JRL
  • 3,363
  • 24
  • 36
  • Thank you for this suggestion. However it seems to be the time to connect to the server, not the query execution time, which increased: I added OnBeforeConnect and OnAfterConnect event handlers and calculated the time difference. I assume that OnAfterConnect will be executed before the SQL query is sent to the server. – mjn Aug 31 '13 at 09:49
0

Since the rate has increased with the additions of new servers on the network you could have a packet loss and a long timeout to retry. For test that hypothesis you can change the connection timeout to a small value. You also can monitor the network traffic between the servers using wireshark or tcpdump.

Monitoring

To monitor the TCP handshake only you can use:

tcpdump -i eth0 'tcp[13] & 2 = 2

olivecoder
  • 2,858
  • 23
  • 22
  • The Connect does not time out, it only hangs for exactly three minutes and then succeeds witout any error. Monitoring the traffic would be difficult as the error now happens about once a day. – mjn Sep 09 '13 at 11:59
  • Despite the fact you don't get a Connect timeout error the TCP protocol can fail and retry many times before return with a timeout error. – olivecoder Sep 09 '13 at 16:44
0

Please check if hard disk power saving is activated on any disk on mentioned servers. That would explain if you have a delay in first connect and then no delay in following connections. Then, after a while power saving gets activated and the same problem raises again.

avra
  • 3,690
  • 19
  • 19
  • The database server does not use power saving, it is running on a ESX virtual machine system (with hundreds of users). – mjn Sep 09 '13 at 12:00
  • So why not investigate that direction? If you follow some of the links from http://www.google.com/search?q=vmware+esx+3minutes+delay+problem+oracle then you will find that creating a VM clone takes 2-3 minutes, which is exactly the same situation you experience. Check if there is a time match with some schedule or manual cloning of VM for backup purposes. You can also check https://community.emc.com/docs/DOC-24401 and http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002598. – avra Sep 11 '13 at 08:42