8

Googling this ADO error message indicates that it is commonly encountered in ASP.NET development, but I have not found much mention of when it occurs in Delphi applications. We have some customer sites which are experiencing transient network problems, and this is the symptomatic error message. We can duplicate it in office testing easily; Just shut down an MS SQL Server service while your delphi TADOConnection object is connected to a database on that server instance and you get this exception:

   [DBNETLIB][ConnectionWrite (send()).]General network error. Check your network documentation.

Yes, catch this exception, and you know (or do you?) that this error has occurred. Except that this is an 800 KLOC+ application with over 10,000 try-except blocks around database actions, any one of which might fail with this error.

TADOConnection has some error events, none of which fire in this case. However, the ADO Connection itself is faulted once this occurs, even if you restart the SQL database, TADOConnection.Connected remains true, but it's lying to you. It's really in a faulted state.

So then, my question is:

Can you detect this faulted state, and recover from it, in any way that is less work than going into 10,000 individual try-except blocks and setting some global "reconnect ADO global variable"?

I am hoping there is a way to go into TADOConnection.ConnectionObject (the underlying raw OLEDB COM ADO object) and detect this fault condition exists when we are starting a new query, so that we can reset the ADOConnection and continue the next time we run a query. Since our code is organized in a way that would allow us to detect this "after the failure" much more easily than it would allow us to do it the way I would do this in a 10 line demo application.

This other SO question asks why it happens, that is not what I'm asking, please don't give me "prevention" answers, I know about them already, I'm looking for a recovery and detection-of-stalled-ADO-connection technique other than catching the exceptions. In fact, this is a good example of exceptions gone wrong; ADO is a schrodingers-cat object in this failure mode.

I am aware of the MS Knowledgebase articles, and the various solutions floating around the internet. I'm asking about RECOVERING without losing customer data, once the error condition (which is often transient in our situations) has cleared. That means we freeze our app, show the exception to the customer, and when the customer clicks Retry or Continue, we attempt to repair and continue. note that our existing code does a million try-except-log-and-continue code, that is going to get in our way, so I'm expecting someone to answer that an Application handler for unhandled exceptions is the best way, but sadly we can't use it. I really hope however that it is possible to detect a frozen/faulted/dead ADO connection object.

Here's what I have:

try
  if fQueryEnable and ADOConnection1.Connected then begin
    qQueryTest1.Active := false;
    qQueryTest1.Active := true;
    Inc(FQryCounter);
    Label2.Caption := IntToStr(qQueryTest1.RecordCount)+' records';

  end;
except
      on E:Exception do begin
         fQueryEnable := false;
         Memo1.Lines.Add(E.ClassName+' '+E.Message);
         if E is EOleException and Pos('DBNETLIB',E.Message)>0 then begin
            ADOConnectionFaulted := boolean; { Global variable. }
         end;
         raise;
      end;
end;

The problem with the above solution is that I need to copy and paste it about 10,000 places in my application.

Community
  • 1
  • 1
Warren P
  • 65,725
  • 40
  • 181
  • 316
  • Something like a TTimer and a simple query like SELECT @@VERSION and global state variable? – whosrdaddy May 25 '12 at 14:21
  • Never had to face this problem as I have made a wrapper for all my apps which handles these kind of situations. Kinda sucks if you need to change 10k locations :(. – whosrdaddy May 25 '12 at 14:27
  • Yeah, an "ADO watchdog." And yet another evil TTimer. :-) – Warren P May 25 '12 at 15:22
  • Of course it must be handled by your custom DAL wrappers because there are much more cases of "stale" or terminated connection. The errors we are handling are (at least) *Unspecified error*, *General network error* and *Connection failure* depending on the underlying OLEDB provider (SQLOLEDB, SQLNCLI, ODBC {SQL Server}, ODBC {SQL Server Native Client 10.0}) – wqw May 25 '12 at 15:26
  • @WarrenP: http://stackoverflow.com/a/2291917/800214 – whosrdaddy May 26 '12 at 08:00
  • Isn't the ADOConnection.OnInfoMessage the supposed event to catch all exceptions (through the Error interface)? P.S.: What kind of programmers inserts such message "General network error. Check your network documentation." in his code? WTH! – jfoliveira May 27 '12 at 13:11
  • @Jeferson: A. General network error causes NO exceptions at the ADO Connection level. B. Microsoft does. :-) – Warren P May 27 '12 at 18:26
  • The linked question from whosrdaddy is almost exactly the same idea, except that its linked answer won't work in my case. (http://stackoverflow.com/questions/2289260/how-can-i-detect-that-a-tadoconnection-lost-the-communication-with-the-server/2291917#2291917) – Warren P May 27 '12 at 18:27

3 Answers3

9

Well nobody has answered this question, and I think that some follow-up would be helpful.

Here is what I have learned:

  • There are no reliable situations where in a test environment you can reproduce this General Network Error. That is to say, we're dealing with Irreproducible Results, which is where many developers leap into evil hackery in an attempt to "monkeypatch" their broken systems.

  • Fixing the underlying fault has always and everywhere been better than fixing it in code, when the SQL library gives a "General Network Error". No repair has ever been shown to be possible, because usually it means "the network is so unreliable that TCP itself has given up on delivering my data", this happens when:

    • You have a bad network cable.

    • You have duplicate IP addresses on a network.

    • You have dueling DHCP servers each handling out different default gateways.

    • You have local ethernet segments that have poor connectivity between them.

    • You have an ethernet switch or hub which is failing.

    • You are being intermittently blocked by a malfunctioning firewall.

    • Your customer may have changed something on their network, and may now be unable to use your software. (This last one actually happens more than you might think)

    • Someone may have configured an SQL Alias using cliconfg or other client side configuration elements that are specific to a single workstation's registry settings, and this local configuration may result in bad behaviour that is difficult to diagnose and may be limited to one or several workstations on a large network.

None of the above can be detected and reported either at the TCP or SQL level. When SQL finally gives up, and it gives this "General Network Error", no amount of cajoling from my software is going to get it to un-give-up, and even if it did, I would be doing a "try/except/ignore" antipattern. This error is so severe that we should raise it all the way up to the user, log it to disk in an error log, give up (quit the program), and tell the user that the network connection is down.

Warren P
  • 65,725
  • 40
  • 181
  • 316
  • 1
    Another place where this happens is where Windows has TCP/IP V4 and TCP/IP V6 both enabled on one computer in a local area network, and does not have IP V6 enabled on the other side of the network, and yet somehow LLMNR or something else in Windows still tries to use IP V6 when it shouldn't. Weirdness ensues. – Warren P Jun 05 '13 at 15:12
  • 1
    +1. Note that in the mobile world it is much more common to automatically retry (usually using an exponential backoff: https://www.google.com/search?q=connection+retry+algorithm) because wireless networks (WiFi, G3, BT) are so unreliable. But in both environments eventually you have to ask a user what to do (if it is not a server process). – Jeroen Wiert Pluimers Jul 19 '13 at 15:07
  • Really late comment here, but [this answer](http://stackoverflow.com/questions/3211625/general-network-error-after-a-night-of-inactivity/3211950#3211950) outlines a potential way to repeat this error. Although those steps would likely be very difficult to follow in any real application without forcing it to stop at some kind of debugging breakpoint, since the time between "open a DB connection" and "use the DB connection" would typically be almost instantaneous. – Josh Darnell Jul 28 '15 at 15:44
2

I have seen this happening due to bad coding too..

If you open a recordset using a connection and if you reuse that same connection in a loop for another recordset while the first connection is not closed then that can cause similar errors.

Another occasion very rarely on web applications is while the application pool is recycling you may receive similar error.

We have different sites in a same server where I have noticed that with the same application but with different customisations, only one site is causing this issue. That leads to the above findings.

This blog helped me to find the issues:

http://offbeatmammal.hubpages.com/hub/Optimising_SQL_Server

Sunil
  • 95
  • 8
  • That sounds like a pretty "random" sort of potential result from bad/abysmal object-sharing between threads – Warren P Mar 12 '14 at 10:54
1

The code here detects a disconnect event firing and reconnects using a timer. It is assumed that you realize when reading this code that you must drop a TTimer onto this data module being shown here, and create an OnTimer event with the code shown below.

Please check the next code:

unit uDM;

interface

uses
  SysUtils, Classes, DB, ADODB, Vcl.ExtCtrls;

type
  TDM = class(TDataModule)
    ADOConnection: TADOConnection;
    ConnectionTimmer: TTimer;
    procedure ADOConnectionDisconnect(Connection: TADOConnection;
      var EventStatus: TEventStatus);
    procedure ConnectionTimmerTimer(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  DM: TDM;

implementation

{$R *.dfm}

procedure TDM.ADOConnectionDisconnect(Connection: TADOConnection;
  var EventStatus: TEventStatus);
begin
  if eventStatus in [esErrorsOccured, esUnwantedEvent] then
    ConnectionTimmer.Enabled := True;
end;

procedure TDM.ConnectionTimmerTimer(Sender: TObject);
begin
  ConnectionTimmer.Enabled := False;
  try
    ADOConnection.Connected := False;
    ADOConnection.Connected := True;
  except
    ConnectionTimmer.Enabled := True;
  end;
end;

end.
Warren P
  • 65,725
  • 40
  • 181
  • 316
  • This answer could be improved by adding some helpful context as to how it specifically relates to the question. More information here https://stackoverflow.com/help/how-to-answer specifically under the subheading "Answer the question" – Nigel Savage Jan 26 '20 at 18:32
  • I have tried this exact technique and it creates as many issues as it solves. I think you should leave this hack here and we should see if you continue to believe in doing this kind of "disconnect and reconnect" hack in production. I tried this approach and discarded it as too problematic. Your mileage may vary. I do thank you for trying to help others. All answers here are coming from a helpful urge and even if I think this code is a bad idea in principle, I think your intentions here are great. Thank you. – Warren P Jan 30 '20 at 17:30
  • People who copy and paste this code will find it doesn't work as the DFM connections would not be made for them unless they realize the intent here. You need to drop a timer onto the form and create a timer event in the IDE and then use the event handler code as coded in the answer. Still a bad idea. – Warren P Jan 30 '20 at 17:32