3

I have a very strange error after migration of my C# .NET4 application from old 32bit Windows 2003 Server to new 64bit Windows 2008 Server. I've spend whole week trying to find out what is going on and I have run out of ideas, so please, advice.

Details:
I have a small C# .NET 4.0 application, that extracts some data from remote Oracle database server. The application used to work well on old, 32bit Windows 2003 Server. Last times I had to move it on new server with 64 bit Windows 2008 R2 Server. Since then, one query to Oracle database, that returns about 500k rows is not working correctly. About 2 minutes after start, it throws exception similar to this:

ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 288 Serial number: 43544

I've downloaded the newest ODAC driver from Oracle site for 64bit system: 64-bit ODAC 11.2 Release 5 (11.2.0.3.20). I've tried both: installer and XCOPY versions. Installation and configuration via .bat file for ODP.NET4. TNS are set correctly.

Remote Oracle database is located in the other country. I'm not administrator of it. It's Oracle9i EE, 64 bit, v9.2.0.8.0.

My C# application for testing looks like that:

using (OracleConnection conn = new OracleConnection("User Id=userId; Password=pass; Data Source=database;"))
{
    conn.Open();
    string sql = "SELECT * FROM table";
    int i = 0;
    using (OracleCommand cmd = new OracleCommand(sql, conn))
    {
        using (OracleDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                if (i++ % 100 == 0)
                {
                    Console.WriteLine(i);
                }
            }
        }

    }
}

I used Adapter.Fill(dataTable), but for testing datareader is better - it shows me more or less when something is happening.

It's using Oracle.DataAccess.dll from ODP.NET directory (either: 64 or 32 bit). Application is compiled for ANY CPU, but I've also tried x64 and x86 compilations (with a a proper Oracle.DataAccess.dll library).

I was thinking it could be some timeout, so I've tried different connection strings options, including:
Validate Connection=true
Connection Lifetime=600;Connection Timeout=600;
Pooling=false
Enlist=false;
I've even set command CommandTimeout to 600.

None of it helped - application downloads different number of rows, depending on server load - from 200k to 300k and throws ORA-03113 exception. It takes about 120s (but it's not exactly, sometimes it's 130, sometimes more, but most time about 120s).

The strange is, that sometimes the query finishes OK, without any error, downloading all 500k rows. For example today about 30-40 tries failed, while 2 succeeded.
Also the strange is, that the same application with the same version of ODAC driver (for 32 bit system) works well each time on old, 32bit Windows 2003 Server.

The new 64 bit server is with Windows 2008 Server R2 Standard + SP1, all actual updates, no antivirus installed.

I've tried different configurations:
- to install 64 bit Oracle driver and compile application to Any CPU and x64 with 64 bit Oracle.DataAccess library,
- to install 32 bit Oracle driver and compile application to Any CPU and x86 with 32 bit Oracle.DataAccess library,

None of them work. I've noticed, that sometimes I cannot remove/rename Oracle driver directory, because some library is blocked by MSDTC (Distributed Transaction Coordinator) service. I've tried to disable it, change configuration to the same as on old, increase some parameters in server administration tools (among others: Transacion Time Limit, from 60s to 0 (without limit)). But I haven't noticed any improvement.

I've tried also with Firewall disabled.

When connection breaks (?), I don't see any matching alert or error in Windows event log.

As I'm not experienced with server administration, it's configured mostly with default values. It has 3 roles enabled: Application Server, Web Server (IIS) and File Services.

I've talked with Oracle Database Server administrator and he assured me, that there are no limitations (timeouts, resources) settings and that the Oracle server log does not show any errors.

And once more - the same query is working on 32 bit Windows 2003 Server, but not on 64 bit Windows 2008 Server. I've also tested, that it's not working even on server when I removed Application Server and Web Server roles (so it's almost clear).

I've also tested program + 64 bit driver on my home 64bit Windows7 laptop and it works OK.

I SUPPOSE that there might be SOME setting(s) on Windows 2008 Server - either for timeout or resources, that I may hit and it kill the connection/transaction, but I'm not aware how can I check that or where to look. So please, can you point me what can be wrong?

--Edit:
Another thing I've noticed: when query finish with success (last time I managed to do that yesterday in the night - when server load and connection load are low), looking at Oracle session monitor there's only one connection open and closed shortly after application is closed. However, each time application fails, it leaves 5 open connections (with null command), which don't disappear for many minutes after application close. So I think it might be something wrong with handling of connection pool...

mj82
  • 5,193
  • 7
  • 31
  • 39
  • post the full stack trace of the exception. it may provide some more clues. have you tried compiling your app for x86 and running on your 2008 x64 machine to see what happens? – wal Jan 12 '13 at 00:13
  • 1
    It sounds like this might be a networking problem. Is the new server connected via any new switching or routing hardware that the old server wasn't? – Chris Shain Jan 12 '13 at 17:01
  • There are no more useful info in the exception - besides the exception I've quoted, stack just points to my procedure -> `Oracle.DataAccess.Client.OracleDataReader.Read()` method as the source of exception, and up to OracleException handling methods. And yes, I've tried with x86 compilation (it required x86 ODAC as well), but the same result – mj82 Jan 12 '13 at 17:04
  • @Chris Shain - the new server is in the same place that the old one was, it's using the same infrastructure. But I have to talk to network administrator about that, another thing to check on Monday. – mj82 Jan 12 '13 at 17:08
  • 3
    Quite likely an error has occurred on the server side. As a consequence the connection was closed and the client threw an ORA-01331 error. Please have a look at the Oracle server logs. They should contain more information. It could be an Oracle bug. – Codo Jan 12 '13 at 19:16
  • ***`ORA-01331`*** Maybe **connection pooling** (`connection string`) issues ***http://stackoverflow.com/questions/15980979/odp-net-connection-pooling-parameters*** – Kiquenet Nov 27 '15 at 10:50

1 Answers1

1

Finally, after many tests, with help of network administrator, I've:
1. changed server proxy to other,
2. disabled TCP/IPv6 protocol,
3. disabled firewall.
Still not sure which one was "guilty", yet it looks like it has helped - I suppose it could be some firewall rule for TCP/IPv6, either on mine or on proxy server.

mj82
  • 5,193
  • 7
  • 31
  • 39