I am using an Azure Hybrid Connection to connect from a .NET 5 API, running in Docker on Linux in Azure, to an on-prem Oracle 12 server. All our database queries run stored procedures that return cursors. The server is running out of connections once a week or so.

I have limited (i.e. no) visibility into what the Oracle server sees, and no admin rights on the server. I can ask an admin to run diagnostic queries for me. There are several instances of the application connecting to the same Oracle instance at the moment.

As far as I can tell, the app closes and disposes of connections in an orderly fashion. I am not overriding any default connection pool settings in the connection string. I use the latest Oracle.ManagedDataAccess.Core package from NuGet (3.21.1).
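
For context, every database call follows roughly this pattern (a simplified sketch; the repository, procedure, and parameter names are placeholders, not our actual code):

using System.Collections.Generic;
using System.Data;
using System.Threading.Tasks;
using Oracle.ManagedDataAccess.Client;

public class ItemRepository
{
    private readonly string _connectionString;

    public ItemRepository(string connectionString) => _connectionString = connectionString;

    public async Task<List<string>> GetItemNamesAsync()
    {
        // No pooling attributes are overridden; ODP.NET connection pool defaults apply.
        using var conn = new OracleConnection(_connectionString);
        using var cmd = new OracleCommand("pkg_items.get_items", conn)
        {
            CommandType = CommandType.StoredProcedure
        };
        // The stored procedure returns its rows through a ref cursor OUT parameter.
        cmd.Parameters.Add("p_result", OracleDbType.RefCursor, ParameterDirection.Output);

        await conn.OpenAsync();
        using var reader = await cmd.ExecuteReaderAsync();

        var names = new List<string>();
        while (await reader.ReadAsync())
        {
            names.Add(reader.GetString(0));
        }

        // The using declarations dispose the reader, command, and connection in reverse
        // order of declaration, which returns the physical connection to the ODP.NET pool.
        return names;
    }
}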

I am wondering if the Hybrid Connection (it uses Service Bus under the hood!) does not properly communicate to the server when the pool clears out unused connections, and the server is holding on to these indefinitely. If you are using Oracle over an Azure Hybrid Connection at scale, I'd be curious to know whether it is stable over longer periods and whether connection pooling works correctly.

Update: still no resolution. I found a prior report of this happening, and my hunch that Oracle connection pooling and HCM do not work well together is solidifying. Look at this netstat output from the app server slot, for example:

~# netstat -p  | grep 1521
tcp        0      0 127.0.0.14:1521         localhost:49192         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:46480         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:42372         CLOSE_WAIT  -
tcp        0      0 localhost:42452         127.0.0.14:1521         FIN_WAIT2   -
tcp        0      0 127.0.0.14:1521         localhost:33472         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:60632         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:52922         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:42406         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:60254         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:52904         CLOSE_WAIT  -
tcp        0      0 localhost:42470         127.0.0.14:1521         ESTABLISHED 19/dotnet
tcp        0      0 127.0.0.14:1521         localhost:33422         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:49072         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:60244         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:52912         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:49100         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:42356         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:60822         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:42452         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:60728         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:60294         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:49132         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:52884         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:33560         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:60722         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:52952         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:33492         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:33538         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:60704         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:60144         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:49146         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:49026         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:42470         ESTABLISHED -
tcp        0      0 127.0.0.14:1521         localhost:42434         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:52932         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:60688         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:60226         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:60312         CLOSE_WAIT  -
tcp        0      0 127.0.0.14:1521         localhost:33360         CLOSE_WAIT  -

How should I interpret the fact that the Oracle port appears in both the local and the remote column, with the same loopback IP?

Update, answering the last question above:

The Hybrid Connection Manager exposes the Oracle endpoint as a local loopback address on the client. My understanding is that CLOSE_WAIT means the other side (HCM) has closed the connection, while the local application (the Oracle connection pool) has not yet closed its end and is still holding on to it.

Now I wonder if this is an issue between HCM and ODP.NET. If something closes an active connection out from underneath the pool, maybe neither the Oracle client nor the server deals with it correctly?
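
If that is what is happening, these are the ODP.NET pooling attributes I would experiment with next. I have not changed any of them yet, so this is only a sketch, and the data source and values are placeholders:

// Sketch only: the pooling attributes below are documented ODP.NET connection
// string keywords, but the values and the data source are placeholders.
// "Validate Connection=true" makes the pool test a connection before handing it
// out; "Connection Lifetime" retires pooled connections after N seconds, which
// might keep dead tunnel sockets from lingering in the pool.
var connectionString =
    "User Id=app_user;Password=***;" +
    "Data Source=onprem-db:1521/ORCL;" +      // resolves to the HCM loopback shown above
    "Min Pool Size=1;Max Pool Size=20;" +
    "Connection Lifetime=300;" +
    "Validate Connection=true;";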

cdonner
  • I don't know how many sessions you are talking about on the Oracle database but you could get your admin to dump all the rows from gv$session and it might tell you something. LOGON_TIME tells you when the session was started. MACHINE tells you what host the connection is coming from. USERNAME tells you the database user. If it is tens of thousands of sessions you would have to cut it down maybe by just requesting 1000 rows to look at. – Bobby Durrett Jun 28 '21 at 23:22
  • @BobbyDurrett thanks, I will start with that. – cdonner Jun 28 '21 at 23:33
  • But I think, from earlier conversations with the admin, that the issue is processes, not connections. I will find out. – cdonner Jun 28 '21 at 23:38
  • If you are using the default dedicated server type of connections then there will be one process per session on the database. This assumes the target database server is Unix/Linux. So, for every row in gv$session there will be a process. The parameters processes and sessions are closely related with typical dedicated server connections on Unix/Linux. Does your Azure Hybrid Connection tool have min and max database connection settings? – Bobby Durrett Jun 28 '21 at 23:48
  • @BobbyDurrett this is a standard ODP.NET managed driver connection. The Hybrid Connection Manager is supposed to tunnel TCP/IP traffic from Azure to the on-prem network completely transparently to the server and the client. It has been a struggle from the start, with Microsoft fixing bugs along the way. See https://devblogs.microsoft.com/premier-developer/using-azure-app-services-with-hybrid-connections/. The Oracle server is running Windows, afaik. – cdonner Jun 29 '21 at 00:02
  • @BobbyDurrett we see a lot more processes than sessions, like 10 times more. My working theory is that the app does not close ref cursors properly before closing the connection, and that the cursors are blocking the release of the processes server-side. Working on substantiating that. – cdonner Jun 29 '21 at 19:10
  • I wonder if dead connection detection would help. I have not needed it with an Oracle database on Windows but maybe in this case with the tunnel something like that would help. https://www.oracle.com/technetwork/database/enterprise-edition/oraclenetdcd-2179641.pdf I think it checks to see if the connection is active by occasionally sending a packet. – Bobby Durrett Jun 29 '21 at 19:39
  • I have refactored the client code in a way that ensures we close cursors and release all other database object instances before a connection is closed. There is no change. – cdonner Jul 19 '21 at 17:25
  • It seems like you have a bunch of logins that fail with network handshaking before they complete, or a bunch of logouts that succeed but hang up in the final network communication. Some sort of timeout like dead connection detection could clean this sort of mess up. It would be nice to hear from someone with the same configuration as you, but DCD exists to clean up failed logouts that hang around. – Bobby Durrett Jul 26 '21 at 21:21
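
For reference, the dead connection detection that Bobby mentions is enabled on the database server side in sqlnet.ora (the value is in minutes); whether our DBA can turn it on for this instance is an open question, and the interval below is only an example:

# sqlnet.ora on the database server (requires DBA access)
SQLNET.EXPIRE_TIME = 10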

1 Answer

After much back and forth with Oracle's and Microsoft's support organizations (and the inevitable finger-pointing), it was determined that Azure's HCM does not support this scenario and cannot be used. Our client was forced to create a VPN that the cloud application uses to connect to the on-prem database server.

cdonner