0

I have a sql script with a procedure. I run it using sqlplus -s @script.sql logfile.log myparam1 myparam2 But after ~ 2 hour and 10 mins, my script ends with

ERROR:
ORA-03114: not connected to ORACLE 


DECLARE
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel 
Process ID: 12345 
Session ID: 33 Serial number: 54321 

I have following tcp settings

-bash-4.1$ cat /proc/sys/net/ipv4/tcp_keepalive_time (which is two hours)
7200
-bash-4.1$ cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
-bash-4.1$ cat /proc/sys/net/ipv4/tcp_keepalive_probes
9

When the script was running, I checked that TCP connection was established at my end , but on the database side (database server machine) there was no such connection.

My theory is that somehow database server is dropping the connection. And when my system sends first keepalive probe after 2 hours (7200 seconds), it finds that connection is no more active and closes the connection and script returns.

I am unable to understand why would a database system drop a connection? Are there any settings at database end to increase verbosity? Or could this be related to some firewall settings? Also, in 2 hours and 10 mins, we can guess that 2 hours portion is coming from tcp_keepalive_time, what would be 10 mins portion. Any database side retries?

EDIT: DBA and I looked at the issue, I see the TCP connection as ESTABLISHED at my end and he didn't see any connection coming from my side.

sattu
  • 632
  • 1
  • 22
  • 37
  • This may happen because a database process crashed and wasn't even able to report an error message. Ask your DBA to check the alert log for any errors in the alert log that happened at the exact same time. – Jon Heller Jun 06 '17 at 00:37
  • In your script, are you running sqlplus multiple times over the two plus hours or is your script taking two plus hours to run and sqlplus has only been started once? – user1683793 Jun 06 '17 at 22:09
  • Sqlplus has started only once. The script is just deleting the rows. – sattu Jun 07 '17 at 17:24

4 Answers4

1

We had something similar where our firewall was dropping our Pro*C connections after about two hours if we did not have any activity in that period. Our solution was to do a:

select 1 from dual;

every 15 minutes on each database connection to keep things active.

If I recall correctly, the tcp keep alive times you refer to above are only used if the connection has a setsockopt call with SO_KEEPALIVE. Since the actual connection with Oracle is managed by Oracle, we have no way of knowing if it is set or not.

In looking at this, I saw on the web a few places (this and this) a reference to sqlnet.expire_time=minutes_to_check in the $ORACLE_HOME/network/admin/sqlnet.ora file. They say:

With this parameter in place, after 10 minutes of inactivity, the server send a small 10 bytes probe packet to the client. If this packet is not acknowledge, the connection will be closed...

The next time I have our DBA's attention, I will have to get them to change this value and see how that affects things. Later in the second link, they say:

If the SQLNET.EXPIRE_TIME is less than the FW connection idle time-out, then the firewall will consider this packet as activity, and the idle time-out (firewall blackout) will never happen until both the client and the server processes are alive.

I expect this is exactly what we need.

user1683793
  • 1,213
  • 13
  • 18
  • 1
    DBA has added SQLNET.EXPIRE_TIME parameter at Oracle side, but I still see the issue. Maybe some latest firewalls may not see DCD packets as a valid traffic as mentioned in the article you pointed out. here - https://zhefeng.wordpress.com/2009/06/15/oracle-connection-idle-timeout-with-firewall/ – sattu Jun 08 '17 at 21:13
1
  1. First,you should check your alert.log,to find any error during period of this time.
  2. Then,two way can make your active session disconnect .
    (1)SQLNET.EXPIRE_TIME < 120
    (2)the init parameter resource_limit is true,and your profile's(dba_users + dba_profiles ) idle_time < 120
sure ruan
  • 27
  • 6
0

end-of-file on communication channel is rarely a DB side problem. If you suspect DB side problem, then monitor the session activity via gv$session. You will be able to find out right away if the session is "hung" or long-running on a particular statement. It is not uncommon to loose a connection to the DB, especially if the network is not real stable.

Roger Cornejo
  • 1,507
  • 1
  • 8
  • 7
0

We tried setting SQLNET.EXPIRE_TIME to 10 mins. But it didn't work. We bounced database server machine, but it still didn't work. Maybe some latest firewalls may not see DCD packets as a valid traffic as mentioned in the article mentioned in the article (also given by @user1683793 above). Finally, we changed keepalive time to 25 mins (on the client machine) so that there will be some traffic on the tcp connection. Fortunately firewall seems to consider keep alive packet as a traffic.

-bash-4.1$ cat /proc/sys/net/ipv4/tcp_keepalive_time
1500
sattu
  • 632
  • 1
  • 22
  • 37