
I am running Ubuntu Linux 14.04.2 on a Dell R610 server. This box hosts the MySQL service for my applications. The MySQL data directory is mounted from a Dell EqualLogic SAN, so this box is also an iSCSI initiator.

The issue that I am facing is that every week when we apply updates and reboot the server, the MySQL service will intermittently refuse to come up.

In /etc/init/mysql.conf, the MySQL upstart job has the following start on / stop on conditions:

start on runlevel [2345]
stop on starting rc RUNLEVEL=[016]

The following is an excerpt from the server's /var/log/kern.log from a boot in which MySQL came up successfully.

[These messages are always logged]

IPv6: ADDRCONF(NETDEV_CHANGE): em1: link becomes ready
Apr 27 02:07:03 DB-Box kernel: [   14.070796] bnx2 0000:01:00.1 em2: NIC Copper Link is Up, 1000 Mbps full duplex
Apr 27 02:07:03 DB-Box kernel: [   14.070803] , receive & transmit flow control ON
Apr 27 02:07:03 DB-Box kernel: [   14.070885] IPv6: ADDRCONF(NETDEV_CHANGE): em2: link becomes ready

[These are logged when MySQL successfully comes up]

Apr 27 02:07:03 DB-Box kernel: [   14.388522] scsi3 : iSCSI Initiator over TCP/IP
Apr 27 02:07:03 DB-Box kernel: [   14.406089] scsi4 : iSCSI Initiator over TCP/IP
Apr 27 02:07:03 DB-Box kernel: [   14.410710] scsi5 : iSCSI Initiator over TCP/IP
Apr 27 02:07:03 DB-Box kernel: [   14.415547] scsi6 : iSCSI Initiator over TCP/IP
Apr 27 02:07:04 DB-Box kernel: [   14.434132]  connection1:0: detected conn error (1020)
Apr 27 02:07:04 DB-Box kernel: [   14.445123]  connection2:0: detected conn error (1020)
Apr 27 02:07:04 DB-Box kernel: [   14.446003]  connection3:0: detected conn error (1020)
Apr 27 02:07:04 DB-Box kernel: [   14.447461]  connection4:0: detected conn error (1020)
Apr 27 02:07:04 DB-Box kernel: [   15.237897] scsi 3:0:0:0: Direct-Access     EQLOGIC  100E-00          6.0  PQ: 0 ANSI: 5
Apr 27 02:07:04 DB-Box kernel: [   15.238173] scsi 4:0:0:0: Direct-Access     EQLOGIC  100E-00          6.0  PQ: 0 ANSI: 5
Apr 27 02:07:04 DB-Box kernel: [   15.238196] sd 3:0:0:0: Attached scsi generic sg3 type 0
Apr 27 02:07:04 DB-Box kernel: [   15.238432] sd 4:0:0:0: Attached scsi generic sg4 type 0
Apr 27 02:07:04 DB-Box kernel: [   15.238828] scsi 5:0:0:0: Direct-Access     EQLOGIC  100E-00          6.0  PQ: 0 ANSI: 5
Apr 27 02:07:04 DB-Box kernel: [   15.239056] sd 3:0:0:0: [sdb] 1048596480 512-byte logical blocks: (536 GB/500 GiB)
Apr 27 02:07:04 DB-Box kernel: [   15.239075] sd 4:0:0:0: [sdc] 419450880 512-byte logical blocks: (214 GB/200 GiB)
Apr 27 02:07:04 DB-Box kernel: [   15.239101] sd 5:0:0:0: Attached scsi generic sg5 type 0
Apr 27 02:07:04 DB-Box kernel: [   15.239496] sd 5:0:0:0: [sdd] 1048596480 512-byte logical blocks: (536 GB/500 GiB)
Apr 27 02:07:04 DB-Box kernel: [   15.239836] scsi 6:0:0:0: Direct-Access     EQLOGIC  100E-00

I have observed that the first group of lines (labeled "These messages are always logged") appears on every boot. On boots where MySQL fails to come up, the iSCSI messages in the second group are not logged.

I am stuck as to where to start my investigation. I cannot tell whether this has something to do with boot order or whether I am missing something else.

Edit 1:

Adding more log output, as requested by @JimNim:

Apr 27 01:54:23 DB-Box kernel: [   14.204031] Loading iSCSI transport class v2.0-870.
Apr 27 01:54:23 DB-Box kernel: [   14.227691] iscsi: registered transport (tcp)
Apr 27 01:54:23 DB-Box kernel: [   14.334826] iscsi: registered transport (iser)
Apr 27 01:54:25 DB-Box kernel: [   15.575642] bnx2 0000:01:00.0 em1: NIC Copper Link is Up, 100 Mbps full duplex
Apr 27 01:54:25 DB-Box kernel: [   15.575651]
Apr 27 01:54:25 DB-Box kernel: [   15.575733] IPv6: ADDRCONF(NETDEV_CHANGE): em1: link becomes ready
Apr 27 01:54:26 DB-Box kernel: [   16.538071] bnx2 0000:01:00.1 em2: NIC Copper Link is Up, 1000 Mbps full duplex
Apr 27 01:54:26 DB-Box kernel: [   16.538079] , receive & transmit flow control ON
Apr 27 01:54:26 DB-Box kernel: [   16.538161] IPv6: ADDRCONF(NETDEV_CHANGE): em2: link becomes ready
Cik
  • Does dmesg show any potential problems (warnings, errors, etc) before the "normal" iSCSI messages you're expecting? Do you see iSCSI even being loaded at all? ("Loading iSCSI transport class vx.x") – JimNim Apr 27 '15 at 15:02
  • No @JimNim no such potential problems since I have compared the kern.log diff of both scenarios. What I have stated in the question seems to be the only visible difference. Also in the past I have observed that the iSCSI login is successful approx. 1 min after the login prompt is displayed. Also added an edit to show that transport class logs are getting logged (for both scenarios). – Cik Apr 27 '15 at 16:36
  • Are you using the Equallogic host integration tools? (They aren't "supported" on ubuntu, but I've never seen whether or not they will still install/run). You might want to consider running this by EQL support - at the very least, that could help you rule out any possible network issues, and check the EQL logs for any clues as to what the problem may be. – JimNim Apr 27 '15 at 16:51
  • Can you please show the fstab entry used to mount the filesystem? And is it mounted after reboot? – Fox Apr 27 '15 at 18:55

1 Answer


It may be that the iSCSI connection sometimes takes longer to come up, so the MySQL service "races ahead" of the iSCSI mount and starts before its data directory is available.

After boot, try restarting MySQL. If it failed to start during boot but starts without problems from the shell, then you have confirmed a timing issue.
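As a rough sketch of that check, the helper below waits until a directory has actually become a mount point before anything is restarted. The function name `wait_for_mount` and the 120-second default are made up for illustration, and the data directory path in the usage note is only the usual default, not something taken from your setup.

```shell
#!/bin/sh
# Hypothetical helper: block until DIR is a mount point, or give up.
# Usage: wait_for_mount DIR [TIMEOUT_SECONDS]
# Returns 0 once DIR is mounted, 1 if the timeout expires first.
wait_for_mount() {
    dir=$1
    timeout=${2:-120}   # assumed default; tune to how slow the SAN login is
    elapsed=0
    # mountpoint(1) from util-linux exits 0 only if DIR is a mount point
    until mountpoint -q "$dir"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

After a boot where MySQL did not come up, something like `wait_for_mount /var/lib/mysql 120 && sudo service mysql restart` would show whether MySQL starts cleanly once the volume is there (assuming `/var/lib/mysql` is your data directory). If it does, the timing theory looks likely.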

Another possibility: do you reach the iSCSI target using a hostname or an IP address? If you use a hostname, you may have a DNS problem that prevents the iSCSI initiator from resolving the target's IP address.
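To make the hostname-versus-IP question concrete, here is a small sketch that classifies a portal address. The function names `is_ipv4` and `portal_kind` are hypothetical, and the sketch only recognises IPv4 literals, so anything else (including an IPv6 address) is treated as a name and passed to `getent hosts`. The portal strings themselves would come from your own configuration, for example the output of `iscsiadm -m node`.

```shell
#!/bin/sh
# Sketch only: decide whether an iSCSI portal host is an IPv4 literal or a name.

# Return 0 if $1 is a dotted-quad IPv4 address, 1 otherwise.
is_ipv4() {
    case $1 in
        # reject empty strings, non-digit/dot characters, and empty octets
        ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=.
    set -- $1           # split the address into octets on "."
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    for octet in "$@"; do
        [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# Print "ip", "resolves", or "unresolved" for the given portal host.
portal_kind() {
    if is_ipv4 "$1"; then
        echo ip
    elif getent hosts "$1" >/dev/null 2>&1; then
        echo resolves
    else
        echo unresolved
    fi
}
```

If `portal_kind` prints `unresolved` for the name you configured, name resolution is worth checking before looking further at boot ordering.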

shodanshok