3

Situation is that, we have AWS EC2 medium instance, with Linux on it.
It has Drupal as well. Additionally to that we have few files that access to mysql as well, with same settings as Drupal does.
The problem is that - at one point mysql refuses connections.
It happens when there is low or big load (not related on this), as well once it gets inaccessible, mysqld process is still running, and does not falls.
Restarting of this process does not fixes a problem. Rebooting instance - fixes problem.

When i connect to localhost it gives this:

Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)

While mysql.sock file is on place and with right permissions.
Restarting mysqld doesn't helps, but restarting instance - solves the problem.

my.cnf looks like that:

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
symbolic-links=0

wait_timeout=28800

interactive_timeout = 28800

max_allowed_packet=32M

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

As well mysqld runs without any errors, and in logs we have this:

120830  9:48:00 [Note] /usr/libexec/mysqld: Shutdown complete

120830 09:48:00 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
120830 09:48:01 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
120830  9:48:01 [Note] Plugin 'FEDERATED' is disabled.
120830  9:48:01 InnoDB: The InnoDB memory heap is disabled
120830  9:48:01 InnoDB: Mutexes and rw_locks use GCC atomic builtins
120830  9:48:01 InnoDB: Compressed tables use zlib 1.2.3
120830  9:48:01 InnoDB: Using Linux native AIO
120830  9:48:01 InnoDB: Initializing buffer pool, size = 128.0M
120830  9:48:01 InnoDB: Completed initialization of buffer pool
120830  9:48:02 InnoDB: highest supported file format is Barracuda.
120830  9:48:02  InnoDB: Waiting for the background threads to start
120830  9:48:03 InnoDB: 1.1.8 started; log sequence number 4191070086
120830  9:48:03 [Note] Event Scheduler: Loaded 0 events
120830  9:48:03 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.20'  socket: '/var/lib/mysql/mysql.sock -u root'  port: 3306  MySQL Community Server (GPL)

When issue occured again, I took logs again, tried to stop httpd then mysqld, then run mysqld and then run httpd, and logs are absolutely the same as in normal conditions logs with same restart sequence.

Changing in php.ini did not saved the situation:

mysql.allow_persistent = Off

Restarting in this order, does not helps (even different orders tried):

service httpd stop
service mysqld stop
service mysqld start
service httpd start

We would like to find out what is problem and how prevent it from falling like that.

thomaux
  • 19,133
  • 10
  • 76
  • 103
moka
  • 22,846
  • 4
  • 51
  • 67
  • What kind of instance are you running? micro, small, medium, etc. – Ja͢ck Sep 03 '12 at 08:58
  • It is medium instance in EU West Region (Ireland). Drops has no relation to network / io loads, because they happen even if there is just few users or lots. – moka Sep 03 '12 at 11:28
  • Hmm do you run MySQL on EBS perhaps? It should be safe but I've also heard complaints about it. Since you have medium instance the problem is not likely memory related but when it drops the socket is the process itself still running? How about connections via tcp – Ja͢ck Sep 03 '12 at 13:30
  • Yes, mysql process is still running after it got inaccessible. As well, restarting process - does not fixes a problem. Reboot of instance - does fix problem. – moka Sep 03 '12 at 14:16
  • 1
    Have you tried the following restart sequence: stop apache, stop mysqld, start mysqld, start apache – Ja͢ck Sep 03 '12 at 15:06
  • Will try this sequence in next drop. But did tried some mixed restarts of both. – moka Sep 03 '12 at 17:22
  • 1
    can you to a strace for the mysql pid and post the data? – Paul Ma Sep 03 '12 at 19:18
  • pgrep shows two processes, one of them with this command: strace -p 24555 -s 80 -o /tmp/out24555.txt Gives this log in quotes: "wait4(-1, " And another process same strace params: "restart_syscall(<... resuming interrupted call ...> " I'm pretty noob in those debugging things, so don't know if I am doing it right. Thanks for looking into. – moka Sep 04 '12 at 17:29
  • Updated on top, btw strace logs did not changed based on service is ok or under an issue. Restarting in different orders - does not help. – moka Sep 06 '12 at 12:32

3 Answers3

1

From just the symptoms mentioned, the following could be going on. I hope it helps.

It's possible that your PHP uses persistent database connections which perhaps don't get closed properly. Once a certain limit is reached, the database will not accept new connections anymore (from unix socket OR network).

In the php.ini there are settings related to database persistent connections, such as:

mysql.allow_persistent = Off

The fact that mysqld restarts don't work could be related to two things:

  1. Restarting might not be the same as an explicit service mysqld stop followed by service mysqld start; also, you could check the logs while it restarts to see if it encounters anything unusual.

  2. The restart sequence can be altered slightly to also involve your PHP setup, so you would stop apache first, then stop mysqld; afterwards, you start them in reverse order.

Ja͢ck
  • 170,779
  • 38
  • 263
  • 309
  • Issue occurred again, I went to php.ini file, changed settings as you suggested. Then restarted things in this order: httpd stop, mysqld stop, mysqld start, httpd start; For stop / start I used this line: "service httpd stop" and for start "start" of course. Unfortunately it did not help. Issue still were there, so I had to reboot instance again. – moka Sep 06 '12 at 12:28
  • 1
    This is really bizarre. You are running the database from EBS btw? Also, mysqld installed with yum right? – Ja͢ck Sep 06 '12 at 13:12
  • Thanks for helping out, how do I check where mysqld is installed on? Basically it been initialized few months ago, so as I know it is EBS, in AWS it says: root device - ebs. AS well how to check if mysqld were installed with yum? yum command does exist. – moka Sep 06 '12 at 13:47
  • If your instance has yum command and if you want to know that mysqld is installed using yum or not you can see /var/log/yum.log , if your system is of debian class OS , you can find it at /var/adm/apt/history.log – Aravind.HU Sep 07 '12 at 16:03
1

Just glancing over your configuration: Your timeouts are insanely high. As others have guessed, I think you're trying to use persistent connections. But these usually never work with standard ext/mysql(i) or ext/PDO`.

If you don't want to toy with something like the the new mysqlnd multiplexing plugin (see intro, see FAQ, I would suggest to lower wait_timeout significantly and also watch max_connections (on the mysqld-side) when you experience traffic spikes.

So wait_timeout essentially frees connection handles when your application doesn't take care of them properly. There's no point that in a web application a connection should be idle for more than 10 seconds. And you don't want lots and lots of orphan connections to stand-by.

Secondly, the max_connections variable is important too because it's not enough to just crank it up to something like 5,000 – because while that means MySQL will allow that many connections, it will also allocate resources (RAM) to handle these potential connections – even if you never need them.

During the spikes, you should be able to connect to MySQL with your root account. That's a safety measure to be able to debug the server. My suggestion would be to also enable the slow-log temporarily.

In addition, during spikes check the processlist: mysqladmin -u root -pPASS PROCESSLIST. In case anything is cut off, connect with root (mysql -u root -pPASS) and issue SHOW FULL PROCESSLIST;.

From the process list, investigate queries which show up a couple times with EXPLAIN to get to the bottom of it. If they don't use an index, that's one of your problems right there.

Another option could be moving to something like Percona server. They have a lot of additions in there – the tip of the iceberg: xtradb (100% compatible to innodb) and a slow query log which gives you more granular output (milliseconds). It's free too, of course. A good read on all the things MySQL is their blog – the mysql performance blog.

LBNL – I'm just guessing, but it could be just lack of resources. c1.medium is a nice entry-level instance (t1.micro or m1.small serve no real purpose IM*H*O), but it could be not enough. It all depends on size of the database and actual traffic.

Feel free to comment and I can try to extend my answer.

Addition – I just read over the comments on the other answer.

You might want to get rid off the EBS-backed instance. I think they are a really bad idea. If you really need the persistence, you want to create a regular instance with ephemeral storage and then attach a couple (more than 1) EBS volume to it and RAID 10 across them for increase in IO/s.

Also, I have not mentioned that yet, but it also sounds like you lack monitoring on your server. Personally, we use Librato silverline, which provides us with near-realtime vitals on all of our instances. That should also help narrow down potential issues with the storage.

Till
  • 22,236
  • 4
  • 59
  • 89
1

I am not a very experienced user in this subject but when I had some socket files trouble, I configured my apps to use TCP/IP instead. You may use 127.0.0.1 instead of localhost in your software configuration, to force TCP/IP instead of socket file.

You might be interested in the Xiaofeng Teng answer to another stackoverflow question :

Besides Michael's words,

there's another link: http://dev.mysql.com/doc/refman/5.1/en/connecting.html, it says:

On Unix, MySQL programs treat the host name localhost specially, in a way that is likely different from what you expect compared to other network-based programs. For connections to localhost, MySQL programs attempt to connect to the local server by using a Unix socket file. This occurs even if a --port or -P option is given to specify a port number.

it's not a typical tcp/ip connection.

This, of course, will not answer your question, but perhaps solve your problem.

Community
  • 1
  • 1
Alain Tiemblo
  • 36,099
  • 17
  • 121
  • 153
  • Unfortunately using TCP/IP connection is not possible once issue occurs as well. Error is common, I will post it once I will have issue occur again. – moka Sep 10 '12 at 08:52