
I have a multithreaded client application for MySQL that uses the MySQL C client library (libmysqlclient_r). I keep a pool of database connections, which I open before creating the worker threads (pthread_create).

Each worker takes a single connection from the pool before starting its work and returns it to the pool when it finishes. Each worker uses its own unique connection.

However, the database server is heavily overloaded, and the MySQL client reports errors: "Lost connection to MySQL server during query" or "MySQL server has gone away". My application reconnects in the worker thread:

my_bool res = mysql_ping(c->mysql);
if (res) {
    mysql_close(c->mysql);
    mysql_thread_end();

    c->mysql = mysql_init(NULL);
    mysql_thread_init();                

    struct conn_desc *cd = &c->db->cds[c->num];
    syslog(LOG_ERR, "reconnect :[%s:%d]\t%s\tnew MySQL=%X tid=%X\n", cd->host,  cd->port, c->db->default_db_name, c->mysql, pthread_self());

    res = mysql_real_connect(c->mysql, cd->host, cd->login, cd->passwd, c->db->default_db_name, cd->port, NULL, 0);
    if (res == NULL) {
        syslog(LOG_ERR, "[restart ] reconnect Error\n");
        exit(1);
    }
}

Sometimes I get a segmentation fault in mysql_ping() or mysql_real_connect(). Why? Each worker thread uses a separate MySQL connection. What is wrong, and what is the right way to do this?

 0  0x0000000000000000 in ?? ()
 1  0x00007ffff7a7fc29 in my_net_local_init () from /usr/lib64/mysql/libmysqlclient_r.so.16
 2  0x00007ffff7ab0144 in my_net_init () from /usr/lib64/mysql/libmysqlclient_r.so.16
 3  0x00007ffff7aab245 in **mysql_real_connect ()** from /usr/lib64/mysql/libmysqlclient_r.so.16
 4  0x000000000040e72c in mysql_query_run (c=0xc36760,
q=0x7fffca1fb670 "SELECT     `id`, `name` FROM `msg_dir` WHERE `owncrc` = 2831014197") at mysql.c:163
 5  0x000000000040fdf2 in mysql_load_user (uid=2831014197, online=0) at mysql.c:706
 6  0x0000000000406047 in get_mess_count (uid=2831014197, mid=0, online=0) at commands.c:158
 7  0x000000000040618c in cmd_get_all_mess_count (key=0x7fffa80bb074 "gamc|2831014197|0|0 ", data=0x0, data_len=0, ret=0x7fffca1fbbc0, ret_len=0x7fffca1fbbdc) at commands.c:194
 8  0x0000000000405f52 in execute_command (key=0x7fffa80bb074 "gamc|2831014197|0|0 ", data=0x0, data_len=0,
ret=0x7fffca1fbbc0, ret_len=0x7fffca1fbbdc) at commands.c:132
 9  0x000000000040c4be in memcache_get (loop=0x7fffa40008c0, mctx=0x7fffa80bb040) at mc.c:479
 10 0x000000000040d353 in memcached_client (loop=0x7fffa40008c0, io=0x7fffa80bb040, revents=1) at mc.c:785
 11 0x00007ffff61e5071 in ev_invoke_pending () from /usr/lib64/libev.so.4
 12 0x00007ffff61ea23a in ev_run () from /usr/lib64/libev.so.4
 13 0x000000000040b5ec in ev_loop (loop=0x7fffa40008c0, flags=0) at /usr/include/libev/ev.h:810
 14 0x000000000040e24c in worker_listen (arg=0x10) at mc.c:1126
 15 0x00007ffff762c851 in start_thread () from /lib64/libpthread.so.0
 16 0x00007ffff5d2f6dd in clone () from /lib64/libc.so.6

and the next backtrace:

0  0x00000000009f3f70 in ?? ()
1  0x00007ffff7aaf32a in net_real_write () from /usr/lib64/mysql/libmysqlclient_r.so.16
2  0x00007ffff7aaf63b in net_flush () from /usr/lib64/mysql/libmysqlclient_r.so.16
3  0x00007ffff7aaf901 in net_write_command () from /usr/lib64/mysql/libmysqlclient_r.so.16
4  0x00007ffff7aac6a9 in cli_advanced_command () from /usr/lib64/mysql/libmysqlclient_r.so.16
5  0x00007ffff7a7b1fd in **mysql_ping** () from /usr/lib64/mysql/libmysqlclient_r.so.16
6  0x000000000040e8f1 in mysql_query_run (c=0x9ed930,
q=0x7fff6fffe670 "SELECT `invisible` FROM meetre.autho2 WHERE `crc` = 1032552218") at mysql.c:164
7  0x00000000004107a0 in mysql_load_user (uid=1032552218, online=1) at mysql.c:858
8  0x0000000000406278 in get_mess_count (uid=1032552218, mid=0, online=1) at commands.c:165
9  0x00000000004063bd in cmd_get_all_mess_count (key=0x7fff90383f84 "gamc|1032552218|0|1 ", data=0x0, data_len=0,
ret=0x7fff6fffebc0, ret_len=0x7fff6fffebdc) at commands.c:201
10 0x0000000000406182 in execute_command (key=0x7fff90383f84 "gamc|1032552218|0|1 ", data=0x0, data_len=0,
ret=0x7fff6fffebc0, ret_len=0x7fff6fffebdc) at commands.c:135
11 0x000000000040c718 in memcache_get (loop=0x7fff5c0008c0, mctx=0x7fff90383f50) at mc.c:459
12 0x000000000040d5cb in memcached_client (loop=0x7fff5c0008c0, io=0x7fff90383f50, revents=1) at mc.c:765
13 0x00007ffff61e5071 in ev_invoke_pending () from /usr/lib64/libev.so.4
14 0x00007ffff61ea23a in ev_run () from /usr/lib64/libev.so.4
15 0x000000000040b81c in ev_loop (loop=0x7fff5c0008c0, flags=0) at /usr/include/libev/ev.h:810
16 0x000000000040e4f4 in worker_listen (arg=0x1e) at mc.c:1106
17 0x00007ffff762c851 in start_thread () from /lib64/libpthread.so.0
18 0x00007ffff5d2f6dd in clone () from /lib64/libc.so.6

code 2:

pthread_mutex_lock(&conn_mutex);
my_bool res = mysql_ping(c->mysql);
pthread_mutex_unlock(&conn_mutex);

if (res != OK) {
    mysql_close(c->mysql);              
    mysql_library_end();

    pthread_mutex_lock(&conn_mutex);
    mysql_library_init(0, NULL, NULL);
    pthread_mutex_unlock(&conn_mutex);

    c->mysql = mysql_init(NULL);

    struct conn_desc *cd = &c->db->cds[c->num];
    syslog(LOG_ERR, "reconnect :[%s:%d]\t%s\tnew MySQL=%X tid=%X %s\n", cd->host,  cd->port, c->db->default_db_name, c->mysql, pthread_self(), mysql_error(c->mysql));
    res = mysql_real_connect(c->mysql, cd->host, cd->login, cd->passwd, c->db->default_db_name, cd->port, NULL, 0);

    if (res == NULL) {
        syslog(LOG_ERR, "[restart ] reconnect Error\n");
        exit(1);
    }
}
user1514692
  • Could you show us a stack trace created as the result of the segmentation violation? – alk Apr 01 '13 at 12:46
  • Yes, the stack trace pointed to mysql_real_connect(). I had more than 10 successful reconnections and then one segfault :(. I have also had a segfault in mysql_ping(). – user1514692 Apr 01 '13 at 13:07
  • Then why not add the stack trace to your post? It would make things clearer to us. – alk Apr 01 '13 at 13:10
  • OK, the stack trace has been added to the post. – user1514692 Apr 01 '13 at 13:38
  • When I lock mysql_ping() with a mutex, the threads block because I run many SQL queries. See code 2. – user1514692 Apr 01 '13 at 13:40
  • To me the values for `c` in the calls to `mysql_query_run` do look suspicious. I wouldn't wonder if those are garbage. Anyway, I'd recommend to install the debug builds of the mysql-libs, then the next crashes provide you stacktraces referencing the exact source line of the crashes, so you can look up what is happening there. – alk Apr 01 '13 at 13:45

1 Answer


(Found this via a Google search. I know the original author of the question is probably long gone, but responding for posterity in case anyone else stumbles their way here.)

Your reconnection logic is far more complicated than it needs to be, and that complexity is probably what causes these crashes.

During your reconnection attempt, you:

  1. Release your mutex.
  2. Close the connection.
  3. Allow other threads to try to use the now-closed connection, which will segfault. You unlocked the mutex, after all.
  4. Call mysql_library_end() without any sort of locking, which tears down libmysqlclient process-wide. Any attempt to run mysql_ping or most mysql_ functions other than mysql_connect, no matter where or with which parameters, will now segfault, and you still don't have a lock preventing other threads from doing this.
  5. Reacquire your lock, call mysql_library_init, and then release it again for some reason.
  6. Finally you call mysql_real_connect, telling it to operate on the shared c->mysql variable, without any locking. This can be run by multiple threads at the same time, and therefore can also segfault.

As you can see, there are several points where your code can crash. Here's how to fix it:

  1. Hold your conn_mutex for the entire operation, from before mysql_ping until after the reconnect attempt is finished. Any other plan will result in your threads fighting over who gets to attempt to reconnect.
  2. Don't call mysql_library_end or mysql_library_init at all -- these calls are affecting process-wide state for no good reason, and have nothing to do with the re-establishment of individual connections. Only call those before you make your very first connection in your program, and after you close your very last one, and then only on one thread.
  3. Make sure your code is calling mysql_thread_init on all new threads, and calls mysql_thread_end before the exit of any threads other than your main one. (These calls are also independent of any particular connection, and leaving them out can cause some very subtle issues because things often still mostly work. If LOAD DATA INFILE segfaults on you, for example, this is probably your problem.)
  4. Make sure you're linking your code with the thread-safe libmysqlclient_r and not the regular libmysqlclient, or none of the above will save you.
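
To make point 1 concrete, here is a minimal sketch of the locking pattern only. It is not the asker's real code: so that it compiles without libmysqlclient, it uses a stand-in struct fake_conn and three hypothetical helpers (conn_ping_ok, conn_close, conn_connect) in place of the real MySQL calls named in the comments. The name ensure_connected is also made up for illustration. What matters is that the mutex is taken before the ping and released only after the reconnect attempt has finished.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a MYSQL* handle (assumption: lets the sketch build
 * without libmysqlclient). */
struct fake_conn {
    bool alive;       /* what a ping would report */
    int  reconnects;  /* number of reconnects performed */
};

/* Hypothetical helpers. In real code they would wrap:
 *   conn_ping_ok  -> mysql_ping(handle) == 0
 *   conn_close    -> mysql_close(handle)
 *   conn_connect  -> mysql_init(NULL) followed by mysql_real_connect(...)
 */
static bool conn_ping_ok(struct fake_conn *c) { return c->alive; }
static void conn_close(struct fake_conn *c)   { c->alive = false; }
static bool conn_connect(struct fake_conn *c)
{
    c->alive = true;
    c->reconnects++;
    return true;
}

static pthread_mutex_t conn_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Check the connection and, if it is dead, replace it.
 * The lock is held across the whole check-close-reconnect sequence,
 * so no other thread can observe the closed handle.
 * Returns true if the connection is usable on return. */
bool ensure_connected(struct fake_conn *c)
{
    bool ok;

    assert(c != NULL);

    pthread_mutex_lock(&conn_mutex);
    ok = conn_ping_ok(c);
    if (!ok) {
        conn_close(c);         /* no mysql_library_end() here */
        ok = conn_connect(c);  /* no mysql_library_init() here */
    }
    pthread_mutex_unlock(&conn_mutex);

    return ok;
}
```

A worker would call ensure_connected(c) before running a query and treat a false return as a failed request. Note that nothing in the reconnect path touches mysql_library_init, mysql_library_end, mysql_thread_init, or mysql_thread_end; per points 2 and 3, those belong at process and thread startup and shutdown.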

Hope that helps anyone else with similar issues.

Walter Mundt