
I have two data centers, each with a master/slave database pair running Percona Server 5.5.23. One of the slave databases crashed with this error:

03:00:40 UTC - mysqld got signal 11 ;  
This could be because you hit a bug. It is also possible that this binary  
or one of the libraries it was linked against is corrupt, improperly built,  
or misconfigured. This error can also be caused by malfunctioning hardware.  
We will try our best to scrape up some info that will hopefully help  
diagnose the problem, but since we have already crashed,   
something is definitely wrong and this may fail.  
key_buffer_size=16777216  
read_buffer_size=131072  
max_used_connections=1  
max_threads=151  
thread_count=0  
connection_count=0  
It is possible that mysqld could use up to   
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 346809 K  bytes of memory  
Hope that's ok; if not, decrease some variables in the equation.  
Thread pointer: 0x0  
Attempting backtrace. You can use the following information to find out  
where mysqld died. If you see no messages after this, something went  
terribly wrong...  
stack_bottom = 0 thread_stack 0x80000  
/usr/local/Percona-Server-5.5.23-rel25.3-240.Linux.x86_64/bin/mysqld(my_print_stacktrace+0x35)[0x7d4c85]  
/usr/local/Percona-Server-5.5.23-rel25.3-240.Linux.x86_64/bin/mysqld(handle_fatal_signal+0x3e1)[0x690cb1]  
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fd2b94c7cb0]  
/usr/local/Percona-Server-5.5.23-rel25.3-240.Linux.x86_64/bin/mysqld[0x8f2df2]  
/usr/local/Percona-Server-5.5.23-rel25.3-240.Linux.x86_64/bin/mysqld[0x81d607]  
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7fd2b94bfe9a]  
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fd2b86a93fd]  
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash.  
140802 03:00:41 mysqld_safe Number of processes running now: 0  
140802 03:00:41 mysqld_safe mysqld restarted  
/usr/local/Percona-Server-5.5.23-rel25.3-240.Linux.x86_64/bin/mysqld: error while loading shared libraries: unexpected PLT reloc type 0x00  
140802 03:00:41 mysqld_safe mysqld from pid file /dbdata1/mysqld.pid ended  

Every attempt to restart that slave produces the same last three lines of the error log above. The message doesn't say which shared library it has trouble loading. I searched for "unexpected PLT reloc type" but found nothing concrete explaining what it means. I did see a suggestion that a corrupted binary could cause it, so I ran checksums on mysqld on all four of my database servers. It turns out all three working databases show the same checksum:

sha256sum mysqld
0b42e4625a87de52e5f51f2eb74fb7f2db63116e2b78f51d2897c1938a0e03d1  mysqld

whereas my broken database shows:

sha256sum mysqld
7bfd58d1c1948a36cf4602c697dadd60e422d61ff75eeb4a0344f8ec395b03ea  mysqld

So the binary seems to be corrupted, though strangely, all four binaries have the same modification date and the same size in bytes. I'm not sure what could have corrupted the binary of a running server.
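A minimal sketch of that comparison, for illustration only: given the digest collected from each server, it reports any host that disagrees with the majority. The host names are hypothetical (which server is which is an assumption), and only the two digests come from the output above.

```python
from collections import Counter


def find_outliers(digests):
    """Return the hosts whose digest differs from the most common one."""
    if not digests:
        return []
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(host for host, value in digests.items() if value != majority)


GOOD = "0b42e4625a87de52e5f51f2eb74fb7f2db63116e2b78f51d2897c1938a0e03d1"
BAD = "7bfd58d1c1948a36cf4602c697dadd60e422d61ff75eeb4a0344f8ec395b03ea"

# Hypothetical host names; the digests are the ones shown above.
digests = {
    "dc1-master": GOOD,
    "dc1-slave": GOOD,
    "dc2-master": GOOD,
    "dc2-slave": BAD,
}

print(find_outliers(digests))
```

A majority vote only works here because three of the four servers agree; with fewer hosts, a digest from the original package would be a safer reference.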

I could reinstall Percona to get working binaries, but I'd like to know what happened here so that I can prevent it from happening again.

HBruijn
Beekums

1 Answer


Figured it out thanks to this blog post: https://blogs.oracle.com/ksplice/entry/attack_of_the_cosmic_rays1

The binary had the same size and modification time (2012) as on the working servers, so the file on disk apparently hadn't changed, yet the checksum was different. It seems the binary was entirely cached in RAM, and the corruption was in that cached copy rather than on disk. Clearing the cache worked!
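For reference, a minimal sketch of one way to test this theory for a single file, assuming Linux and Python 3; it isn't necessarily how the cache was cleared here. It hashes the file, asks the kernel to drop that file's cached pages with `os.posix_fadvise`, and hashes it again so the second read comes from disk. The function names are made up for illustration. The request is advisory, and pages still mapped by a running process may not be dropped, so the system-wide alternative is to run `sync` and then write `1` to `/proc/sys/vm/drop_caches` as root.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def evict_from_page_cache(path):
    """Ask the kernel to drop cached pages for one file (Linux, advisory)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Offset 0 with length 0 covers the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)


def recheck_after_eviction(path, known_good):
    """Compare the digest with a known-good value before and after eviction."""
    before = sha256_of(path)
    evict_from_page_cache(path)
    after = sha256_of(path)
    return {"before_ok": before == known_good, "after_ok": after == known_good}
```

If `before_ok` is false and `after_ok` is true, the file on disk is intact and only the cached copy was bad, which matches what happened here. If both are false, the file on disk is damaged and reinstalling is the fix.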

Beekums