3

I have a MySQL Server operating just fine if I access him from his local LAN (192.168.100.0/24). If I try to access hin from another LAN (192.168.113.0/24 in this case) it hangs for a really long time before delivering the result. SHOW PROCESSLIST; shows this process in Sleep, State empty.

If I strace -p this process I get the following Output (23512 is the PID of the corresponding mysqld process):

Process 23512 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 1
fcntl(10, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
accept(10, {sa_family=AF_INET, sin_port=htons(51696), sin_addr=inet_addr("192.168.113.4")}, [16]) = 33
fcntl(10, F_SETFL, O_RDWR)              = 0
rt_sigaction(SIGCHLD, {SIG_DFL, [CHLD], SA_RESTORER|SA_RESTART, 0x7f9ce7ca34f0}, {SIG_DFL, [CHLD], SA_RESTORER|SA_RESTART, 0x7f9ce7ca34f0}, ) = 0
getpeername(33, {sa_family=AF_INET, sin_port=htons(51696), sin_addr=inet_addr("192.168.113.4")}, [16]) = 0
getsockname(33, {sa_family=AF_INET, sin_port=htons(3306), sin_addr=inet_addr("192.168.100.190")}, [16]) = 0
open("/etc/hosts.allow", O_RDONLY)      = 64
fstat(64, {st_mode=S_IFREG|0644, st_size=580, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9ce9839000
read(64, "# /etc/hosts.allow: list of host"..., 4096) = 580
read(64, "", 4096)                      = 0
close(64)                               = 0
munmap(0x7f9ce9839000, 4096)            = 0
open("/etc/hosts.deny", O_RDONLY)       = 64
fstat(64, {st_mode=S_IFREG|0644, st_size=880, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9ce9839000
read(64, "# /etc/hosts.deny: list of hosts"..., 4096) = 880
read(64, "", 4096)                      = 0
close(64)                               = 0
munmap(0x7f9ce9839000, 4096)            = 0
getsockname(33, {sa_family=AF_INET, sin_port=htons(3306), sin_addr=inet_addr("192.168.100.190")}, [16]) = 0
fcntl(33, F_SETFL, O_RDONLY)            = 0
fcntl(33, F_GETFL)                      = 0x2 (flags O_RDWR)
setsockopt(33, SOL_SOCKET, SO_RCVTIMEO, "\36\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
setsockopt(33, SOL_SOCKET, SO_SNDTIMEO, "<\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
fcntl(33, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
setsockopt(33, SOL_IP, IP_TOS, [8], 4)  = 0
setsockopt(33, SOL_TCP, TCP_NODELAY, [1], 4) = 0
futex(0x7f9cea5c9564, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f9cea5c9560, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f9cea5c6fe0, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=10, events=POLLIN}, {fd=12, events=POLLIN}], 2, -1) = 1 ([{fd=10, revents=POLLIN}])
fcntl(10, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
accept(10, {sa_family=AF_INET, sin_port=htons(51697), sin_addr=inet_addr("192.168.113.4")}, [16]) = 31
fcntl(10, F_SETFL, O_RDWR)              = 0
rt_sigaction(SIGCHLD, {SIG_DFL, [CHLD], SA_RESTORER|SA_RESTART, 0x7f9ce7ca34f0}, {SIG_DFL, [CHLD], SA_RESTORER|SA_RESTART, 0x7f9ce7ca34f0}, ) = 0
getpeername(31, {sa_family=AF_INET, sin_port=htons(51697), sin_addr=inet_addr("192.168.113.4")}, [16]) = 0
getsockname(31, {sa_family=AF_INET, sin_port=htons(3306), sin_addr=inet_addr("192.168.100.190")}, [16]) = 0
open("/etc/hosts.allow", O_RDONLY)      = 33
fstat(33, {st_mode=S_IFREG|0644, st_size=580, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9ce9839000
read(33, "# /etc/hosts.allow: list of host"..., 4096) = 580
read(33, "", 4096)                      = 0
close(33)                               = 0
munmap(0x7f9ce9839000, 4096)            = 0
open("/etc/hosts.deny", O_RDONLY)       = 33
fstat(33, {st_mode=S_IFREG|0644, st_size=880, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9ce9839000
read(33, "# /etc/hosts.deny: list of hosts"..., 4096) = 880
read(33, "", 4096)                      = 0
close(33)                               = 0
munmap(0x7f9ce9839000, 4096)            = 0
getsockname(31, {sa_family=AF_INET, sin_port=htons(3306), sin_addr=inet_addr("192.168.100.190")}, [16]) = 0
fcntl(31, F_SETFL, O_RDONLY)            = 0
fcntl(31, F_GETFL)                      = 0x2 (flags O_RDWR)
setsockopt(31, SOL_SOCKET, SO_RCVTIMEO, "\36\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
setsockopt(31, SOL_SOCKET, SO_SNDTIMEO, "<\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
fcntl(31, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
setsockopt(31, SOL_IP, IP_TOS, [8], 4)  = 0
setsockopt(31, SOL_TCP, TCP_NODELAY, [1], 4) = 0
futex(0x7f9cea5c9564, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f9cea5c9560, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f9cea5c6fe0, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=10, events=POLLIN}, {fd=12, events=POLLIN}], 2, -1^C <unfinished ...>
Process 23512 detached

This output repeats itself until the answer gets send. It could take up to 15 Minutes until the request gets served. In the local LAN its a matter of Milliseconds.

Why is this and how can I debug this further?

[Edit] tcpdump shows a ton of this:

14:49:44.103107 IP cassandra-test.mysql > 192.168.X.6.64626: Flags [S.], seq 1434117703, ack 1793610733, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
14:49:44.135187 IP 192.168.X.6.64625 > cassandra-test.mysql: Flags [P.], seq 106:145, ack 182, win 4345, length 39
14:49:44.135293 IP cassandra-test.mysql > 192.168.X.6.64625: Flags [P.], seq 182:293, ack 145, win 115, length 111
14:49:44.167025 IP 192.168.X.6.64624 > cassandra-test.mysql: Flags [.], ack 444, win 4280, length 0
14:49:44.168933 IP 192.168.X.6.64626 > cassandra-test.mysql: Flags [.], ack 1, win 4390, length 0
14:49:44.169088 IP cassandra-test.mysql > 192.168.X.6.64626: Flags [P.], seq 1:89, ack 1, win 115, length 88
14:49:44.169672 IP 192.168.X.6.64625 > cassandra-test.mysql: Flags [P.], seq 145:171, ack 293, win 4317, length 26
14:49:44.169726 IP cassandra-test.mysql > 192.168.X.6.64625: Flags [P.], seq 293:419, ack 171, win 115, length 126
14:49:44.275111 IP 192.168.X.6.64626 > cassandra-test.mysql: Flags [P.], seq 1:74, ack 89, win 4368, length 73
14:49:44.275131 IP cassandra-test.mysql > 192.168.X.6.64626: Flags [.], ack 74, win 115, length 0
14:49:44.275149 IP 192.168.X.6.64625 > cassandra-test.mysql: Flags [P.], seq 171:180, ack 419, win 4286, length 9
14:49:44.275189 IP cassandra-test.mysql > 192.168.X.6.64626: Flags [P.], seq 89:100, ack 74, win 115, length 11
14:49:44.275264 IP 192.168.X.6.64625 > cassandra-test.mysql: Flags [P.], seq 180:185, ack 419, win 4286, length 5
14:49:44.275281 IP cassandra-test.mysql > 192.168.X.6.64625: Flags [.], ack 185, win 115, length 0
14:49:44.275295 IP cassandra-test.mysql > 192.168.X.6.64625: Flags [F.], seq 419, ack 185, win 115, length 0
14:49:44.275650 IP 192.168.X.6.64625 > cassandra-test.mysql: Flags [F.], seq 185, ack 419, win 4286, length 0
14:49:44.275660 IP cassandra-test.mysql > 192.168.X.6.64625: Flags [.], ack 186, win 115, length 0
14:49:44.275910 IP 192.168.X.6.64627 > cassandra-test.mysql: Flags [S], seq 2336421549, win 8192, options [mss 1351,nop,wscale 2,nop,nop,sackOK], length 0
14:49:44.275921 IP cassandra-test.mysql > 192.168.X.6.64627: Flags [S.], seq 3289359778, ack 2336421550, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
mveroone
  • 474
  • 7
  • 22
Subito
  • 388
  • 1
  • 3
  • 11
  • Is there systemc alls that are taking a long times ? (lines in strace that stay as last line for a few seconds or more) or is this trace repeating fast in a loop ? – mveroone Oct 24 '13 at 12:14
  • repeating fast in a loop. Seems to hang on for a few milliseconds every now and than but its impossible to figure out when. – Subito Oct 24 '13 at 12:30
  • updated question with tcpdump output. `strace -f -p` doesn't show anything more than `strace -p` – Subito Oct 24 '13 at 12:54
  • Hum... since it's doing network communication, maybe a `tcpdump` of those conversations would help (filter on hosts and ports) – mveroone Oct 24 '13 at 12:54
  • What kind of latency/packet loss are you seeing between the two hosts that are on different networks? What kind of networking hardware translates between the two networks? – devicenull Oct 24 '13 at 13:07
  • @devicenull: The dump is not that long, but I don't see signs of packet loss. We would see packets not being acknowledged, wouldn't we ? – mveroone Oct 24 '13 at 13:09
  • Yea, but we don't know if anything got removed from the capture – devicenull Oct 24 '13 at 13:16
  • Then, what the OP can do is to run tcpdump on both ends of the communications at the same time. – mveroone Oct 24 '13 at 13:29
  • I monitored one whole session from both ends and it does not show signs of packet loss. The tcpdump is quite long though so I don't want to sanitize and post it here. Is that neccessary? I did not remove anything from the capture. I did it with `tcpdump host 192.168.X.6 and port not 22` and removed the login requests. – Subito Oct 24 '13 at 14:04
  • Latency ~34ms, just FYI. – Subito Oct 24 '13 at 14:13
  • I think the right tcpdump filter is `not port 22` but that would be even better to do `tcpdump -n host 192.168.X.6 and port 3306` (why the X ?) – mveroone Oct 25 '13 at 07:45
  • sry, was a typo. I actually executes `not port 22`. I thought `port 3306` is not neccessary because there is nearly no traffic other than ssh and mysql ;). The X just because... Obfuscating? Jeah, I guess its not neccessary. – Subito Oct 25 '13 at 12:06
  • Can you transfer large files across this network without delay or issue? Where you mention that it reacts based on the result size, I'm curious if you are hitting some sort of MTU limit? Also, as devicenull asked, what hardware is translating the networks? Are they run over the internet in a VPN? – Regan Oct 25 '13 at 23:07
  • Also, is your query a single query asking for 100 rows, or is it several queries? MySQL does not perform well over high-latency links. If your application is read-only maybe you can run a slave mysql server at the other location which will be much less latency. On your mysql server run `mysqladmin status` before and then after you run your application, how many questions were asked by the client? If it's in the thousands it may simply be your latency to blame. – Regan Oct 25 '13 at 23:20
  • @Regan I can transfer large files without problems. Connection is via a VPN (Latency ~34ms) Device is a Watchguard (dont know which model - a big Rack-version) and the appropriate Client on my Side. I dont think the network is the problem, because filetransfers, Remote-Sessions, SSH-Session etc all work without problems. Its just a single Query asking for n Rows (no LIMIT, just a SELECT). Typical Resultsize is around 100. I just run one SELECT at a time, so just one Question to answer. – Subito Oct 27 '13 at 14:23
  • Are you blocking ICMP packets in a firewall between the external client and the server? If so, check that you're allowing MTU negotiation packets through (type 3, code 4 IIRC). – Jenny D Oct 30 '13 at 10:06
  • @JennyD the first Router I was using was in fact blocking ICMP. But I also tried another router which allows all sorts of ICMP traffic - with no difference. – Subito Oct 30 '13 at 13:59
  • That this is going through a VPN and from the tcpdump above, this is very, very likely an MTU-related problem. They are notoriously difficult to diagnose. Force MTU to be identical on client, server, so on, and make sure the max MTU on all switches involved is /higher/ than the the clients (for instance, it's pretty common for max MTU on a switch to be 1528 when the clients are all at 1500). Alternatively you might try forcing the MTU on your VPN down, significantly, try 1400, then 1380, even 1300. – Nex7 Oct 31 '13 at 07:12
  • @Nex7 That's pretty much what I was getting at – Jenny D Oct 31 '13 at 08:18
  • In the VPN there is already an enforced MTU of 1500. I tried to lower it on the client which runs the app and got down to 1300 without a change regarding MySQL behaviour. I thought about MTU, too but if I connect with mysql-client on the same computer, it can run queries just fine. I'm not quite sure what this means, though. – Subito Oct 31 '13 at 14:37
  • (I checked the MTU on all devices which handle the traffic - namely one PC, 2 routers, 3 switches. The Switches have MTU's around 1600, the routers and this PC have exactly 1500) – Subito Oct 31 '13 at 14:45

2 Answers2

1

On tcpdump picture one can see connection pooll and/or frequent restarting connections. Check if maximal connection limit is not reached.

Veniamin
  • 863
  • 6
  • 11
  • Its not reached. If I execute the same app on a host in the same LAN, it gets served immediatly. – Subito Oct 25 '13 at 12:06
  • What application do you use? Does it use connection pool? – Veniamin Oct 26 '13 at 08:53
  • ahh like if casandra is sharded, it may not support cross network nodes, due to broadcast traffic? – nandoP Oct 26 '13 at 23:24
  • my application is one I'm currently writing. Does not use connection pool but only has 1-3 Connections open at a time. – Subito Oct 27 '13 at 14:24
  • @Subito 1. Try to connect with standard mysql client. 2. How do your mysql access rules look like? – Veniamin Oct 28 '13 at 09:50
  • strange. The standard client does not seem to have these problems... I can work on the tables like a charm. – Subito Oct 28 '13 at 12:25
  • well, and how this output looks like: == select User,Host,max_questions, max_updates, max_connections, max_user_connections from mysql.user; == ? – Veniamin Oct 30 '13 at 15:34
0

Subito,

Have you tried to add skip-name-resolve in your /etc/my.cnf file?

[mysqld]
skip-name-resolve
Douglas P.
  • 71
  • 2