
I have a dedicated server at nuxit, which "guarantees" a 20 Mb connection. I run a game server there which I programmed myself in Java. The games do not consume a lot of memory, CPU or bandwidth (simple games like chess), but I need to keep a TCP socket open for each user from login to logout.

I already pay more than $100/month for this dedicated server, and the users experience 5 to 60 seconds of lag once every 10 to 60 minutes. At the moment there are between 10 and 40 users on the server at the same time.

I really have no experience in network management or dedicated server rental. I only know how to program. I wonder whether this is normal, and whether it is due to routing issues or congestion on my host's routers. Or something else?

Is it even theoretically possible to have more than 30 seconds of lag on an already open TCP socket? I have searched the web for weeks and never found anything on this topic.

Which tools could I use, and how, to find out where the problem comes from and get a result I can be certain of? I have never done such a thing, so the simpler the better.

Thanks for your help.

Joel
  • Has a local stress test at your office shown that you really know how to program something that keeps 10-40 connections open? I suspect a programming error a lot more than anything else. – TomTom Oct 18 '11 at 15:00
  • I can't be sure of anything, but I have detailed debugging logs on the server which show that sending data can take up to 30 seconds when executing `outputstream.write()`. And I was never able to reproduce the problem locally on my development machine. – Joel Oct 18 '11 at 15:08
  • Installation issue? – TomTom Oct 18 '11 at 15:16
  • Do you mean it's impossible to have a 30-second lag on a TCP socket? Even for someone with a bad DSL or 3G provider? This is what I'm also trying to find out... Thanks. – Joel Oct 18 '11 at 15:20
  • Exactly. It is not possible unless your 30 connections all transfer multiple times the line bandwidth. – TomTom Oct 18 '11 at 19:45
  • So I guess the bandwidth advertised is not the one I get. I have to find a better host. – Joel Oct 18 '11 at 21:32
  • You can check your bandwidth from the server by running on the command line (Linux or similar) `wget --output-document=/dev/null http://speedtest.wdc01.softlayer.com/downloads/test500.zip`. It's not 100% accurate, as it is limited by the speedtest server, but it should give you an idea. See [here](http://stackoverflow.com/questions/426272/how-to-test-internet-connection-speed-from-command-line). – Bruno Flávio Oct 20 '11 at 10:19

2 Answers


If you ping the server, is everything OK? Try sending 1000 pings and check whether 100% of them come back. I usually start debugging with mtr to see whether some packets don't get through. Maybe there is some packet loss, and the lag appears because the server has to retransmit the lost TCP packets.

http://www.bitwizard.nl/mtr/
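
To see how retransmissions alone could add up to the stalls described in the question, here is a minimal sketch. It assumes the standard TCP behavior of doubling the retransmission timeout after each consecutive loss of the same segment (RFC 6298). The 1-second initial timeout and the 60-second cap are illustrative values, not measurements from this server, and the class and method names are made up for the example.

```java
public class RtoBackoff {

    /**
     * Total time (ms) a connection stays stalled when the same segment is
     * lost `lostTransmissions` times in a row, assuming the timeout doubles
     * after every loss and never exceeds maxRtoMs.
     */
    static long stallMillis(long initialRtoMs, long maxRtoMs, int lostTransmissions) {
        long rto = initialRtoMs;
        long total = 0;
        for (int i = 0; i < lostTransmissions; i++) {
            total += rto;                      // wait one full timeout before retrying
            rto = Math.min(rto * 2, maxRtoMs); // exponential backoff, capped
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative values only: 1 s initial timeout, 60 s cap.
        for (int lost = 1; lost <= 6; lost++) {
            System.out.println(lost + " lost in a row -> "
                    + stallMillis(1000, 60000, lost) + " ms stalled");
        }
    }
}
```

Under these assumptions, five consecutive losses of the same segment already add up to 31 seconds, so a stall of that length on an open socket is possible in theory even when the overall loss rate looks low.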

Jure1873
  • What's the command for doing 1000 pings, please? – Joel Oct 18 '11 at 14:32
  • If you're on Linux or similar, `ping -i 0.1 -c 1000 localhost` pings localhost 1000 times at 0.1-second intervals (it will take at least 1000*0.1 = 100 seconds, and you'll need to be root to set the interval lower than 0.2 seconds). On Windows use `ping -n 1000 localhost` (I don't know how to set the interval there). Obviously you won't be pinging localhost but your server's IP or name. – Bruno Flávio Oct 18 '11 at 15:09
  • 999 packets out of 1000 arrived successfully and 1 was lost. Is this a good result? – Joel Oct 18 '11 at 17:04
  • Yes, there are lost packets. – Joel Oct 18 '11 at 21:31
  • TCP should handle packet loss nicely by retransmitting the lost packets and reordering, so unless you've got a high rate of lost packets I don't think it's a problem. 999 is good: the first packet timed out due to routing delay. – Bruno Flávio Oct 19 '11 at 10:03
  • Thanks, I understand this, but what is happening is that when a packet is lost, the TCP connection stalls for 10-60 seconds before resending it. And what is strange is that if I open another socket to the same server during this stall, even on another port like HTTP on 80, it unfreezes the first TCP connection and the packets are resent. I don't know if something is wrong with the retry delay or something, but I'm certain it's not coming from the application layer. – Joel Oct 20 '11 at 12:39
  • What if there is a broken router in between that is doing something it shouldn't? Does this always happen with the same clients? Maybe you could try lowering the MTU? – Jure1873 Oct 20 '11 at 21:34

I believe more information may be needed to solve this issue.

Are you sure it's a networking issue?

You could find this out by running Wireshark on your PC to get a better picture of what happens at the TCP packet level when the problem occurs. (To limit the size of the logs, restrict the capture to packets going to and from the server in question.)

Could there be other processes running on the server at the same time which slow down your application? Depending on your OS, you could set up software to monitor variables such as CPU, memory and I/O.

What happens when your application's socket closes due to a failure? You could be observing timeout issues which are resolved when your app creates a new socket.

Also, during the slowdown that you mention, are you able to establish ssh/ftp/http connections to the server? In other words, does this slowdown affect anything other than your game service?
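
Since the game is written in Java, one way to check this from an affected client is a small connect probe, sketched below. It only measures how long a fresh TCP connection to a given port takes to establish; the class name, the 5-second timeout and the ports you pass in are assumptions for illustration, not part of the original setup.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectProbe {

    /** Returns the connect time in ms, or -1 if the connection failed or timed out. */
    static long connectMillis(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Covers refused connections, unreachable hosts and timeouts.
            return -1;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: java ConnectProbe <host> <port> [<port> ...]");
            return;
        }
        for (int i = 1; i < args.length; i++) {
            int port = Integer.parseInt(args[i]);
            long ms = connectMillis(args[0], port, 5000); // 5 s timeout is arbitrary
            System.out.println(args[0] + ":" + port + " -> "
                    + (ms < 0 ? "failed or timed out" : ms + " ms"));
        }
    }
}
```

Running it against the ssh and http ports while a game connection is frozen would show whether new connections to the same server are also slow at that moment, or whether only the existing socket is affected.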

Hope some of these ideas can prove helpful.

Bruno Flávio
  • I have installed Wireshark on my client PC and I will see what happens when the lag occurs, thanks. As for the other points: CPU and memory usage on the server is 10% at most, socket failures are handled in my program and every thread is shut down, and when the lag occurs everything else works fine, including the other concurrent clients of my game. – Joel Oct 18 '11 at 14:37
  • There are lost packets, and that is when the lag happens. This is what Wireshark showed. – Joel Oct 18 '11 at 21:31
  • Thanks, now I must figure out whether this comes from the server or from my DSL connection. – Joel Oct 18 '11 at 21:42
  • @joel if this affects random clients in different places, it shouldn't be your DSL connection. If you think you need it, you can also capture the packets on the server end, even if you have no GUI: use tcpdump to record to a file all traffic on the port your application uses. This file can then be loaded into Wireshark for analysis. See [here](http://www.danielmiessler.com/study/tcpdump/). I usually run `tcpdump -w outputfile.log port 1001` to capture incoming and outgoing traffic on port 1001 and save it to the file. – Bruno Flávio Oct 19 '11 at 09:51
  • @joel, also, does the data going back and forth on the socket look as you would expect at the moment of the slowdown? It's just so strange that the problem shows up for a single client. Could some problem in the software be causing the thread(?) to halt? – Bruno Flávio Oct 19 '11 at 09:55
  • No, I'm sure it's related to the system or the network, because I have traced what was happening inside my program: it blocks on a read or write on the socket's stream, as it should, but for too long. I have now observed in Wireshark (on the client side) that every time there is a slowdown, there is a lost packet that is resent later. But it doesn't affect the flow of the Java program, either on the server side or on the client side. Now, if the server is proven to lose packets (which I'm pretty sure of), what should I do? – Joel Oct 19 '11 at 10:08
  • Also, it doesn't happen that often, and it is not related to the number of clients on the server. Yesterday I had a peak of 50 clients for 2 hours and zero lag. – Joel Oct 19 '11 at 10:13
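
The comments mention logs showing that a write on the socket's stream can block for a long time. A minimal sketch of how such timings might be collected is shown below. It is not the asker's actual logging code: `TimedOutputStream` and its threshold are hypothetical, and the sketch assumes a single writer thread per stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Wraps a stream and reports every write that blocks for thresholdMs or longer. */
public class TimedOutputStream extends FilterOutputStream {
    private final long thresholdMs;
    private int slowWrites = 0;

    public TimedOutputStream(OutputStream out, long thresholdMs) {
        super(out);
        this.thresholdMs = thresholdMs;
    }

    @Override
    public void write(int b) throws IOException {
        long start = System.nanoTime();
        out.write(b);
        record(start, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        long start = System.nanoTime();
        out.write(b, off, len); // delegate the whole buffer in one call
        record(start, len);
    }

    private void record(long startNanos, int bytes) {
        long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
        if (elapsedMs >= thresholdMs) {
            slowWrites++;
            System.err.println("slow write: " + bytes + " bytes took " + elapsedMs + " ms");
        }
    }

    public int getSlowWrites() {
        return slowWrites;
    }

    public static void main(String[] args) throws IOException {
        // Demo on an in-memory stream; a server would wrap socket.getOutputStream().
        TimedOutputStream timed = new TimedOutputStream(new ByteArrayOutputStream(), 1000);
        timed.write("move e2e4\n".getBytes("UTF-8"));
        System.out.println("slow writes so far: " + timed.getSlowWrites());
    }
}
```

A write on a TCP socket normally blocks only when the local send buffer is full, so long write times recorded this way suggest the peer has not acknowledged earlier data for a while. Comparing the timestamps with a tcpdump or Wireshark capture taken at the same moment would show whether they line up with retransmissions.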