I created an Amazon EC2 micro instance running an Amazon AMI and logged into the server with an ssh client. The login succeeds, but when I run the "top" command, its output never appears and the command never returns. It just hangs, and I have to kill the ssh session and log in again. Nothing else on the box, such as java or tomcat, works either.

I rebooted the server, but the problem persists. I eventually changed the instance to a "small" instance, and I see the same problem there.

At certain times of the day it seems to work fine, without the problems above.

Does anybody have any idea why this happens? Is it related to CPU steal or thrashing?

SOLVED: "To avoid potential problems with MTU settings and packet loss, also add a rule to allow 'All ICMP'. After you create the new rule, click Apply Rule Changes." I got the solution from this link: http://code.google.com/p/opendatakit/wiki/AggregateAWSInstall
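For anyone who prefers the command line to the console, the same rule can probably be added with the AWS CLI. This is only a sketch: the helper name allow_all_icmp, the AWS override variable, and the group ID in the usage line are made up for illustration, and 0.0.0.0/0 is just a default that you may want to narrow. Path MTU discovery relies on ICMP "fragmentation needed" messages, which is presumably why allowing ICMP makes the hangs go away.

```shell
# Hypothetical helper: allow inbound ICMP (all types and codes) on one
# security group. $1 = security group ID, $2 = source CIDR (optional).
allow_all_icmp() {
    group_id="$1"
    cidr="${2:-0.0.0.0/0}"
    # With --protocol icmp, --port -1 means "all ICMP types and codes".
    # AWS can be overridden (for example AWS=echo) to preview the command.
    ${AWS:-aws} ec2 authorize-security-group-ingress \
        --group-id "$group_id" \
        --protocol icmp \
        --port -1 \
        --cidr "$cidr"
}
```

Usage would look like "allow_all_icmp sg-0123456789abcdef0", where the group ID is a placeholder for your own.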

2 Answers

I suspect MTU problems whenever I see a suspicious network hang like this. Try cat-ing a large text file (something over 4k) and see if that hangs the session too. If it does, it's almost certain that there's a small MTU somewhere along the path that's causing the trouble (especially since it's time-of-day dependent; perhaps your traffic is taking a different route at different times of the day). That would also fit the symptoms: small packets such as keystrokes and the login prompt get through, while the full-size packets that carry bulk output like top's screen are silently dropped. Google around (or ask a new question) to work out how to fix MTU problems (I'm not going to spend long writing it all out here, on the off chance I'm wrong).
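A rough sketch of both checks, in case it helps. The function names are made up for illustration, the ping flags are the Linux iputils ones, and the ping probe only tells you anything if ICMP echo is allowed to reach the host, which on EC2 depends on the security group.

```shell
# Write a file of exactly $2 bytes (default 8192, comfortably over 4k) to $1,
# so there is something large to cat over the ssh session.
make_big_file() {
    head -c "${2:-8192}" /dev/zero | tr '\0' 'x' > "$1"
}

# ICMP echo payload size that produces an IPv4 packet of exactly $1 bytes:
# 20-byte IPv4 header (no options) + 8-byte ICMP header = 28 bytes overhead.
payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send one ping to host $1 with fragmentation prohibited (-M do), sized so
# the whole packet is $2 bytes. Succeeds only if the reply comes back.
probe_mtu() {
    ping -M do -c 1 -W 2 -s "$(payload_for_mtu "$2")" "$1" > /dev/null 2>&1
}
```

On the instance, "make_big_file /tmp/big.txt" followed by "cat /tmp/big.txt" reproduces the test described above. From the client, something like "probe_mtu my-instance.example.com 1500" (placeholder hostname) failing while "probe_mtu my-instance.example.com 1400" succeeds would point at a reduced MTU on the path.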

womble

No, but you can easily get some debugging information about what is hanging the process.

Presumably you can log in through another ssh session (if not, make sure you already have two sessions open).

So basically, if I start a long-running process like so:

sleep 1000

I can find it from another terminal session like so:

 # ps -ef  | grep sleep | grep -v grep
 root     11768 11287  0 10:36 pts/19   00:00:00 sleep 1000

I can then examine the syscall that the process is executing with the strace tool (from the strace package in the yum/apt repos):

# strace -f -p 11768
Process 11768 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>^C <unfinished ...>
Process 11768 detached
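The lookup and a first look at the process can also be wrapped up as below. This is just a sketch of the same idea with made-up function names, assuming the procps versions of pgrep and ps that most Linux distributions ship.

```shell
# List PIDs whose process name is exactly $1, one per line.
# Unlike "ps -ef | grep", pgrep never matches itself.
find_pids() {
    pgrep -x "$1"
}

# Print the PID, state (S = sleeping, D = uninterruptible wait, R = running)
# and kernel wait channel of PID $1. Separate -o options keep the empty
# headers unambiguous across ps implementations.
proc_state() {
    ps -o pid= -o stat= -o wchan= -p "$1"
}
```

With these, "find_pids sleep" gives the PID to pass to "strace -f -p", and "proc_state 11768" gives a quick hint about where the process is waiting when strace is not installed.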
Tom
  • I tried cat-ing a file in an ssh session. The "cat" did not return any output and got stuck. I logged in to the box again from a different ssh terminal. When I ran "ps -ef | grep cat | grep -v grep", it returned nothing. This means the "cat" process actually terminated normally, which is why it does not show up in the process list. – Keshav Prasad May 19 '12 at 13:10
  • It looks like @womble didn't get all those points for not knowing about stuff ... ;-) – Tom May 19 '12 at 13:12
  • Yep, you are right! :-) – Keshav Prasad May 19 '12 at 13:18