I am currently in the midst of doing distributed load testing on Amazon's EC2 services and have diligently followed all documentation/forum/support on how to get things to work, but unfortunately find myself stuck at this point. No one in any of the relevant IRC's has been able to answer this either...
Here is what I am seeing:
I am at a point where I can get Tsung to work perfectly well if I run it simply on the controller itself, but with options:
FROM tsung.xml
<client host="tester0" weight="8" maxusers="10000" cpu="4"/>
Also - it works with higher/lower CPU values.
I can also very easily get it to work locally using:
use_controller_vm="true"
but this is of no use to me now, as I cannot get the throughput that I'd like.
In order to get things working, I have ssh keys installed. I have opened all ports on these servers [ 0 - 65535 ] and have the exact same versions of Tsung, Erlang and, well, everything on the server is the same actually (they are images of each other).
Tsung version 1.4.2 Erlang R15B01 Ubuntu 12.04LTS Same EC2 security group (all ports open - both TCP & UPD and NO iptables or SELinux) Same EC2 availability zone
When I do start tsung, I get it to work when only send as above to tester0 and ts_config_server starts newbeam with:
ts_config_server:(6:<0.84.0>) starting newbeam on host tester0 with Args " -rsh ssh -detached -setcookie tsung -smp disable +A 16 +P 250000 -kernel inet_dist_listen_min 64000 -kernel inet_dist_listen_max 65500 -boot /usr/lib/erlang//lib/tsung-1.4.2/priv/tsung -boot_var TSUNGPATH /usr/lib/erlang/ -pa /usr/lib/erlang//lib/tsung-1.4.2/ebin -pa /usr/lib/erlang//lib/tsung_controller-1.4.2/ebin +K true -tsung debug_level 7 -tsung log_file ts_encoded_47home_47ubuntu_47_46tsung_47log_4720120719_451751"
However, whenever I try to run this with any remote server, the entire test fails and I get zero users:
<client host="tester1" weight="8" maxusers="10000" cpu="1"/>
ts_config_server:(6:<0.84.0>) starting newbeam on host tester1 with Args " -rsh ssh -detached -setcookie tsung -smp disable +A 16 +P 250000 -kernel inet_dist_listen_min 64000 -kernel inet_dist_listen_max 65500 -boot /usr/lib/erlang//lib/tsung-1.4.2/priv/tsung -boot_var TSUNGPATH /usr/lib/erlang/ -pa /usr/lib/erlang//lib/tsung-1.4.2/ebin -pa /usr/lib/erlang//lib/tsung_controller-1.4.2/ebin +K true -tsung debug_level 7 -tsung log_file ts_encoded_47home_47ubuntu_47_46tsung_47log_4720120719_451924"
However, when I try to run it with TWO clients (i.e, as below):
<client host="tester0" weight="8" maxusers="10000" cpu="1"/>
<client host="tester1" weight="8" maxusers="10000" cpu="1"/>
I once again get zero users to start hitting my web servers. I'm not sure why and this is not intuitive to me at all.
ts_config_server:(6:<0.84.0>) starting newbeam on host tester1 with Args " -rsh ssh -detached -setcookie tsung -smp disable +A 16 +P 250000 -kernel inet_dist_listen_min 64000 -kernel inet_dist_listen_max 65500 -boot /usr/lib/erlang//lib/tsung-1.4.2/priv/tsung -boot_var TSUNGPATH /usr/lib/erlang/ -pa /usr/lib/erlang//lib/tsung-1.4.2/ebin -pa /usr/lib/erlang//lib/tsung_controller-1.4.2/ebin +K true -tsung debug_level 7 -tsung log_file ts_encoded_47home_47ubuntu_47_46tsung_47log_4720120719_451751"
ts_config_server:(6:<0.85.0>) starting newbeam on host tester0 with Args " -rsh ssh -detached -setcookie tsung -smp disable +A 16 +P 250000 -kernel inet_dist_listen_min 64000 -kernel inet_dist_listen_max 65500 -boot /usr/lib/erlang//lib/tsung-1.4.2/priv/tsung -boot_var TSUNGPATH /usr/lib/erlang/ -pa /usr/lib/erlang//lib/tsung-1.4.2/ebin -pa /usr/lib/erlang//lib/tsung_controller-1.4.2/ebin +K true -tsung debug_level 7 -tsung log_file ts_encoded_47home_47ubuntu_47_46tsung_47log_4720120719_451751"
One thing that I do notice is that, of all the args passed to slave:start, only ONE does not exist, and that is the one following the -boot directive:
/usr/lib/erlang//lib/tsung-1.4.2/priv/tsung
Rather, in that directory, I have only the following files:
:~$ ls /usr/lib/erlang//lib/tsung-1.4.2/priv
tsung.boot tsung_controller.rel tsung_recorder.boot tsung_recorder.script
tsung_controller.boot tsung_controller.script tsung_recorder_load.boot tsung.rel
tsung_controller_load.boot tsung_load.boot tsung_recorder_load.script tsung.script
tsung_controller_load.script tsung_load.script tsung_recorder.rel
The last thing I've actually tried is to log what is happening with my ssh session when I try to slave:start, but I get no results. I did this by running:
erl -rsh ssh -sname tsung -r ssh_log_me
Where ssh_log_me is:
#!/bin/sh
echo "$0" "$@" > /tmp/my-ssh.log
ssh -v "$@" 2>&1 | tee -a /tmp/my-ssh.log
But I get no output when I run:
(tsung@tester0)1> slave:start_link(tester1, tsung, " -rsh ssh -detached -setcookie tsung -smp disable +A 16 +P 250000 -kernel inet_dist_listen_min 64000 -kernel inet_dist_listen_max 65500 -boot /usr/lib/erlang//lib/tsung-1.4.2/priv/tsung -boot_var TSUNGPATH /usr/lib/erlang/ -pa /usr/lib/erlang//lib/tsung-1.4.2/ebin -pa /usr/lib/erlang//lib/tsung_controller-1.4.2/ebin +K true -tsung debug_level 7 -tsung log_file ts_encoded_47home_47ubuntu_47_46tsung_47log_4720120719_451751").
{error,timeout}
I have looked through erlang's -boot directive and through the actual erlang code (for ts_config_server) but I am a little lost at this point and might just be missing one last piece of information.
I ask that you please take a look at my xml file here: http://pastebin.com/2MEbL6gd