
I am having a problem configuring MPJ Express in cluster mode.

I am following the guide at http://mpj-express.org/docs/guides/linuxguide.pdf

The environment variables were set successfully as follows:

1) Set MPJ_HOME and PATH variables

a. export MPJ_HOME=/path/to/mpj/

b. export PATH=$PATH:$MPJ_HOME/bin

2) The machines were also added successfully, using a machines file:

mpjboot machines

It prints the message starting mpjd...

3) The next step is to test the HelloWorld program:

Compile: javac -cp .:$MPJ_HOME/lib/mpj.jar HelloWorld.java
Execute: mpjrun.sh -np 2 -dev niodev HelloWorld

When I do that, I get this error:

runtime.MPJRuntimeException: Cannot connect to the daemon at machine and port <10000>

The platform I am using is:

  • Sun Ultra 25 workstations with Solaris 10 OS
  • A Fast Ethernet cluster of 2 machines
  • ssh is enabled for the root user on each machine
  • The network connection between the machines works

Any help or solution is appreciated.

Thank you

M Imtiaz

2 Answers


Examine the environment variables on the cluster nodes.

Try adding the variables to .bashrc on each cluster node (ssh to it from the main node):

echo 'export MPJ_HOME=/home/<user>/path/to/mpj' >> ~/.bashrc
echo 'export PATH=$PATH:$MPJ_HOME/bin' >> ~/.bashrc

Alternatively, turn on logging in /conf/wrapper.conf, run again, and report your findings.
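
To confirm that a node actually sees these variables, a small check like the following may help. It is only a sketch: the function name check_mpj_env is made up for illustration, and it assumes a POSIX-style shell such as bash.

```shell
# Hypothetical helper: reports whether MPJ_HOME is set and whether
# $MPJ_HOME/bin appears as an entry in PATH.
check_mpj_env() {
  if [ -z "${MPJ_HOME:-}" ]; then
    echo "MPJ_HOME is not set"
    return 1
  fi
  # Wrap PATH in colons so every entry can be matched as ":entry:".
  case ":$PATH:" in
    *":$MPJ_HOME/bin:"*)
      echo "MPJ_HOME and PATH look consistent"
      ;;
    *)
      echo "PATH does not contain $MPJ_HOME/bin"
      return 1
      ;;
  esac
}
```

Running `ssh <node> 'echo $MPJ_HOME'` from the main node shows what a non-interactive ssh session sees, which can differ from what an interactive login shows.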

JoshDM

This is the first time I have answered a question on Stack Overflow. I set up MPJExpress-v0_42 on CentOS 6.3 with jdk1.6.0_32, and in the end everything works.

Question 1: Cannot connect to the daemon at machine and port

This can happen when the daemon is not running. Try the following:

  1. Use sudo netstat -anp |grep port to check whether the port is listening and to get the process ID.
  2. On my machine, after I started the daemon with mpjdaemon -boot localhost, I ran sudo netstat -apn |grep 4000 to check the port and got the following:

    tcp        0      0 :::40000                    :::*                        LISTEN      8766/java           
    tcp        0      0 :::40001                    :::*                        LISTEN      8766/java
    

    The port is configured in $MPJ_HOME/conf/wrapper.conf, and my configuration is as follows:

    #port number for the daemon.
    wrapper.app.parameter.2=40001
    #Socket Server Port Number.
    wrapper.app.parameter.3=40000
    
  3. I also used the jps -m command to find the Java daemon process. The result is as follows:

    8766 MPJDaemon 40001
    30850 Jps -m
    

    Here 8766 is the process ID that netstat showed, MPJDaemon is the daemon process, and 40001 is the port it listens on.
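
The port check above can be scripted roughly as follows. This is a sketch only: the function names daemon_port and port_is_listening are invented for illustration, the meaning of wrapper.app.parameter.2 is taken from the comments in the configuration shown above, and a POSIX grep and sed are assumed.

```shell
# Hypothetical helper: print the daemon port from a wrapper configuration
# file, assuming it is stored as wrapper.app.parameter.2=<port>.
daemon_port() {
  sed -n 's/^wrapper\.app\.parameter\.2=//p' "$1"
}

# Hypothetical helper: read `netstat -an` output on stdin and succeed if
# some socket is in the LISTEN state on the given port. The port may be
# preceded by ":" or by "." depending on the platform's netstat format.
port_is_listening() {
  grep "[.:]$1[[:space:]].*LISTEN" >/dev/null
}
```

For example, `netstat -an | port_is_listening "$(daemon_port "$MPJ_HOME/conf/wrapper.conf")" && echo listening` prints listening only if something is bound to the configured port.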

In your case, you cannot connect to the daemon process. Do the following:

  1. Check the port. If you cannot find the port with netstat, it generally means that MPJDaemon is not running.
  2. To be sure, also use jps to check the process. The process may be running but listening on a different port.
  3. Do not start with the cluster commands such as mpjboot machines or mpjrun.sh -np 2 -dev niodev HelloWorld. First use mpjdaemon -boot localhost to test the current machine. Once that is configured correctly, put a single localhost entry in the machines file and then run the cluster commands.

Other problems that I ran into:

Question 2: Compilation with ant failed

At first I used MPJ-v0_44.zip with jdk1.6.0, but it would not compile and reported that ProcessBuilder has no method "interNIO" (or something like that). I looked at the source code and guessed that the JDK version was a little too old. Since upgrading the JDK is a little complicated, I found another MPJ version, mpj-v0_42.zip, and it compiled without problems.

Question 3: mpjdaemon -boot localhost prints no error message, but MPJDaemon is not running.

After I moved the environment variables such as MPJ_HOME from .bash_profile to .bashrc, the problem was resolved. I don't know why.

Question 4: jps -m shows that MPJDaemon is running, but mpjdaemon -status localhost reports that it is not running.

I ran the command (ssh localhost nohup 'jps -m') and it said that the jps command was not found, although jps works fine when I run it directly. I guess this is the same problem as above: the path to jps is not configured in the PATH environment variable in .bashrc. After adding the line PATH=/jpspath:$PATH to .bashrc, everything works fine.
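
A quick way to spot this kind of problem is to list which required commands cannot be found on the current PATH. The following is a minimal sketch; the function name missing_commands is made up for illustration, and a POSIX-style shell is assumed.

```shell
# Hypothetical helper: print the name of every given command that
# cannot be found on the current PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}
```

Calling `missing_commands java jps` in a local terminal and comparing it with `ssh localhost 'command -v jps'` shows whether the non-interactive ssh session has a different PATH than the interactive one.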

peng li