
I have a container running Hadoop. I have another Dockerfile that contains MapReduce job commands such as creating an input directory, processing a default example, and displaying the output. The base image for the second Dockerfile is hadoop_image, which is created from the first Dockerfile.

EDIT

Dockerfile - for hadoop

 #base image is ubuntu:precise
 #cdh installation
 #hadoop-0.20-conf-pseudo installation
 #CMD to start-all.sh

start-all.sh

 #start all the services under /etc/init.d/hadoop-*
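
Roughly, the script does something like this (simplified sketch; the trailing /bin/bash is an assumption, mirroring flume-start.sh below):

    #!/bin/bash
    # start every Hadoop service installed by the pseudo-distributed package
    for svc in /etc/init.d/hadoop-*; do
        "$svc" start
    done
    # assumption: keep the container in the foreground with a shell
    /bin/bash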

The hadoop base image is created from this.

Dockerfile2

 #base image is hadoop
 #flume-ng and flume-ng agent installation
 #conf change
 #flume-start.sh
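
Expanded, Dockerfile2 is roughly of this shape (simplified sketch; the base image tag, package names, and script path are illustrative):

    # simplified sketch of Dockerfile2 based on the outline above;
    # base image tag, package names and paths are illustrative
    FROM hadoop

    # flume-ng and flume-ng agent installation
    RUN apt-get update && apt-get install -y flume-ng flume-ng-agent

    # conf change omitted here

    # startup script
    ADD flume-start.sh /usr/local/bin/flume-start.sh
    RUN chmod +x /usr/local/bin/flume-start.sh
    CMD ["/usr/local/bin/flume-start.sh"]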

flume-start.sh

#start flume services
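
In full, flume-start.sh is roughly (simplified sketch; the init script name is illustrative):

    #!/bin/bash
    # start the flume agent service (init script name is illustrative)
    /etc/init.d/flume-ng-agent start
    # last line, as mentioned below: drop into a shell so the container stays up
    /bin/bash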

I am running both containers separately, and that works fine. But if I run

docker run -it flume_service

it starts Flume and shows me a bash prompt [/bin/bash is the last line of flume-start.sh]. Then I execute

hadoop fs -ls /

in the second running container, and I get the following error:

ls: Call From 514fa776649a/172.17.5.188 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

I understand I am getting this error because the Hadoop services are not started yet. But my doubt is this: my first container is running, and I am using it as the base image for the second container. Then why am I getting this error? Do I need to change anything in the hdfs-site.xml file in the flume container?

This is a pseudo-distributed mode installation.

Any suggestions?

Or do I need to expose any ports or anything like that? If so, please provide me with an example.

EDIT 2

When I run

  sudo iptables -t nat -L -n

I see

  Chain PREROUTING (policy ACCEPT)
  target     prot opt source               destination
  DOCKER     all  --  0.0.0.0/0            0.0.0.0/0           ADDRTYPE match dst-

  Chain POSTROUTING (policy ACCEPT)
  target     prot opt source               destination
  MASQUERADE  tcp  --  192.168.122.0/24    !192.168.122.0/24    masq ports: 1024-6
  MASQUERADE  udp  --  192.168.122.0/24    !192.168.122.0/24    masq ports: 1024-6
  MASQUERADE  all  --  192.168.122.0/24    !192.168.122.0/24
  MASQUERADE  all  --  172.17.0.0/16        0.0.0.0/0

  Chain OUTPUT (policy ACCEPT)
  target     prot opt source               destination
  DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8         ADDRTYPE match dst-

 Chain DOCKER (2 references)
 target     prot opt source               destination

This is on the Docker host (docker@domain), not inside a container.

EDIT: See the last comment under surajz's answer.

Gibbs

2 Answers


Have you tried linking the container?

For example, your container named hadoop is running in pseudo-distributed mode, and you want to bring up another container that contains Flume. You could link the containers like this:

 docker run -it --link hadoop:hadoop  --name flume ubuntu:14.04 bash

When you get inside the flume container, type the env command to see the IP and ports exposed by the hadoop container.

From the flume container you should be able to do something like the following (the ports on the hadoop container should be exposed):

$ hadoop fs -ls hdfs://<hadoop container's IP>:8020/
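
For reference, exposing the namenode RPC port in the hadoop image's Dockerfile would look something like this (a sketch; --link only generates the *_PORT_* environment variables for ports that are exposed):

# in the hadoop image's Dockerfile (sketch): expose the namenode RPC port
# so linked containers get the HADOOP_PORT_8020_* environment variables
EXPOSE 8020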

The error you are getting might be related to some Hadoop services not running on the flume container. Run jps to check which services are running. But I think if you have the Hadoop classpath set up correctly on the flume container, then you can run the above hdfs command (-ls hdfs://:8020/) without starting anything. But if you want

hadoop fs -ls /

to work on the flume container, then you need to start the Hadoop services on the flume container as well.

In your core-site.xml, add dfs.namenode.rpc-address like this so the namenode listens for connections from all IPs:

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address</name>
    <value>0.0.0.0:8020</value>
  </property>

Make sure to restart the namenode and datanode

sudo /etc/init.d/hadoop-hdfs-namenode restart && sudo /etc/init.d/hadoop-hdfs-datanode restart

Then you should be able to do this from your hadoop container without a connection error, e.g.

hadoop fs -ls hdfs://localhost:8020/
hadoop fs -ls hdfs://172.17.0.11:8020/

On the linked container, type env to see the ports exposed by your hadoop container:

env

You should see something like HADOOP_PORT_8020_TCP=tcp://172.17.0.11:8020

Then you can verify the connection from your linked container.

telnet 172.17.0.11 8020
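
Since --link also adds a hosts entry for the alias, the alias name should resolve too, so something like this is worth trying from the flume container (assuming the alias hadoop from the run command above):

# "hadoop" here is the link alias, resolved via the container's /etc/hosts
hadoop fs -ls hdfs://hadoop:8020/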

surajz
  • It is fine with this map-reduce example. I am installing cdh4.6.0, and it is running in one container. Now I have the Flume installation instructions in another Dockerfile. Flume requires Hadoop, but Hadoop is running as a separate container. If the Flume commands are part of the same hadoop_first_docker_file, it will work fine. But since it is a separate file, it requires commands to be executed on the running container. How do I execute/specify second_docker_file commands on the hadoop_first_docker_file container? Is it clear now? Thanks. – Gibbs Jan 27 '15 at 19:46
  • If your main concern is just to share the script from container 1, then you can attach a volume to both containers and move the script to the volume, so both containers have access to it (i.e. the -v flag). In your installation, you are trying to run Flume in standalone mode in the second container, so I don't see the issue. Are you trying to write to HDFS from Flume? – surajz Jan 27 '15 at 20:01
  • Yes, suraj. I am facing issues while trying to access HDFS from the second Dockerfile, because the services are running in the container started from the first Dockerfile. – Gibbs Jan 28 '15 at 04:28
  • I want to achieve "Sample flume program running on Docker2 flume container with sink as HDFS from Docker1 hadoop container". – Gibbs Jan 28 '15 at 06:11
  • Thanks, Suraj. I don't want to start the Hadoop services in the flume container; I want to do this without starting them. I have tried env, but no ports showed up, and iptables is also not working. And I tried hadoop fs -ls hdfs://8020/ and hadoop fs -ls hdfs://:8020/; the first gives a SocketException and the latter an invalid argument. Any suggestions? – Gibbs Jan 29 '15 at 05:01
  • Hi surajz, I re-tried everything, and it is almost fine. I can see all the env variables formed automatically by Docker while linking. But when I try `hadoop fs -ls hdfs://172.17.0.179:8020/`, I see a connection refused error at port 8020, although I can ping the hadoop host. My exposed ports are 50070, 80, 22, 500105 and two more, but not 8020. I tried with 80 instead of 8020 and get the same error. Any idea? – Gibbs Feb 04 '15 at 05:46
  • Hi Gops, in case you have not figured it out, you have 2 options: use a script to change fs.defaultFS to point to your IP (172.17.0.179) with some tool like sed (see the sketch just below), or open up the namenode rpc-address so the namenode will accept connections; see the example of the second option above. – surajz Feb 11 '15 at 04:58
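
A hypothetical sketch of the first option from the last comment, assuming CDH's default config path and the container IP mentioned in the comments:

# sketch: point fs.defaultFS at the hadoop container's IP instead of localhost
# (the config path and IP are assumptions; adjust to your setup)
sed -i 's|hdfs://localhost:8020|hdfs://172.17.0.179:8020|' /etc/hadoop/conf/core-site.xml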

I think I met the same problem. I also couldn't start the Hadoop namenode and datanode with the Hadoop command "start-all.sh" in docker1.

That is because it launches the namenode and datanode through "hadoop-daemons.sh", which fails. The real problem is that "ssh" does not work in Docker.

So you can do either of the following:

  • (solution 1) :
    Replace all occurrences of "daemons.sh" with "daemon.sh" in start-dfs.sh, then run start-dfs.sh (see the sed sketch after this list).

  • (solution 2) : do

    $HADOOP_PREFIX/sbin/hadoop-daemon.sh start datanode
    $HADOOP_PREFIX/sbin/hadoop-daemon.sh start namenode
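
A hypothetical one-liner for the replacement in solution 1 (the path to start-dfs.sh is an assumption):

# sketch: swap the cluster-wide daemons.sh calls for local daemon.sh calls
sed -i 's/daemons\.sh/daemon.sh/g' $HADOOP_PREFIX/sbin/start-dfs.sh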

You can see that the datanode and namenode are working fine with the "jps" command.

Regards.

waue0920