Here's the error I get:

2015-12-11 04:01:47,306 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: anmol-vm1-new:50010:DataXceiver error processing WRITE_BLOCK operation  src: /10.0.1.193:57002 dst: /10.0.1.190:50010
org.apache.hadoop.net.ConnectTimeoutException: 65000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.0.1.192:50010]
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:650)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
        at java.lang.Thread.run(Thread.java:745)

http://pastebin.com/bP6W7P7y DataNode log (contains errors)

I have attached two screenshots showing the nodes I currently see and the current output of gridmix-generate.sh, which is being run now:

http://pastebin.com/jd12yDEk gridmix-generate runtime log

In yarn-site.xml we have yarn.execution.optimistic-containers-policy set to only_conservative,

and we have the exact same conf folder on all the VMs. We also had dstat installed.
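
Written out in the usual Hadoop property layout, the yarn-site.xml entry mentioned above would look roughly like this. This is a sketch of the setting as named in this post, assuming the standard `<property>` structure, not a verbatim copy of our file:

```xml
<!-- Sketch only: property name and value are taken from the post;
     the surrounding layout is the standard Hadoop *-site.xml form. -->
<property>
  <name>yarn.execution.optimistic-containers-policy</name>
  <value>only_conservative</value>
</property>
```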

Any idea what could be wrong or missing? Right now the nodes are still up, but somewhere in the middle of execution they go down and show up as missing.

Finally, here is the output of our yarn application -status: http://pastebin.com/WiMa0yRf

Mona Jalal

0 Answers