For my master's thesis in computer science I implemented the 4-profiles calculus (https://arxiv.org/abs/1510.02215) using giraph-1.3.0-SNAPSHOT (compiled with the -Phadoop_yarn profile) and hadoop-2.8.4.
I set up a cluster on Amazon EC2 composed of 1 namenode and 5 datanodes, using t2.2xlarge (32 GB, 8 vCPU) instances, and I obtained the results described here with input graphs of small/medium size.
When I feed my Giraph program larger input graphs (e.g. http://snap.stanford.edu/data/web-NotreDame.html), in some cases I get many netty-related errors and the YARN application FAILS; in other cases the YARN application remains in the RUNNING (final status UNDEFINED) state with apparently no error at all, so I killed it instead of waiting for the default timeout. I also tried m5.4xlarge (64 GB, 16 vCPU) instances but got the same problems. The error logs for the first case are here:
- errors logged by the Giraph workers on the datanodes (the same errors appear on all datanodes): https://pastebin.com/CGHUd0za
- errors logged by the Giraph master: https://pastebin.com/JXYN6y4L
I'm quite sure the errors are not caused by insufficient memory on the EC2 instances, because the logs always show messages like "(free/total/max) = 23038.28M / 27232.00M / 27232.00M". Please help me, my master's thesis is blocked on this problem :-(
This is an example of the command I use to run Giraph; could you please check whether the parameters I used are correct? Any other tuning advice would be appreciated!
giraph 4Profiles-0.0.1-SNAPSHOT.jar it.uniroma1.di.fourprofiles.worker.superstep0.gas1.Worker_Superstep0_GAS1
-ca giraph.numComputeThreads=8 // t2.2xlarge has 8 vCPUs; is it correct to set these three thread counts to 8?
-ca giraph.numInputThreads=8
-ca giraph.numOutputThreads=8
-w 8 // I set 8 workers because:
// - there are 5 datanodes on EC2
// - every datanode is configured for at most 2 containers, to reduce message traffic between datanodes
// - 2 containers are reserved for the application master and the Giraph master
// - (5 datanodes * 2 max containers) - 2 reserved = 8 workers
// Is this reasoning correct?
-yh 15360 // I set 15360 because it matches
// - the yarn.scheduler.minimum-allocation-mb property in yarn-site.xml
// - the mapreduce.map.memory.mb property in mapred-site.xml
// Is this reasoning correct?
-ca giraph.pure.yarn.job=true
-mc it.uniroma1.di.fourprofiles.master.Master_FourProfiles
-ca io.edge.reverse.duplicator=true
-eif it.uniroma1.di.fourprofiles.io.format.IntEdgeData_TextEdgeInputFormat_ReverseEdgeDuplicator
-eip INPUT_GRAPHS/HU_edges.txt-processed
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat
-op output
-ca giraph.SplitMasterWorker=true
-ca giraph.messageCombinerClass=it.uniroma1.di.fourprofiles.worker.msgcombiner.Worker_MsgCombiner
-ca giraph.master.observers=it.uniroma1.di.fourprofiles.master.observer.Observer_FourProfiles
-ca giraph.metrics.enable=true
-ca giraph.useInputSplitLocality=false
-ca giraph.useBigDataIOForMessages=true
-ca giraph.useMessageSizeEncoding=true
-ca giraph.oneToAllMsgSending=true
-ca giraph.isStaticGraph=true
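For clarity, here is the worker-count arithmetic from the comments above written out as a small shell check (the variable names are mine, only for illustration):

```shell
# Worker-count reasoning for my cluster (values from the setup above).
datanodes=5            # EC2 datanodes
containers_per_node=2  # each datanode allows at most 2 containers
reserved=2             # application master + Giraph master
workers=$(( datanodes * containers_per_node - reserved ))
echo "workers=$workers"   # prints workers=8, matching -w 8
```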
Furthermore, I tried the following netty parameters but they did not resolve the problem. Could you please tell me whether I'm missing some important parameter, or maybe using one in a wrong way? I generalized the values passed to the netty parameters with the trivial formula nettyFactor * defaultValue, where nettyFactor can be 1, 2, 3, ... (passed as a shell parameter):
-ca giraph.nettyAutoRead=true
-ca giraph.channelsPerServer=$((nettyFactor*1))
-ca giraph.nettyClientThreads=$((nettyFactor*4))
-ca giraph.nettyClientExecutionThreads=$((nettyFactor*8))
-ca giraph.nettyServerThreads=$((nettyFactor*16))
-ca giraph.nettyServerExecutionThreads=$((nettyFactor*8))
-ca giraph.clientSendBufferSize=$((nettyFactor*524288))
-ca giraph.clientReceiveBufferSize=$((nettyFactor*32768))
-ca giraph.serverSendBufferSize=$((nettyFactor*32768))
-ca giraph.serverReceiveBufferSize=$((nettyFactor*524288))
-ca giraph.vertexRequestSize=$((nettyFactor*524288))
-ca giraph.edgeRequestSize=$((nettyFactor*524288))
-ca giraph.msgRequestSize=$((nettyFactor*524288))
-ca giraph.nettyRequestEncoderBufferSize=$((nettyFactor*32768))
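To be concrete, this is roughly how my wrapper computes the scaled values (a minimal sketch; the variable names and the fallback of 2 are illustrative, and the defaults are the ones shown in the parameter list above):

```shell
# Scale netty parameters by a multiplier passed as the first shell argument.
nettyFactor=${1:-2}                    # e.g. ./run.sh 2
clientThreads=$(( nettyFactor * 4 ))   # giraph.nettyClientThreads (default 4)
serverThreads=$(( nettyFactor * 16 ))  # giraph.nettyServerThreads (default 16)
sendBuffer=$(( nettyFactor * 524288 )) # giraph.clientSendBufferSize (default 524288)
echo "$clientThreads $serverThreads $sendBuffer"
```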
... I have a few other questions: 1) This is my Hadoop configuration. Please check it, although I'm quite sure it is correct. I have only one question about it: since Giraph does not use the reduce phase, is it correct to assign 0 MB to mapreduce.reduce.memory.mb in mapred-site.xml?
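For reference, these are the relevant entries as I understand them (a sketch showing only the properties in question, with the values from my setup):

```xml
<!-- yarn-site.xml: smallest container YARN allocates; matches -yh 15360 -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>15360</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>15360</value>
</property>
<property>
  <!-- Giraph runs no reducers, so I set this to 0; is that correct? -->
  <name>mapreduce.reduce.memory.mb</name>
  <value>0</value>
</property>
```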
2) To avoid ClassNotFoundException errors, I copied the jar of my Giraph application and all the Giraph jars from $GIRAPH_HOME and $GIRAPH_HOME/lib into $HADOOP_HOME/share/hadoop/yarn/lib. Is there a better solution?
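Concretely, this is the copy step I perform (shown here as a self-contained demo using temporary directories and dummy jar names so it can be run anywhere; in my real setup GIRAPH_HOME and HADOOP_HOME point at the actual installations):

```shell
# Demo of my workaround: copy application + Giraph jars into YARN's lib dir.
GIRAPH_HOME=$(mktemp -d)
HADOOP_HOME=$(mktemp -d)
mkdir -p "$GIRAPH_HOME/lib" "$HADOOP_HOME/share/hadoop/yarn/lib"
touch "$GIRAPH_HOME/giraph-core.jar" "$GIRAPH_HOME/lib/some-dependency.jar"  # dummy jars
cp "$GIRAPH_HOME"/*.jar "$GIRAPH_HOME"/lib/*.jar "$HADOOP_HOME/share/hadoop/yarn/lib/"
ls "$HADOOP_HOME/share/hadoop/yarn/lib"
```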
3) Last but not least: here you can find the complete hadoop/yarn log of my Giraph program run with http://snap.stanford.edu/data/web-NotreDame.html as input. In this case the YARN application remains in the RUNNING (final status UNDEFINED) state.