H2O in Spark cluster mode gives different predictions from H2O in Spark local mode, and the local-mode predictions are better. Why does this happen? Is this expected H2O behaviour?
Two data sets are used: one for training the model and another for scoring.
trainingData.csv: 1.8 MB (2211 rows)
testingData.csv: 1.8 MB (2211 rows)
Driver memory: 1G
Executor memory: 1G
Number of executors: 1
The following command is used on the cluster:
nohup /usr/hdp/current/spark2-client/bin/spark-submit --class com.inn.sparkrunner.h2o.GradientBoostingAlgorithm --master yarn --driver-memory 1G --executor-memory 1G --num-executors 1 --deploy-mode cluster spark-runner-1.0.jar > tool.log &
1) Main method
public static void main(String[] args) {
    SparkSession sparkSession = getSparkSession();
    H2OContext h2oContext = getH2oContext(sparkSession);
    UnseenDataTestDRF(sparkSession, h2oContext);
}
2) Creating the H2O context
private static H2OContext getH2oContext(SparkSession sparkSession) {
    H2OConf h2oConf = new H2OConf(sparkSession.sparkContext()).setInternalClusterMode();
    H2OContext orCreate = H2OContext.getOrCreate(sparkSession.sparkContext(), h2oConf);
    return orCreate;
}
3) Creating the Spark session
public static SparkSession getSparkSession() {
    SparkSession spark = SparkSession.builder()
            .appName("Java Spark SQL basic example")
            .master("yarn")
            .getOrCreate();
    return spark;
}
4) Setting the GBM parameters
private static GBMParameters getGBMParam(H2OFrame asH2OFrame) {
    GBMParameters gbmParam = new GBMParameters();
    gbmParam._response_column = "high";
    gbmParam._train = asH2OFrame._key;
    gbmParam._ntrees = 10;
    gbmParam._seed = 1;
    return gbmParam;
}
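
To put a number on how far the two runs differ, the predictions from each mode could be exported and compared row by row. The following is only a minimal sketch, assuming the predictions from the local and cluster runs have already been collected into two double arrays in the same row order. The class PredictionDiff and its methods are hypothetical helpers and are not part of H2O or Spark.

```java
public class PredictionDiff {

    /** Returns the largest absolute difference between two prediction arrays. */
    public static double maxAbsDiff(double[] local, double[] cluster) {
        if (local.length != cluster.length) {
            throw new IllegalArgumentException("Prediction arrays must have the same length");
        }
        double max = 0.0;
        for (int i = 0; i < local.length; i++) {
            max = Math.max(max, Math.abs(local[i] - cluster[i]));
        }
        return max;
    }

    /** Counts the rows whose predictions differ by more than the given tolerance. */
    public static int countMismatches(double[] local, double[] cluster, double tolerance) {
        if (local.length != cluster.length) {
            throw new IllegalArgumentException("Prediction arrays must have the same length");
        }
        int count = 0;
        for (int i = 0; i < local.length; i++) {
            if (Math.abs(local[i] - cluster[i]) > tolerance) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from the actual runs.
        double[] local = {0.10, 0.80, 0.35};
        double[] cluster = {0.10, 0.75, 0.35};
        System.out.println("Max abs diff: " + maxAbsDiff(local, cluster));
        System.out.println("Rows differing: " + countMismatches(local, cluster, 1e-6));
    }
}
```

Reporting both the maximum difference and the number of differing rows would show whether the two modes disagree slightly on every row or strongly on a few.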