We are running a 1 namenode and 3 datanode cluster on top of Azure. On top of this I am running my spark job on Yarn-Cluster mode.
Also, We are using HDP 2.5 which have spark 1.6.2 integrated into its setup. Now I have this very weird issue where the Processing time of my job suddenly increases to 4s.
This has happened quite some times but does not follow a pattern, sometimes the 4s waiting time is from the start of the job or may be at the middle of the job as shown below.
One thing to notice is that I have no events coming in which is processed so technically the processing time should stay almost the same. Also, my spark streaming job has a batch duration of 1s so it can't be that.
I dont have any error in the logs or anywhere and I am being lost to process this issue.
Minor details about the job:
I am reading messages over kafka topic and then storing them within Hbase tables using Phoenix JDBC Connector.
EDIT: More Information
In the InsertTransactionsPerRDDPartitions, I am performing connection open and write operation to HBase using Phoenix JDBC connectivity.
updatedEventLinks.foreachRDD(rdd -> {
if(!rdd.isEmpty()) {
rdd.foreachPartition(new InsertTransactionsPerRDDPartitions(this.prop));
rdd.foreachPartition(new DoSomethingElse(this.kafkaPublishingProps, this.prop));
}
});