
We are running a cluster with 1 namenode and 3 datanodes on Azure, and on top of it I am running my Spark job in yarn-cluster mode.

Also, we are using HDP 2.5, which has Spark 1.6.2 integrated into its setup. Now I have this very weird issue where the processing time of my job suddenly increases to 4s.

This has happened quite a few times but does not follow a pattern; sometimes the 4s processing time appears right at the start of the job, sometimes in the middle, as shown below.

[Screenshot: sudden increase in processing time to 4s]

One thing to notice is that no events are coming in to be processed at that time, so technically the processing time should stay almost the same. Also, my Spark Streaming job has a batch duration of 1s, so it can't be that.

I don't have any errors in the logs or anywhere else, and I am at a loss as to how to debug this issue.

Minor details about the job:

I am reading messages from a Kafka topic and then storing them in HBase tables using the Phoenix JDBC connector.
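For context, a minimal sketch of what that read path might look like on Spark 1.6 with the direct Kafka approach, also showing the 1s batch duration mentioned above (the app name, broker, topic, and key/value types are placeholders, not the actual job code):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

// 1s batch interval, as described above
SparkConf conf = new SparkConf().setAppName("EventLinkProcessor"); // placeholder app name
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

Map<String, String> kafkaParams = new HashMap<>();
kafkaParams.put("metadata.broker.list", "broker1:9092");    // placeholder broker
Set<String> topics = Collections.singleton("transactions"); // placeholder topic

// Direct stream over the Kafka topic; String key/value types are an assumption
JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
        jssc, String.class, String.class,
        StringDecoder.class, StringDecoder.class,
        kafkaParams, topics);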

EDIT: More Information

In InsertTransactionsPerRDDPartitions, I open a connection and perform the write to HBase using Phoenix JDBC.

updatedEventLinks.foreachRDD(rdd -> {
    if (!rdd.isEmpty()) {
        rdd.foreachPartition(new InsertTransactionsPerRDDPartitions(this.prop));
        rdd.foreachPartition(new DoSomethingElse(this.kafkaPublishingProps, this.prop));
    }
});
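For reference, a minimal sketch of what the connection-open-and-write step inside InsertTransactionsPerRDDPartitions could look like with Phoenix JDBC (the element type, JDBC URL, table, columns, and row key are assumptions for illustration; the real class is not shown in the question):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Iterator;
import java.util.Properties;

import org.apache.spark.api.java.function.VoidFunction;

// Sketch only: the actual class and schema in the job will differ.
public class InsertTransactionsPerRDDPartitions implements VoidFunction<Iterator<String>> {

    private final Properties prop; // assumed to hold the Phoenix connection properties

    public InsertTransactionsPerRDDPartitions(Properties prop) {
        this.prop = prop;
    }

    @Override
    public void call(Iterator<String> records) throws Exception {
        // One Phoenix JDBC connection per partition; opening it has a fixed cost
        // even when the partition turns out to be empty.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181", prop);
             PreparedStatement stmt = conn.prepareStatement(
                     "UPSERT INTO TRANSACTIONS (ID, PAYLOAD) VALUES (?, ?)")) {
            while (records.hasNext()) {
                stmt.setString(1, java.util.UUID.randomUUID().toString()); // placeholder row key
                stmt.setString(2, records.next());
                stmt.executeUpdate();
            }
            conn.commit(); // Phoenix buffers upserts until commit
        }
    }
}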
  • do you have an `id` or something to uniquely identify these events? If so, I would start by logging each step and how long it took, to try to narrow the problem down. – Eugene Apr 19 '17 at 14:05
  • 0 events doesn't mean the processing time should be short. E.g., you can open a connection to HBase, write nothing, and close the connection. It may take several seconds. – zsxwing Apr 20 '17 at 00:39
  • @Eugene I have unique UUIDs, but as you can see, no events are being processed at that time. – Biplob Biswas Apr 20 '17 at 08:05
  • @zsxwing I have updated my question with the functions where I am opening the HBase connection. The thing is, I am checking for an empty RDD, so it basically shouldn't go into the designated rdd.foreachPartition functions. And even if that were the case, I still don't understand the sudden jump from 27 ms to 4s. – Biplob Biswas Apr 20 '17 at 08:07
