
I am using Spark Streaming in my application. Data arrives as streaming files every 15 minutes. I have allocated 10G of RAM to the Spark executors, and with this setting my application runs fine. But looking at the Spark UI, under the Storage tab -> "Size in Memory", the RAM usage keeps increasing over time (see the first screenshot). When I started the streaming job, "Size in Memory" was in the KB range. Today, 2 weeks, 2 days and 22 hours after I started the job, it has grown to 858.4 MB. I have also noticed one more thing, under the Streaming heading (second screenshot):

When I started the job, Processing Time and Total Delay (from the image) were around 5 seconds; after 16 days they have increased to 19-23 seconds, while the streaming file size is almost the same. Before I increased the executor memory to 10G, the Spark job kept failing almost every 5 days (with the default executor memory of 1GB). Since increasing the executor memory to 10G, it has been running continuously for more than 16 days.

I am worried about memory issues. If the "Size in Memory" value keeps increasing like this, then sooner or later I will run out of RAM and the Spark job will fail again, even with 10G of executor memory. What can I do to avoid this? Do I need to change some configuration?

Just to give some context on my Spark application, I have enabled the following properties in the Spark context:

SparkConf sparkConf = new SparkConf()
        .setMaster(sparkMaster)
        .set("spark.streaming.receiver.writeAheadLog.enable", "true")
        .set("spark.streaming.minRememberDuration", "1440");

I have also enabled checkpointing as follows:

sc.checkpoint(hadoop_directory);
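
For reference, here is roughly how that call fits into the streaming setup. This is a minimal sketch only: the 15-minute batch interval and the HDFS path are placeholders (not my actual values), and it assumes sc is the JavaStreamingContext built from the SparkConf above.

import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// Batch interval and checkpoint path below are placeholders for illustration only.
JavaStreamingContext sc = new JavaStreamingContext(sparkConf, Durations.minutes(15));

// Metadata checkpointing to a reliable filesystem; a checkpoint directory is
// required when spark.streaming.receiver.writeAheadLog.enable is set to true.
sc.checkpoint("hdfs:///user/spark/checkpoints");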

One more thing I want to highlight: I had an issue while enabling checkpointing. I have already posted a question about that checkpointing issue here: Spark checkpointing error when joining static dataset with DStream

I was not able to set up checkpointing the way I originally wanted, so I did it differently (as shown above) and it is working fine now. I am not asking the checkpointing question again; I only mention it in case it helps you work out whether the current memory issue is somehow related to the previous (checkpointing) one.

Environment details: Spark 1.4.1, a two-node cluster of CentOS 7 machines, Hadoop 2.7.1.

– Rajneesh Kumar

1 Answer


I am worried about memory issues. If the "Size in Memory" value keeps increasing like this, then sooner or later I will run out of RAM and the Spark job will fail again, even with 10G of executor memory.

No, that's not how RAM works. Running out is perfectly normal, and when you run out, you take RAM that you are using for less important purposes and use it for more important purposes.

For example, if your system has free RAM, it can keep data it has written to disk cached in RAM as well. Someone might read that data again, and having it in RAM saves an I/O operation. Since unused RAM is simply wasted (you can't use 1GB less today in order to use 1GB more tomorrow; any RAM sitting idle right now is I/O-avoiding potential that is gone forever), the system might as well use it for anything that could help. But that doesn't mean it can't evict those cached things when it needs the RAM for some other purpose.

It is not at all unusual on a modern system for almost all of the RAM to be in use and, at the same time, for almost all of it to be readily available for new allocations. That is normal behavior.

– David Schwartz
  • Thanks David for your quick reply. I have a streaming job that keeps running for days, months, or years. What I am worried about is that if "Size in Memory" (as you can see in the image) keeps increasing, the Spark job can fail because of memory and I will need to re-run it. That can keep happening periodically if I don't do something preventive. As I mentioned, my **Spark job kept failing almost every 5 days (with the default executor memory of 1GB)**, and I also have a limited amount of RAM available on the Spark nodes. – Rajneesh Kumar Feb 18 '16 at 14:15
  • Not the answer you were looking for per se, but to help with this issue, try decreasing the DStream batch duration; Spark is optimized for smaller tasks. In addition, try increasing the parallelism by modifying spark.default.parallelism: keep increasing the number of partitions by 1.5x until it is no longer faster to process the data. Also, cache the data you are joining; reading it from HDFS isn't going to be the fastest and can add some overhead (see the sketch after these comments). – Joe Widen Feb 18 '16 at 14:48
  • @user3478678 Can you tell us more about exactly how they're failing, what errors/logging you are getting, and so on? Do you have some reason to think RAM usage is causing the failures? If so, what is it? – David Schwartz Feb 18 '16 at 18:07
  • Also, a general piece of advice to get better answers: if you have a problem, lead with the problem and describe it in as much detail as possible. Then start talking about things that you think might explain the problem, and be sure to explain why you think those things might explain it. – David Schwartz Feb 18 '16 at 19:18
  • @David, thanks for your advice; I will definitely follow it. The Spark job used to keep failing when the executors ran with 1GB memory, with error messages such as **1. MapOutputTracker:75 Missing an output location for shuffle 1248** and **2. java.lang.OutOfMemoryError: Java heap space**. That was a memory issue for sure. Since increasing the executor memory to 10GB, the Spark job has been running for the last 17 days without failure. My concern is that if "Size in Memory" (see the image above) keeps increasing every day, there is a chance of failure fairly soon even with 10GB of executor memory. – Rajneesh Kumar Feb 19 '16 at 06:38
  • @maasg... can you suggest something here? – Rajneesh Kumar Feb 22 '16 at 16:53
  • Have you solved this problem? I'm facing a similar issue. – Yumlembam Rahul Nov 07 '19 at 07:19
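
Below is a minimal sketch illustrating the tuning suggestions from Joe Widen's comment above: a shorter batch interval, a higher spark.default.parallelism, and caching the static dataset before joining it with the stream. The class name, the 5-minute interval, the parallelism value of 24, and the HDFS path are illustrative assumptions, not values taken from this question.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class StreamingTuningSketch {
    public static void main(String[] args) {
        // The master URL is expected to come from spark-submit.
        SparkConf conf = new SparkConf()
                .setAppName("streaming-tuning-sketch")
                // Raise parallelism in ~1.5x steps until batches stop getting faster.
                .set("spark.default.parallelism", "24");

        // A batch interval shorter than the 15-minute file cadence keeps each
        // batch, and therefore each task, small.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.minutes(5));
        JavaSparkContext sc = jssc.sparkContext();

        // Cache the static dataset once so every batch's join does not re-read it from HDFS.
        JavaPairRDD<String, String> staticPairs = sc
                .textFile("hdfs:///path/to/static/data")
                .mapToPair(line -> new Tuple2<String, String>(line.split(",")[0], line))
                .cache();

        // ... build the file DStream, join each batch with staticPairs via
        // transformToPair(), then jssc.start() and jssc.awaitTermination() ...
    }
}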