
I am trying to run a Spark job to process some JSON data using Spark SQL. When I submit the job, I see the following error in the logs:

Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f29b96d5000, 12288, 0) failed; error='Cannot allocate memory' (errno=12)
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 12288 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /tmp/hs_err_pid5716.log

I am using the following code in the application:

import java.io.{File, PrintWriter}

val url = "foo://fooLink"
// "multiline" lets the JSON source parse records that span multiple lines (Spark 2.2+)
val rawData = sqlContext.read.option("multiline", true).json(url)
// collect() pulls the entire result set into driver memory before writing
val pwp = new PrintWriter(new File("/tmp/file"))
rawData.collect().foreach(pwp.println)
pwp.close()

Command used to submit the job:

spark-submit --spark-conf spark.driver.userClassPathFirst=true --region us-east-1 --classname someClass somePackage-1.0-super.jar

It works for smaller data. But for some reason, the job does not create "/tmp/file" on the cluster and throws the above error in the driver logs. Is there a way I can work around this? Any ideas would be greatly appreciated. Thanks :)

geetika reddy
  • [@geetikareddy](https://stackoverflow.com/questions/48216600/spark-jvm-insufficient-memory-error-while-using-spark-sql/48216659#comment83416672_48216659) Heap size (`Xmx`) can be set using `spark.executor.memory` and `spark.driver.memory` for executors and the driver respectively. `spark.executor.extraJavaOptions` and `spark.driver.extraJavaOptions` can be used to set other JVM options. But to be honest, multiline + collect doesn't sound like a good fit for Spark, especially when the example code doesn't do anything that couldn't be done (a guess) with standard utilities for a given file system. – zero323 Jan 11 '18 at 23:39
  • Thanks @user6910411 for the explanation :) "multiline + collect" worked for smaller data but sounds like it is not a good fit for a larger set of data. I will try to change the logic and try it out. – geetika reddy Jan 11 '18 at 23:52
  • If you cannot do any better, you can start by replacing `rawData.collect` with `rawData.toLocalIterator` - it might or might not help, and it has its own caveats (search for these before you commit to using it), but it is worth trying; a minimal sketch follows below. – zero323 Jan 11 '18 at 23:54
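
A minimal sketch of that suggestion, assuming the same `sqlContext`, input `url`, and output path as in the question (the `asScala` conversion is needed because `Dataset.toLocalIterator` returns a Java iterator):

```scala
import java.io.{File, PrintWriter}
import scala.collection.JavaConverters._

// Instead of collect(), which materializes every row on the driver at
// once, toLocalIterator fetches one partition's worth of rows at a time.
val rawData = sqlContext.read.option("multiline", true).json(url)

val pwp = new PrintWriter(new File("/tmp/file"))
try {
  rawData.toLocalIterator().asScala.foreach(pwp.println)
} finally {
  pwp.close() // close the writer even if the iteration fails midway
}
```

The caveat mentioned above still applies: the job now runs partition by partition, so it can be slower, and each individual partition must still fit in driver memory.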

1 Answer


You will have to tweak some JVM flags: `-XX:MaxDirectMemorySize` and `-Xmx`.

Edit your spark-defaults.conf and modify the `spark.executor.extraJavaOptions` option to set the flags.
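
As an illustration, a minimal `spark-defaults.conf` sketch (all sizes are placeholder values to adjust for your cluster; note, per the comments below and the configuration docs, that heap size belongs in `spark.driver.memory`/`spark.executor.memory` rather than as `-Xmx` inside `extraJavaOptions`):

```
# Illustrative values only -- size these to your machines and workload.
spark.driver.memory              4g
spark.executor.memory            4g
# Non-heap JVM flags, such as direct memory, go in extraJavaOptions.
spark.driver.extraJavaOptions    -XX:MaxDirectMemorySize=2g
spark.executor.extraJavaOptions  -XX:MaxDirectMemorySize=2g
```

Because the question's code calls `collect`, the driver-side settings are the ones most likely to matter here.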

BluEOS
  • Maximum heap size should never be set directly in Apache Spark. Please read the [configuration docs](https://spark.apache.org/docs/latest/configuration.html) and adjust your answer accordingly. – zero323 Jan 11 '18 at 22:55
  • Before downvoting, read the doc: https://spark.apache.org/docs/latest/configuration.html#Dynamically-Loading-Spark-Properties . By setting `-XX:MaxDirectMemorySize` and `-Xmx` the VM will be able to allocate the needed memory. – BluEOS Jan 11 '18 at 23:02
  • The problem is not what you are trying to do, but how. Suggesting that the user should set Xmx is just not helpful. I am happy to retract the vote once it is fixed, because otherwise it sounds like valid advice, although it would make more sense if you suggested which component is failing (with collect and multiline JSON it can be both the driver and executors). – zero323 Jan 11 '18 at 23:11
  • Hi guys, could you please elaborate on tweaking these flags as I am pretty new to this and would really appreciate your help! Thanks. – geetika reddy Jan 11 '18 at 23:19
  • It really depends on your workload and of course on the total memory of your server. For a server with 32GB, you can start with 20GB of max memory (-Xmx20G) and set 8GB of direct memory size; the remaining 4GB will be used by the OS (for file caching, buffers, ...). A sketch of this sizing, in the form the docs require, follows these comments. – BluEOS Jan 12 '18 at 12:24
  • @BluEOS I read the link you posted above. The docs there actually warn against setting `Xmx` options in `extraJavaOptions`, which, unless I've misunderstood something, works against the argument you're attempting to make... – josiah Aug 06 '18 at 20:50
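
As referenced above, a sketch of that 32GB sizing expressed the way the configuration docs require: heap through `spark.driver.memory`, and only the direct-memory flag in `extraJavaOptions`. The 20GB/8GB split is taken from the comment above and should be tuned for your own workload:

```
spark-submit \
  --conf spark.driver.memory=20g \
  --conf "spark.driver.extraJavaOptions=-XX:MaxDirectMemorySize=8g" \
  --class someClass somePackage-1.0-super.jar
```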