
I am trying to convert a Spark DataFrame to an H2OFrame.

For the Spark setup, I am using

 .setMaster("local[1]")
 .set("spark.driver.memory", "4g")
 .set("spark.executor.memory", "4g")

and I tried both H2O 2.0.2 and H2O 1.6.4. In both cases I got the same error at:

 val trainsetH2O: H2OFrame = trainsetH
 val testsetH2O: H2OFrame = testsetH
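
For context, this is roughly how the conversion is wired up — a minimal sketch assuming Sparkling Water is on the classpath; the exact import that provides the implicit DataFrame-to-H2OFrame conversion differs slightly between Sparkling Water versions, and trainsetH/testsetH stand for my existing Spark DataFrames:

 import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.h2o._

 val conf = new SparkConf()
   .setAppName("spark-to-h2o")   // placeholder app name
   .setMaster("local[1]")
   .set("spark.driver.memory", "4g")
   .set("spark.executor.memory", "4g")
 val sc = new SparkContext(conf)

 // H2OContext starts the embedded H2O cloud and enables the implicit conversion
 val h2oContext = H2OContext.getOrCreate(sc)
 import h2oContext.implicits._

 val trainsetH2O: H2OFrame = trainsetH
 val testsetH2O: H2OFrame = testsetH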

The error message is:

 ERROR Executor: Exception in task 49.0 in stage 3.0 (TID 62)
 java.lang.OutOfMemoryError: PermGen space
     at sun.misc.Unsafe.defineClass(Native Method)
     at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:63)
     at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399)
     at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:396)
     at java.security.AccessController.doPrivileged(Native Method)
     at sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:395)
     at sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:113)
     at sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:331)
     at java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1376)
     at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:72)
     at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:493)
     at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
     at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)

Where did I go wrong? The train set and test set each contain fewer than 10K rows, so the data is actually pretty small.

lserlohn

1 Answer


The problem is that you have run out of PermGen space, which is not the same memory region you usually configure for the driver and executors with

.set("spark.driver.memory", "4g") .set("spark.executor.memory", "4g")

PermGen is a separate region of JVM memory that holds loaded class metadata, so it fills up based on how many classes get loaded, not on how much data you process. To increase it for both the Spark driver and the executors, launch spark-submit or spark-shell with the following arguments:

 --conf spark.driver.extraJavaOptions="-XX:MaxPermSize=384m"
 --conf spark.executor.extraJavaOptions="-XX:MaxPermSize=384m"
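
Since the question builds its SparkConf programmatically rather than going through spark-submit, here is a minimal sketch of the same fix in code. This is only an illustration: the executor option can be set on the SparkConf before the context starts, but the driver option has to be supplied when the driver JVM itself is launched (for example via spark-submit --driver-java-options or an IDE/SBT run configuration), because the driver is already running by the time the conf is read. Note also that -XX:MaxPermSize only exists on Java 7 and earlier; Java 8 replaced PermGen with Metaspace.

 import org.apache.spark.SparkConf

 // Sketch: raise PermGen for executor JVMs; they are forked after the conf is read,
 // so this setting takes effect (in local mode everything runs inside the driver JVM,
 // so there the driver-side flag is the one that actually matters).
 val conf = new SparkConf()
   .setMaster("local[1]")
   .set("spark.driver.memory", "4g")
   .set("spark.executor.memory", "4g")
   .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=384m")

 // For the driver, pass the flag to the launching JVM instead, e.g.:
 //   spark-submit --driver-java-options "-XX:MaxPermSize=384m" ...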

Jakub Háva