I am trying to use Zeppelin with the following code:

import java.net.URL
import java.nio.charset.Charset
import org.apache.commons.io.IOUtils

val dataText = sc.parallelize(IOUtils.toString(new URL("http://XXX.XX.XXX.121:8090/my_data.txt"), Charset.forName("utf8")).split("\n"))


case class Data(id: String, time: Long, value1: Double, value2: Int, mode: Int)
val dat = dataText.map(s => s.split("\t")).filter(s => s(0) != "Header:").map(
    s => Data(s(0), 
            s(1).toLong,
            s(2).toDouble,
            s(3).toInt,
            s(4).toInt
        )
).toDF()
dat.registerTempTable("mydatatable")

This keeps throwing the following error:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2367)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
    at java.lang.StringBuilder.append(StringBuilder.java:204)
    at org.apache.commons.io.output.StringBuilderWriter.write(StringBuilderWriter.java:138)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2002)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1980)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:1957)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:1907)
    at org.apache.commons.io.IOUtils.toString(IOUtils.java:778)
    at org.apache.commons.io.IOUtils.toString(IOUtils.java:896)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
    at $iwC$$iwC$$iwC.<init>(<console>:51)
    at $iwC$$iwC.<init>(<console>:53)
    at $iwC.<init>(<console>:55)
    at <init>(<console>:57)
    at .<init>(<console>:61)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)

I have already set the following in zeppelin-env.sh:

export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.0.0-2557 -Dspark.executor.memory=4g"

Any idea what I may be missing? The file I am parsing, my_data.txt, is about 200 MB.

BTW, I am using the Hortonworks Sandbox, if that matters.

EDIT 1: Here is my zeppelin-env.sh:

export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_PORT=9995
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.0.0-2557 -Dspark.executor.memory=4g"
export SPARK_SUBMIT_OPTIONS="--driver-java-options -Xmx4g"
export ZEPPELIN_INT_MEM="-Xmx4g"
export SPARK_HOME=/usr/hdp/2.3.0.0-2557/spark

Regards Kiran


4 Answers

4

Can you try increasing the driver heap through SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh? For example:

export SPARK_SUBMIT_OPTIONS="--driver-java-options -Xmx20g"

This thread may help: http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Can-not-configure-driver-memory-size-td1513.html
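To see whether a heap setting like this actually reached the JVM that runs your paragraphs, one option is to print the maximum heap from a Scala paragraph. This is only a minimal sketch: `maxHeapMb` is a hypothetical helper name, not part of Zeppelin or Spark, and it relies only on the standard `Runtime.getRuntime.maxMemory` call.

```scala
// Run in a Zeppelin Spark (Scala) paragraph, or in any Scala REPL.
// Runtime.maxMemory reports the maximum heap of the JVM executing this code;
// in a notebook paragraph that is the interpreter (driver) JVM.
// maxHeapMb is a hypothetical helper name used for illustration only.
def maxHeapMb(): Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

println(s"Max heap of this JVM: ${maxHeapMb()} MB")
```

If the printed value stays near the default after you change zeppelin-env.sh and restart, the option is probably not being picked up by the interpreter process. The JVM usually reports a value somewhat below the configured -Xmx, so expect "close to" rather than an exact match.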

Derlin
sag
  • Thanks, but this is still not helping me. I have not set JAVA_HOME but does that matter? – Kiran Jan 12 '16 at 04:03
  • @Kiran - AFAIK, JAVA_HOME is not needed. Is OOM from zeppelin or from Apache spark? – sag Jan 12 '16 at 04:35
  • I think it's from Spark (`SparkIMain.scala:1338`). I have updated the question with the complete stack trace. Thanks – Kiran Jan 12 '16 at 05:02
  • As it is from Spark, SPARK_SUBMIT_OPTIONS should help. Did you set SPARK_HOME? Or are you using internal Spark built with zeppelin? Or are you using spark cluster by setting "master"? – sag Jan 12 '16 at 05:14
  • Just tried that as well; same error, still at `org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)`. I have also updated my `zeppelin-env.sh` configuration. Do you see anything wrong in there? Thanks – Kiran Jan 12 '16 at 05:36
  • Sorry, I am out of ideas about what is wrong here. Everything seems fine to me. – sag Jan 12 '16 at 06:08
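
On the "Zeppelin or Spark?" question in the comments: the stack trace in the question fails inside `IOUtils.toString`, called directly from the REPL frames, so the allocation happens in the interpreter (driver) JVM while the whole file is being built into one `StringBuilder`, not on an executor. On JVMs that store strings as 16-bit chars, a 200 MB text file needs roughly twice that as a single String, the builder copies its buffer each time it grows, and `split("\n")` then creates a second copy. A possible way to lower that peak, sketched here under assumptions, is to read the URL line by line and parse as you go. `parseLine` and `readRows` are hypothetical helpers, not from the original post, and the parsed rows still have to fit in driver memory.

```scala
import scala.io.Source

// Same shape as the case class in the question, with Scala's capitalized types.
case class Data(id: String, time: Long, value1: Double, value2: Int, mode: Int)

// Hypothetical helper: parse one tab-separated line. Returns None for the
// header row or for a row with fewer than five fields.
def parseLine(line: String): Option[Data] = {
  val f = line.split("\t")
  if (f.length < 5 || f(0) == "Header:") None
  else Some(Data(f(0), f(1).toLong, f(2).toDouble, f(3).toInt, f(4).trim.toInt))
}

// Hypothetical helper: stream the URL one line at a time, so the file is
// never held as a single String. The parsed rows are still collected on
// the driver before they can be handed to Spark.
def readRows(url: String): Vector[Data] = {
  val src = Source.fromURL(url, "UTF-8")
  try src.getLines().flatMap(parseLine).toVector
  finally src.close()
}
```

In a Zeppelin paragraph this would be used roughly as `sc.parallelize(readRows("http://XXX.XX.XXX.121:8090/my_data.txt")).toDF()`. Copying the file to HDFS or local disk first and loading it with `sc.textFile` avoids holding the rows on the driver at all, if that is an option in your setup.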
0

Increasing the memory in the following zeppelin-env.sh variable did the trick for me. The defaults are 1 GB of heap and 0.5 GB of PermGen; I increased them to about 10 GB and 5 GB:

export ZEPPELIN_MEM="-Xmx10024m -XX:MaxPermSize=5120m"
dimisjim
0

I was getting the error below while trying to bring up a Zeppelin notebook:

INFO [2021-05-04 15:16:22,015] ({main} Folder.java[addNote]:185) - Add note 2G7CAFXX7 to folder /
INFO [2021-05-04 15:16:22,016] ({main} Notebook.java[<init>]:127) - Notebook indexing started...
WARN [2021-05-04 15:16:32,045] ({main} ContextHandler.java[log]:2355) - unavailable
MultiException stack 1 of 1
java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:80)
        at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:53)

To resolve this issue, I tuned the ZEPPELIN_MEM parameter in zeppelin-env.sh like this:

export ZEPPELIN_MEM="-Xmx5024m -XX:MaxPermSize=5120m"

Then restart Zeppelin:

sudo systemctl stop zeppelin; sudo systemctl start zeppelin

Result:

INFO [2021-05-04 18:51:02,939] ({main} Folder.java[addNote]:185) - Add note 2G7CAFXX7 to folder /
INFO [2021-05-04 18:51:02,940] ({main} Notebook.java[<init>]:127) - Notebook indexing started...
INFO [2021-05-04 18:51:05,793] ({main} LuceneSearch.java[addIndexDocs]:305) - Indexing 905 notebooks took 2853ms
INFO [2021-05-04 18:51:05,793] ({main} Notebook.java[<init>]:129) - Notebook indexing finished: 905 indexed in -2s
INFO [2021-05-04 18:51:05,795] ({main} Helium.java[loadConf]:103) - Add helium local registry /usr/lib/zeppelin/helium
INFO [2021-05-04 18:51:05,797] ({main} Helium.java[loadConf]:100) - Add helium
INFO [2021-05-04 18:51:06,631] ({main} Server.java[doStart]:407) - Started @131632ms
INFO [2021-05-04 18:51:06,631] ({main} ZeppelinServer.java[main]:249) - Done, zeppelin server started
nikhilvkn
-1

The only thing that worked for me (using Spark 2) was to add to conf/zeppelin-env.sh:

export SPARK_SUBMIT_OPTIONS="... --driver-memory 4g ..."

Then restart the Zeppelin interpreter: in Zeppelin for Spark 2, click the settings button at the top right, click the Interpreter link, scroll down to the Spark section, and click its Restart button.

Derlin
Alex