Spark version: 1.6.3

I am running the Spark Thrift Server as a proxy, but it does not stay up as long as I expected. It always stops under high load.

This is the error I get when I access it:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500 Server Error</title>
</head>
<body><h2>HTTP ERROR 500</h2>
<p>Problem accessing /jobs/. Reason:
<pre>    Server Error</pre></p><h3>Caused by:</h3><pre>java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOfRange(Arrays.java:3664)
 at java.lang.String.<init>(String.java:207)
 at java.lang.StringBuilder.toString(StringBuilder.java:407)
 at scala.collection.mutable.StringBuilder.toString(StringBuilder.scala:427)
 at scala.xml.Node.buildString(Node.scala:161)
 at scala.xml.Node.toString(Node.scala:166)
 at org.apache.spark.ui.JettyUtils$$anonfun$htmlResponderToServlet$1.apply(JettyUtils.scala:55)
 at org.apache.spark.ui.JettyUtils$$anonfun$htmlResponderToServlet$1.apply(JettyUtils.scala:55)
 at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:83)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
 at org.spark-project.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
 at org.spark-project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1507)
 at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:179)
 at org.spark-project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1478)
 at org.spark-project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499)
 at org.spark-project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
 at org.spark-project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:427)
 at org.spark-project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
 at org.spark-project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at org.spark-project.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301)
 at org.spark-project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at org.spark-project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.spark-project.jetty.server.Server.handle(Server.java:370)
 at org.spark-project.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
 at org.spark-project.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:973)
 at org.spark-project.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1035)
 at org.spark-project.jetty.http.HttpParser.parseNext(HttpParser.java:641)
 at org.spark-project.jetty.http.HttpParser.parseAvailable(HttpParser.java:231)
 at org.spark-project.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
 at org.spark-project.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
 at org.spark-project.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
</pre>
<hr /><i><small>Powered by Jetty://</small></i><br/>                                                
</body>
</html>

I can see that the error is java.lang.OutOfMemoryError: Java heap space

But I don't know which memory setting I need to increase:

  • Memory of the server running Spark
  • Memory of the executors configured in Spark
  • Some other memory setting

Update: my Spark configuration

2 Answers

The Thrift server runs in its own memory space. You can specify that via the executor memory property.

For example

./sbin/start-thriftserver.sh --executor-memory 512m

I think the default is 1g, as set in spark-env.sh.
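
Note that the stack trace in the question comes from the Spark web UI servlet (org.apache.spark.ui.JettyUtils), which runs inside the driver JVM of the Thrift server, so driver-side settings may matter at least as much as executor memory. A minimal sketch, assuming that is the case here: the hypothetical helper below only prints a start command so it can be reviewed before running it. The sizes and the retention limit are placeholder values, not recommendations; --driver-memory, spark.ui.retainedJobs and spark.ui.retainedStages are standard Spark options.

```shell
# Sketch only: prints a start-thriftserver.sh command instead of running it.
# build_thrift_cmd is a hypothetical helper; all values passed to it are placeholders.
#   $1 = heap for the driver JVM (which also serves the web UI)
#   $2 = heap for each executor
#   $3 = number of finished jobs/stages the UI keeps in memory
build_thrift_cmd() {
  echo "./sbin/start-thriftserver.sh" \
    "--driver-memory $1" \
    "--executor-memory $2" \
    "--conf spark.ui.retainedJobs=$3" \
    "--conf spark.ui.retainedStages=$3"
}

# Example: print the command for a 4g driver, 4g executors, 200 retained UI entries.
build_thrift_cmd 4g 4g 200
```

Lowering the retained job and stage counts reduces how much history the /jobs/ page has to render, which is where the heap ran out in the trace above.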

OneCricketeer
  • You mean I need to increase spark.executor.memory? I updated the configuration in my post. I use 4G for the spark.executor.memory property – Mercury Trivival Dec 05 '17 at 05:47
  • In HDP, I believe Ambari groups everything under "Spark daemon memory" – OneCricketeer Dec 05 '17 at 06:36
  • So I will try increasing SPARK_DAEMON_MEMORY. But is there any calculation to tell whether my configuration fits my data? – Mercury Trivival Dec 05 '17 at 07:58
  • Not if you keep reaching the limits... You need to keep increasing it until your process stays up. You can try enabling JMX monitoring, but even the Java website gives that advice – OneCricketeer Dec 05 '17 at 08:41
  • I increased the memory and the Spark Thrift server now keeps running for about two days, but after that it stops and I have to restart the service. It always stops after two days of running – Mercury Trivival Dec 20 '17 at 06:13
  • Sounds like the garbage collection isn't working correctly then. Like I said, see if you can get JMX monitoring on the service, then watch the JVM memory – OneCricketeer Dec 20 '17 at 06:17
  • How can I monitor JMX? Is there a built-in tool, or do I need to install other tools? – Mercury Trivival Dec 20 '17 at 06:22
  • I don't have any examples, but the documentation is here https://spark.apache.org/docs/latest/monitoring.html#metrics and most people I've seen use Jconsole (comes with the JDK) to attach to a running JVM to look at data points, but you'll want a graph, and in that case, Grafana with Prometheus seems to be the popular option nowadays – OneCricketeer Dec 20 '17 at 06:31
  • Maybe something like this, but skipping over the Cassandra stuff https://www.supergloo.com/fieldnotes/spark-performance-monitoring/ – OneCricketeer Dec 20 '17 at 06:36
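
For a quick look at the JVM memory without setting up JMX graphs, the jstat tool that ships with the JDK can sample heap and GC utilisation of a running process. A minimal sketch, assuming the Thrift server was started through start-thriftserver.sh so that its command line mentions HiveThriftServer2; thrift_pid is a hypothetical helper name, and the 5000 ms interval is an arbitrary choice.

```shell
# Sketch: find the Thrift server JVM and sample its GC/heap utilisation.
# Assumption: the process command line contains "HiveThriftServer2".

# Read `jps -lm`-style output on stdin and print the PID of the first
# line that mentions HiveThriftServer2.
thrift_pid() {
  awk '/HiveThriftServer2/ && !found { print $1; found = 1 }'
}

# Usage (needs a running server and the JDK tools on PATH):
#   pid="$(jps -lm | thrift_pid)"
#   jstat -gcutil "$pid" 5000    # one sample every 5000 ms
```

If the old-generation column (O) stays close to 100 even after full collections, the heap is probably too small for the workload or something is holding on to memory.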

Size the Spark Thrift Server according to the number of nodes in the cluster. For example, on AWS EMR with one small r3.2xlarge node you can use:

sudo /usr/lib/spark/sbin/start-thriftserver.sh --driver-memory 10g --verbose --master yarn-client --executor-memory 15g --num-executors 6 --executor-cores 2

Also make sure you monitor or clean up the Spark history folder. On EMR you can list it with: hdfs dfs -ls /var/log/spark/apps/

Queries can get stuck if multiple jobs are in the .inprogress state in that folder.

There is a way to set up automated cleanup of the Spark history folder.
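
As a rough sketch of the monitoring side, the hypothetical helper below counts how many event logs in a directory listing are still marked .inprogress. For the automated cleanup, one option (an assumption about what is meant here) is the Spark History Server's own cleaner, controlled by the spark.history.fs.cleaner.* properties shown in the comments; the 7d value is only an example.

```shell
# Sketch: count event logs still marked .inprogress in a listing read from stdin.
# count_inprogress is a hypothetical helper name.
count_inprogress() {
  awk '/\.inprogress$/ { n++ } END { print n + 0 }'
}

# Usage on EMR (path taken from the answer above):
#   hdfs dfs -ls /var/log/spark/apps/ | count_inprogress
#
# History Server cleaner settings (spark-defaults.conf), example values only:
#   spark.history.fs.cleaner.enabled   true
#   spark.history.fs.cleaner.interval  1d
#   spark.history.fs.cleaner.maxAge    7d
```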

Jules Dupont
Saurav Bhowmick