
I'm new to Spark and Spark SQL, and I was trying to run the example from the Spark SQL website: just a simple SQL query after loading the schema and data from a directory of JSON files, like this:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD // must come after sqlContext is defined

val path = "/home/shaza90/Desktop/tweets_1428981780000"
val tweet = sqlContext.jsonFile(path).cache()

tweet.registerTempTable("tweet")
tweet.printSchema() // This one works fine

// This line throws the exception:
sqlContext.sql("SELECT tweet.text FROM tweet").collect().foreach(println)

The exception that I'm getting is this one:

java.lang.StackOverflowError

    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)

Update

I'm able to execute `select * from tweet`, but whenever I use a column name instead of `*` I get the error.

Any advice?

Lisa

1 Answer


This is SPARK-5009 and has been fixed in Apache Spark 1.3.0.

The issue was that, to recognize keywords (like SELECT) in any case, all possible uppercase/lowercase combinations (like seLeCT) were generated in a recursive function. The recursion would lead to the StackOverflowError you're seeing if the keyword was long enough and the stack size small enough. (This also suggests a workaround if upgrading to Apache Spark 1.3.0 or later is not an option: use -Xss to increase the JVM stack size.)
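
To illustrate the mechanism, here is a minimal sketch of that kind of recursive case expansion (an assumed shape for illustration, not Spark's exact source). A keyword of length n expands into 2^n variants, and recursively combining that many alternatives into one parser is what can exhaust the stack:

    // Recursively generate every uppercase/lowercase variant of a keyword.
    // For a keyword of length n this yields 2^n strings.
    def allCaseVersions(s: String, prefix: String = ""): Stream[String] =
      if (s.isEmpty) Stream(prefix)
      else
        allCaseVersions(s.tail, prefix + s.head.toLower) #:::
        allCaseVersions(s.tail, prefix + s.head.toUpper)

    // "select" has 6 letters, so this yields 2^6 = 64 variants:
    allCaseVersions("select").take(4).foreach(println)
    // select
    // selecT
    // seleCt
    // seleCT

Note the `append` frames in your stack trace: they come from the parser combinators chaining all of those alternatives together.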

Daniel Darabos
  • What is -Xss? How can I change it? – Lisa May 06 '15 at 07:02
  • 2
    It's thanks to stholzm for finding SPARK-4208 and Sean Owen for closing it as a duplicate of SPARK-5009 less than an hour ago :). `-Xss` is a `java` command line flag. For example `-Xss4M` will set stack size to 4 MB (I guess that should be enough). If you start the Spark application with `spark-submit`, I think you need to use the `--conf` flag to pass this other flag. See http://stackoverflow.com/questions/28166667/how-to-pass-d-parameter-or-environment-variable-to-spark-job. – Daniel Darabos May 06 '15 at 07:16
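
For example, passing the flag at submission time might look like this (a sketch; the application class and jar names are placeholders, and this assumes the parsing happens on the driver):

    spark-submit \
      --conf "spark.driver.extraJavaOptions=-Xss4M" \
      --class com.example.TweetApp \
      tweet-app.jar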