
I'm new to Spark and Spark SQL, and I was trying to run the example from the Spark SQL website: just a simple SQL query after loading the schema and data from a directory of JSON files, like this:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD // must come after sqlContext is defined

val path = "/home/shaza90/Desktop/tweets_1428981780000"
val tweet = sqlContext.jsonFile(path).cache()

tweet.registerTempTable("tweet")
tweet.printSchema() // This one works fine

// This line throws the exception:
sqlContext.sql("SELECT tweet.text FROM tweet").collect().foreach(println)

The exception that I'm getting is this one:

java.lang.StackOverflowError

    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)

Update

I'm able to execute `select * from tweet`, but whenever I use a column name instead of `*` I get the error.

Any advice?

Lisa

1 Answer


This is SPARK-5009 and has been fixed in Apache Spark 1.3.0.

The issue was that, to recognize keywords (like SELECT) in any case, all possible uppercase/lowercase combinations (like seLeCT) were generated in a recursive function. The recursion would lead to the StackOverflowError you're seeing if the keyword was long enough and the stack size small enough. (This also suggests a workaround if upgrading to Apache Spark 1.3.0 or later is not an option: use -Xss to increase the JVM stack size.)
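
To illustrate the mechanism, here is a minimal sketch of that kind of recursive case expansion (an assumed shape for illustration, not Spark's exact source). A keyword of length n expands into 2^n variants, and recursively combining that many alternatives into one parser is what can exhaust the stack:

    // Recursively generate every uppercase/lowercase variant of a keyword.
    // For a keyword of length n this yields 2^n strings.
    def allCaseVersions(s: String, prefix: String = ""): Stream[String] =
      if (s.isEmpty) Stream(prefix)
      else
        allCaseVersions(s.tail, prefix + s.head.toLower) #:::
        allCaseVersions(s.tail, prefix + s.head.toUpper)

    // "select" has 6 letters, so this yields 2^6 = 64 variants:
    allCaseVersions("select").take(4).foreach(println)
    // select
    // selecT
    // seleCt
    // seleCT

Note the `append` frames in your stack trace: they come from the parser combinators chaining all of those alternatives together.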

Daniel Darabos
  • What is -Xss? How can I change it? – Lisa May 06 '15 at 07:02
  • 2
    It's thanks to stholzm for finding SPARK-4208 and Sean Owen for closing it as a duplicate of SPARK-5009 less than an hour ago :). `-Xss` is a `java` command line flag. For example `-Xss4M` will set stack size to 4 MB (I guess that should be enough). If you start the Spark application with `spark-submit`, I think you need to use the `--conf` flag to pass this other flag. See http://stackoverflow.com/questions/28166667/how-to-pass-d-parameter-or-environment-variable-to-spark-job. – Daniel Darabos May 06 '15 at 07:16
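
For example, passing the flag at submission time might look like this (a sketch; the application class and jar names are placeholders, and this assumes the parsing happens on the driver):

    spark-submit \
      --conf "spark.driver.extraJavaOptions=-Xss4M" \
      --class com.example.TweetApp \
      tweet-app.jar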