
I have a DataFrame called `df` with a column named `employee_id`. I am doing:

df.registerTempTable("d_f")
val query = """SELECT *, ROW_NUMBER() OVER (ORDER BY employee_id) row_number FROM d_f"""
val result = Spark.getSqlContext().sql(query)

But I am getting the following error. Any help?

[1.29] failure: ``union'' expected but `(' found
SELECT *, ROW_NUMBER() OVER (ORDER BY employee_id) row_number FROM d_f
                            ^
java.lang.RuntimeException: [1.29] failure: ``union'' expected but `(' found
SELECT *, ROW_NUMBER() OVER (ORDER BY employee_id) row_number FROM d_f
zero323
user1735076
  • Does the query work if run directly in the DB? – Bulat Aug 03 '15 at 12:14
  • `SELECT t.*, ROW_NUMBER() OVER (ORDER BY employee_id) row_number FROM d_f as t` – Praveen Aug 03 '15 at 12:16
  • The query is fine. You are getting the error in some other part, not here. Post the complete query. – Rahul Aug 03 '15 at 12:26
  • @Praveen, how would aliasing the table be a solution here? – Rahul Aug 03 '15 at 12:27
  • @Praveen, I haven't tested it, but logically that shouldn't be the case. Since the query involves only one table, aliasing is not necessary at all; in other words, there is no way the DB engine could be ambiguous about the references. – Rahul Aug 03 '15 at 12:45

2 Answers


Spark 2.0+

Spark 2.0 introduces a native implementation of window functions (SPARK-8641), so HiveContext should no longer be required. Nevertheless, similar errors that are not related to window functions can still be attributed to differences between the SQL parsers.
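
For illustration, here is how the query from the question runs natively in Spark 2.0+ (a minimal sketch; it assumes the `df` from the question and a SparkSession built as shown):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("window test").getOrCreate()

// createOrReplaceTempView is the Spark 2.0 replacement for registerTempTable
df.createOrReplaceTempView("d_f")

// Parses and runs without a HiveContext thanks to the native window function support
val result = spark.sql(
  "SELECT *, ROW_NUMBER() OVER (ORDER BY employee_id) row_number FROM d_f")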

Spark <= 1.6

Window functions were introduced in Spark 1.4.0 and require a HiveContext to work. A plain SQLContext won't work here.

Be sure you use Spark >= 1.4.0 and create a HiveContext:

import org.apache.spark.sql.hive.HiveContext

// HiveContext uses Hive's SQL parser, which supports the OVER (...) clause
val sqlContext = new HiveContext(sc)
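
With the HiveContext in place, the query from the question should parse as-is (a sketch; it assumes `df` was created through this `sqlContext`):

df.registerTempTable("d_f")

// Hive's parser accepts the OVER (...) clause that plain SQLContext rejects
val result = sqlContext.sql(
  "SELECT *, ROW_NUMBER() OVER (ORDER BY employee_id) row_number FROM d_f")
result.show()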
Daniel Darabos
zero323

Yes, it is true.

I am using Spark version 1.6.0, and there you need a HiveContext to use the `dense_rank` method.

From Spark 2.0.0 onwards, a HiveContext will no longer be needed for `dense_rank`.

So for Spark 1.4 through 1.6 (< 2.0), you should do it like this.

Table `hive_employees` has three fields: `empid: Int`, `empname: String`, `empsalary: Int`

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf().setAppName("denseRank test") //.setMaster("local")
val sc = new SparkContext(conf)

// A plain SQLContext would not parse the OVER (...) clause on Spark < 2.0;
// a HiveContext is required for window functions such as dense_rank
val hqlContext = new HiveContext(sc)

val result = hqlContext.sql(
  "select empid, empname, dense_rank() over (partition by empsalary order by empname) as rank from hive_employees")

result.show()
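
As a side note, the same ranking can also be expressed through the DataFrame API instead of raw SQL (a sketch; `dense_rank` is the Spark 1.6 name of the function, which was called `denseRank` in 1.4/1.5):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, dense_rank}

// The same window: partition by salary, order by name within each partition
val w = Window.partitionBy("empsalary").orderBy("empname")

val ranked = hqlContext.table("hive_employees")
  .select(col("empid"), col("empname"), dense_rank().over(w).as("rank"))

ranked.show()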

Pelab