
I have a DataFrame called `df` with a column named `employee_id`. I am doing:

df.registerTempTable("d_f")
val query = """SELECT *, ROW_NUMBER() OVER (ORDER BY employee_id) row_number FROM d_f"""
val result = Spark.getSqlContext().sql(query)

But I am getting the following error. Any help?

[1.29] failure: ``union'' expected but `(' found
SELECT *, ROW_NUMBER() OVER (ORDER BY employee_id) row_number FROM d_f
                            ^
java.lang.RuntimeException: [1.29] failure: ``union'' expected but `(' found
SELECT *, ROW_NUMBER() OVER (ORDER BY employee_id) row_number FROM d_f
zero323
user1735076
  • Does the query work if run directly in the DB? – Bulat Aug 03 '15 at 12:14
  • `SELECT t.*, ROW_NUMBER() OVER (ORDER BY employee_id) row_number FROM d_f as t` – Praveen Aug 03 '15 at 12:16
  • The query is fine. You are getting the error in some other part, not here. Post the complete query. – Rahul Aug 03 '15 at 12:26
  • @Praveen, how would aliasing the table be a solution here? – Rahul Aug 03 '15 at 12:27
  • @Praveen, I haven't tested it, but logically that shouldn't be the case. Since the query involves only one table, aliasing is not necessary at all; in other words, there is no way the DB engine could be ambiguous about the references. – Rahul Aug 03 '15 at 12:45

2 Answers


Spark 2.0+

Spark 2.0 introduces a native implementation of window functions (SPARK-8641), so HiveContext should no longer be required. Nevertheless, similar errors that are not related to window functions can still be attributed to differences between the SQL parsers.
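
For illustration, here is how the query from the question runs natively in Spark 2.0+ (a minimal sketch; it assumes the `df` from the question and a SparkSession built as shown):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("window test").getOrCreate()

// createOrReplaceTempView is the Spark 2.0 replacement for registerTempTable
df.createOrReplaceTempView("d_f")

// Parses and runs without a HiveContext thanks to the native window function support
val result = spark.sql(
  "SELECT *, ROW_NUMBER() OVER (ORDER BY employee_id) row_number FROM d_f")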

Spark <= 1.6

Window functions were introduced in Spark 1.4.0 and require a HiveContext to work. A plain SQLContext won't work here.

Be sure you use Spark >= 1.4.0 and create a HiveContext:

import org.apache.spark.sql.hive.HiveContext

// HiveContext uses Hive's SQL parser, which supports the OVER (...) clause
val sqlContext = new HiveContext(sc)
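
With the HiveContext in place, the query from the question should parse as-is (a sketch; it assumes `df` was created through this `sqlContext`):

df.registerTempTable("d_f")

// Hive's parser accepts the OVER (...) clause that plain SQLContext rejects
val result = sqlContext.sql(
  "SELECT *, ROW_NUMBER() OVER (ORDER BY employee_id) row_number FROM d_f")
result.show()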
Daniel Darabos
zero323

Yes, it is true.

I am using Spark version 1.6.0, and there you need a HiveContext to use the `dense_rank` method.

From Spark 2.0.0 onwards, a HiveContext will no longer be needed for `dense_rank`.

So for Spark 1.4 through 1.6 (< 2.0), you should do it like this.

Table `hive_employees` has three fields: `empid: Int`, `empname: String`, `empsalary: Int`

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf().setAppName("denseRank test") //.setMaster("local")
val sc = new SparkContext(conf)

// A plain SQLContext would not parse the OVER (...) clause on Spark < 2.0;
// a HiveContext is required for window functions such as dense_rank
val hqlContext = new HiveContext(sc)

val result = hqlContext.sql(
  "select empid, empname, dense_rank() over (partition by empsalary order by empname) as rank from hive_employees")

result.show()
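
As a side note, the same ranking can also be expressed through the DataFrame API instead of raw SQL (a sketch; `dense_rank` is the Spark 1.6 name of the function, which was called `denseRank` in 1.4/1.5):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, dense_rank}

// The same window: partition by salary, order by name within each partition
val w = Window.partitionBy("empsalary").orderBy("empname")

val ranked = hqlContext.table("hive_employees")
  .select(col("empid"), col("empname"), dense_rank().over(w).as("rank"))

ranked.show()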

Pelab