
I have seen posts discussing the usage of window functions, but I have some questions.

  1. Since window functions can only be used with a HiveContext, how can I switch between SQLContext and HiveContext given that I am already using SQLContext?
  2. How is it possible to run HiveQL with a window function here? I tried

    df.registerTempTable("data")
    from pyspark.sql import functions as F
    from pyspark.sql import Window
    

    %%hive
    SELECT col1, col2, F.rank() OVER (Window.partitionBy("col1").orderBy("col3") 
    FROM data
    

and native Hive SQL

SELECT col1, col2, RANK() OVER (PARTITION BY col1 ORDER BY col3) FROM data

but neither of them works.

MYjx

1 Answer


How can I switch between SQLContext and HiveContext given I am already using SQLContext?

You cannot. Spark DataFrames and tables are bound to a specific context. If you want to use HiveContext, use it all the way; you pull in all of its dependencies anyway.
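
For illustration, a minimal sketch of setting up a single HiveContext and binding everything to it from the start (assuming Spark 1.x; the app name and input path are placeholders):

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="window-example")  # placeholder app name
    sqlContext = HiveContext(sc)  # create once, use everywhere

    # Any DataFrame or temp table created through this context is bound to it
    df = sqlContext.read.json("data.json")  # placeholder input
    df.registerTempTable("data")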

How is it possible to run HiveQL with a window function here?

sqlContext = ...  # HiveContext 
sqlContext.sql(query)

The first query you use is simply invalid: it mixes Python API calls (`F.rank()`, `Window.partitionBy`) into a SQL string. The second one should work if you use the correct context and configuration.
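
As a sketch, assuming `sqlContext` is a HiveContext and `data` is the temp table registered in the question, the working query looks like this in both raw HiveQL and the equivalent DataFrame API (the window spec belongs in `.over()`, not inside the SQL string):

    from pyspark.sql import Window
    from pyspark.sql import functions as F

    # Native HiveQL executed through the HiveContext
    ranked = sqlContext.sql(
        "SELECT col1, col2, RANK() OVER (PARTITION BY col1 ORDER BY col3) AS rnk "
        "FROM data"
    )

    # Equivalent DataFrame API form
    w = Window.partitionBy("col1").orderBy("col3")
    ranked_df = df.select("col1", "col2", F.rank().over(w).alias("rnk"))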

zero323
  • Thanks! I am using HDInsight Spark, and its kernel launches both a HiveContext and a SparkContext, so I thought I could run window functions there. Also, is it possible to run HiveQL on top of a Spark DataFrame? – MYjx Mar 02 '16 at 17:21
  • Not exactly. You can register a table and use `sqlContext.sql`, but Spark is not fully compatible with HiveQL. Usually it shouldn't matter, though. – zero323 Mar 02 '16 at 17:23
  • Thanks! But is there any way to run similar window functions with a plain SQLContext as well? – MYjx Mar 02 '16 at 17:26
  • No, as of now there isn't. – zero323 Mar 02 '16 at 17:32