Spark HiveContext apply IN operation using sql method

Question

I have an employee hive table with column Name , Department , City and i want to retrieve the data based on names of the employee using IN operation in HiveContext.sql() function but it is throwing pyspark.Analysis Exception.please look at example below.

employee table:

Name   Department  City
Ram     FDE        Mumbai
Ramesh  CTZ        Pune
Suraj   FDE        Chennai
Varun   CTZ        Delhi

Query :

SELECT * from employee WHERE Name in ('Ramesh' , 'Varun')

code snippet from spark program:

namesList= ['Ramesh' , 'Varun']
data = HiveContext.sql('SELECT * from employee WHERE Name in ({namesList})'.format(namesList = namesList))

I tried to modify and pass the strings instead of list but the error remains same

Error:pyspark.AnalysisException : structType field

please help me on this and suggest me if i'm doing something wrong here.

score 1 · Answer 1 · answered May 27 '20 at 09:20

1

You should get rid of square brackets from the python list when creating the query - str(namesList)[1:-1]

data = HiveContext.sql('SELECT * from employee WHERE Name in ({namesList})'.format(namesList = str(namesList)[1:-1]))

answered May 27 '20 at 09:20

matkurek

553
5
12

score 1 · Answer 2 · answered May 27 '20 at 10:06

Replace this

data = HiveContext.sql('SELECT * from employee WHERE Name in ({namesList})'.format(namesList = namesList))

with this

data = HiveContext.sql("SELECT * from employee WHERE Name in ({namesList})".format(namesList = "'"+"','".join(namesList)+"'"))

You need to pass the string not list.

Spark HiveContext apply IN operation using sql method

2 Answers2