0

I have an employee hive table with column Name , Department , City and i want to retrieve the data based on names of the employee using IN operation in HiveContext.sql() function but it is throwing pyspark.Analysis Exception.please look at example below.

employee table:

Name   Department  City
Ram     FDE        Mumbai
Ramesh  CTZ        Pune
Suraj   FDE        Chennai
Varun   CTZ        Delhi

Query :

SELECT * from employee WHERE Name in ('Ramesh' , 'Varun')

code snippet from spark program:

namesList= ['Ramesh' , 'Varun']
data = HiveContext.sql('SELECT * from employee WHERE Name in ({namesList})'.format(namesList = namesList))

I tried to modify and pass the strings instead of list but the error remains same

Error:pyspark.AnalysisException : structType field

please help me on this and suggest me if i'm doing something wrong here.

Ramesh Raj
  • 185
  • 1
  • 2
  • 13

2 Answers2

1

You should get rid of square brackets from the python list when creating the query - str(namesList)[1:-1]

data = HiveContext.sql('SELECT * from employee WHERE Name in ({namesList})'.format(namesList = str(namesList)[1:-1]))
matkurek
  • 553
  • 5
  • 12
1

Replace this

data = HiveContext.sql('SELECT * from employee WHERE Name in ({namesList})'.format(namesList = namesList))

with this

data = HiveContext.sql("SELECT * from employee WHERE Name in ({namesList})".format(namesList = "'"+"','".join(namesList)+"'"))

You need to pass the string not list.

Shubham Jain
  • 5,327
  • 2
  • 15
  • 38