I want to read data from a Hadoop (Hive) database that contains non-ASCII characters. I am reading the data from a .py file, at the top of which I have used

#!/usr/bin/env python
# -*- coding: utf-8 -*-

to specify the encoding.

I used the code below to pull the data.

from pyhive import hive
import pandas as pd

def hiveconnection(host_name, port, user, database):
    # Open a Kerberos-authenticated connection and fetch every row of the table.
    conn = hive.Connection(host=host_name, port=port, username=user, database=database, auth='KERBEROS', kerberos_service_name='impala')
    cur = conn.cursor()
    cur.execute("select * from db_name.table_name")
    result = cur.fetchall()
    return result

# host_name, port, user and database are defined elsewhere.
output = hiveconnection(host_name, port, user, database)
denialt2 = pd.DataFrame(output)

I got the following error message: "'utf-8' codec can't decode byte 0x96 in position 13: invalid start byte". On investigating, I learned that the error is raised because one of the columns contains a special, non-ASCII character. The character is shown below.

[Image: screenshot of the special character in the column]
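For reference, the same error can be reproduced without Hive. This is only a minimal sketch: the sample bytes are made up, and the guess that the column was written as Windows-1252 (cp1252) rests solely on byte 0x96, which is an en dash in that encoding. The real source encoding may well be different.

```python
# Hypothetical sample value: 13 ASCII bytes followed by 0x96, mirroring
# "byte 0x96 in position 13" from the error message.
raw = b"Denial reason\x96pending"

# Decoding as UTF-8 fails because 0x96 cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    error_text = None
except UnicodeDecodeError as exc:
    error_text = str(exc)

print(error_text)

# Assumption: the data was written as Windows-1252, where 0x96 is an en dash.
decoded = raw.decode("cp1252")
print(decoded)
```

If the second decode yields readable text for real values from the table, that suggests (but does not prove) that the stored data is not UTF-8.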

The complete traceback (error message) is attached below.

[Image: screenshot of the complete traceback]

Please help me resolve the issue. Thanks in advance :)

  • What the error message is telling you is _something_ is trying to decode bytes as UTF-8, but the bytes are not valid UTF-8. If you were to edit the question to include the _complete_ traceback we might be able to determine which part is expecting UTF-8. – snakecharmerb Mar 29 '21 at 17:13
  • Thank you @snakecharmerb for your time and your response. I have now included the complete traceback. Please let me know your thoughts or a solution. Thanks again! – Raghavendra S Mar 30 '21 at 07:08
  • It looks as if the pyHive connection expects that data from hive is encoded as UTF-8, but apparently this is not true. Probably you need to check what encoding hive is using (and that the data has been properly encoded in the first place) and perhaps set an encoding for the connection. I don't know how you would do that I'm afraid. – snakecharmerb Mar 30 '21 at 08:21
  • I searched for how to set an encoding when establishing the connection, but could not find anything. Is there a way to do that? Also, is there a method or procedure to check which encoding Hive is using? Thank you. – Raghavendra S Mar 30 '21 at 10:57
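
One rough way to narrow down the encoding, independent of Hive, is to try a few candidate codecs against raw bytes from the affected column. This is a sketch only: `guess_encodings` is a hypothetical helper, the candidate list and sample bytes are assumptions, and it presumes the raw bytes can be obtained some other way (for example from a file exported from the table), since the connection appears to fail before returning them.

```python
def guess_encodings(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the candidate encodings that decode `raw` without error."""
    workable = []
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this codec cannot represent the bytes
        workable.append(name)
    return workable

# Hypothetical sample containing the offending byte 0x96.
sample = b"Denial reason\x96pending"
workable = guess_encodings(sample)
print(workable)
```

A successful decode is not proof of the right encoding: latin-1 accepts every byte value, so the decoded text still has to be checked by eye against what the column is supposed to contain.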

0 Answers