
I'm trying to load data from Oracle into Databricks, but I've run into a Unicode issue in PySpark: it can't render the Unicode characters in the form they are stored in Oracle and instead displays the replacement character '▯'. In Oracle, NLS_NCHAR_CHARACTERSET=AL16UTF16.

I tried the Oracle JDBC system property from Inserting national characters into an oracle NCHAR or NVARCHAR column does not work, but it doesn't work in my case. Could you please suggest an alternative to fix this issue?
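
For reference, the fix discussed in that linked question is the oracle.jdbc.defaultNChar property (assuming that is the property meant here). A minimal sketch of passing it through Spark's JDBC reader, with placeholder URL, table, and credentials:

# Spark forwards options it does not recognize to the JDBC driver as
# connection properties, so defaultNChar can be set on the reader itself.
# URL, table, and credentials below are placeholders.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
      .option("dbtable", "MYSCHEMA.MYTABLE")
      .option("user", "scott")
      .option("password", "tiger")
      .option("driver", "oracle.jdbc.OracleDriver")
      .option("oracle.jdbc.defaultNChar", "true")
      .load())

# It can also be set as a JVM system property on the driver and executors:
#   --conf spark.driver.extraJavaOptions=-Doracle.jdbc.defaultNChar=true
#   --conf spark.executor.extraJavaOptions=-Doracle.jdbc.defaultNChar=true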


1 Answer


Option 1 - pass JDBC options when you read the data with Spark:

spark.read.format("jdbc")...option("useUnicode", "true").option("characterEncoding", "UTF-16")

Option 2 - put the properties in the connection string:

url = "...?useUnicode=true&characterEncoding=UTF-16"
spark.read.format("jdbc").option("url", url)
YuriR
  • Hi @YuriR, it works for SQL but not for Oracle; I tried both options. Can you suggest an alternative, please? – Ahnvi Jan 10 '22 at 10:31
  • @Ahnvi is your table defined with UTF-16 support? Try querying your DB with pure Python or Java and see if you get the correct characters (see the sketch below). – YuriR Jan 10 '22 at 10:48
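
A minimal sanity check along the lines of that last comment, using the python-oracledb driver (the package choice, connection details, and table/column names are placeholders, not from the thread):

# pip install oracledb
import oracledb

# Placeholder credentials and DSN.
conn = oracledb.connect(user="scott", password="tiger", dsn="dbhost:1521/ORCL")
cur = conn.cursor()

# Placeholder table/column; NVARCHAR2 values should arrive as Python str.
cur.execute("SELECT nchar_col FROM myschema.mytable FETCH FIRST 5 ROWS ONLY")
for (value,) in cur:
    # repr() shows the actual code points, so a U+FFFD replacement character
    # is easy to spot if the data is already mangled in transit.
    print(repr(value))

cur.close()
conn.close()

If pure Python returns the correct characters, the data and the driver are fine and the problem is on the Spark side; if not, the issue is in the table definition or the session character-set handling.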