I want to write ORC data into an external Hive table from the Spark data frame. When I save the data frame as a table the data is sent to existing external table, however, when I try to save the data in ORC format into the directory and then read this data from the external table, it is not displayed.
What could be the reason for the data absences in the second case?
How it works:
val dataDir = "/tmp/avro_data"
sql("CREATE EXTERNAL TABLE avro_random(name string, age int, phone string, city string, country string) STORED AS ORC LOCATION '$dataDir'")
dataframe
.write
.mode(SaveMode.Overwrite)
.saveAsTable("avro_random")
sql("SELECT * FROM avro_random").show()
The code that returns empty external table:
val dataDir = "/tmp/avro_data"
sql("CREATE EXTERNAL TABLE avro_random(name string, age int, phone string, city string, country string) STORED AS ORC LOCATION '$dataDir'")
dataframe
.write
.mode(SaveMode.Overwrite)
.orc(dataDir)
sql("SELECT * FROM avro_random").show()