
I have a parquet table for which I get an error:

FileReadException: Error while reading file dbfs:/mnt/gold/catalog.parquet/part-00120-tid-1146522170304013652-7e167102-3a27-46d7-b674-901496f37d84-353-1-c000.snappy.parquet. 
Parquet column cannot be converted. Column: [CreateDate], Expected: StringType, Found: INT32

I can read the table using PySpark:

df_catalog = spark.read.option("mergeSchema", "true").parquet(catalog_path)

I would like to enable users to access the table using plain Spark SQL. Is it possible to create a table with this option?
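For reference, what I am hoping for is something along these lines. This is an untested sketch: the table name and path are placeholders, and I am not sure whether Spark actually honors mergeSchema when passed through the OPTIONS clause of a table definition:

```sql
-- Hypothetical: pass the Parquet data source option when defining the table,
-- so that SELECTs against it behave like the mergeSchema read in PySpark.
CREATE TABLE catalog
USING parquet
OPTIONS (mergeSchema 'true')
LOCATION 'dbfs:/mnt/gold/catalog.parquet';
```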

DejanS
  • Short answer: see [Hive/Parquet schema reconciliation](https://spark.apache.org/docs/1.6.1/sql-programming-guide.html#hiveparquet-schema-reconciliation). Long answer: make sure the column order in the table you create matches the column order in the Parquet files. For example, `create table a (a string, b int)` while the Parquet schema is `(b int, a string)` could raise the error you have. – Роберт Надь Mar 29 '23 at 21:24
  • But this is happening during read, not during write. – DejanS Mar 30 '23 at 18:17
  • I tried doing SET spark.sql.parquet.int96AsTimestamp=true before running select * from catalog;, but I am having the same problem. – DejanS Mar 30 '23 at 18:23
  • I also checked whether the column contains data of some other type (after reading it with mergeSchema); all rows contain dates. – DejanS Mar 30 '23 at 18:25

0 Answers