
1. Background:

I have a Hive external table A that was created in text format; the HDFS data of its partitions is also text+gz.

Table A is used by thousands of SQL files, and any of its 5 years of historical partitions may be read.

We now have a better storage format, Parquet. To reduce switching costs, I plan to convert table A to a Parquet table, with parquet+gz data for new partitions and text+gz data for the old partitions. The business should still be able to read any partition of table A through SparkSQL and HiveSQL.

2. Verification process:

2.1. Create the table (screenshot omitted)
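The create-table screenshot is not visible, but based on the background above the DDL would look roughly like the following. This is a hypothetical reconstruction: the column list, table name, and location are placeholders, not taken from the original screenshot.

```sql
-- Hypothetical reconstruction (columns and paths are assumptions):
-- an external, partitioned table whose table-level storage format is Parquet.
CREATE EXTERNAL TABLE a (
  id   BIGINT,
  name STRING
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
LOCATION '/warehouse/a';
```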

2.2. Add partitions

20210702 path is text+gz

20210703 path is parquet+gz

(screenshot omitted)
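The add-partition screenshot is likewise missing. A plausible reconstruction, assuming Hive's per-partition file-format support and placeholder partition paths:

```sql
-- New partition (parquet+gz) inherits the table-level Parquet format.
ALTER TABLE a ADD PARTITION (dt = '20210703')
  LOCATION '/warehouse/a/dt=20210703';

-- Old partition (text+gz): add it, then override its file format so the
-- partition metadata records TEXTFILE instead of the table's Parquet.
ALTER TABLE a ADD PARTITION (dt = '20210702')
  LOCATION '/warehouse/a/dt=20210702';
ALTER TABLE a PARTITION (dt = '20210702') SET FILEFORMAT TEXTFILE;
```

Hive stores the input/output format per partition, which is why the metastore can describe a table whose partitions use different formats; the question is whether SparkSQL honors that per-partition metadata.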

3. Error: (error screenshot omitted)

4. Expectation:

Is there a solution, such as a parameter configuration, that can solve this problem?

What I have done: I read https://issues.apache.org/jira/browse/SPARK-24965. Following the stack trace reported in the error, I have not been able to find the place in the SparkSQL source code that reconciles the Hive table metadata with the partition metadata.

5. Environment: hdp2.7.3, SparkSQL 2.3, Hive 1.2
