
I have an external table pointing to an S3 location (Parquet files) in which every column is typed as string. I want to give the columns their correct datatypes instead of reading everything as strings. When I drop the external table and recreate it with the new datatypes, every SELECT query fails with an error like the one below:

java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary
    at org.apache.parquet.column.Dictionary.decodeToInt(Dictionary.java:48)
    at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getInt(OnHeapColumnVector.java:233)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
  • Please share your 'create table' syntax and the query which leads to the above error – Satya Pavan Mar 31 '18 at 22:24
  • @querymind I think when you use external table you can't change the datatype of your column to something else if this one is originaly String – HISI Apr 01 '18 at 14:37
  • @SatyaPavan My original table is something like this: `CREATE EXTERNAL TABLE test_database.test_table (id string, place string, total string) COMMENT 'This is a test table.' PARTITIONED BY (date string) STORED AS PARQUET LOCATION 's3 location'`. I dropped this table and created a new one like: `CREATE EXTERNAL TABLE test_database.test_table (id bigint, place string, total bigint) COMMENT 'This is a test table.' PARTITIONED BY (date date) STORED AS PARQUET LOCATION 's3 location'` – querymind Apr 01 '18 at 14:41
  • @querymind Try keep datatype of original external table, and then try to change it to the datatype you need by: `ALTER TABLE table CHANGE col col type;` I think that should work – HISI Apr 01 '18 at 14:45
  • @hisi That doesn't work; I get an error: `org.apache.spark.sql.AnalysisException: ALTER TABLE CHANGE COLUMN is not supported for changing column 'id' with type 'StringType' to 'id' with type 'LongType';`. I am not sure if the ALTER TABLE command works differently for partitioned tables – querymind Apr 01 '18 at 16:39
  • @querymind you maybe need to create a managed table from your externel table: `create table tablename as select * from yourtable` , and then you can run alter, try this one – HISI Apr 01 '18 at 16:53
  • Have you checked this link - [https://issues.apache.org/jira/browse/SPARK-17557] – Satya Pavan Apr 02 '18 at 03:16
  • I am facing this problem. Has anyone figured out a solution? – Gladiator Mar 16 '21 at 14:38
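Since the Parquet files on S3 physically store every column as a string, simply redeclaring the external table with `bigint` columns cannot work: the reader still finds binary (string) data in the files, hence the `decodeToInt` failure. A common workaround, sketched below with the table and column names from the comments (the S3 path and the fixed-table name are placeholders), is to rewrite the data with the desired types into a new location and point a correctly typed table there:

```sql
-- Sketch only: names and paths are placeholders, not from the original post.
-- Needed for the dynamic-partition INSERT below.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- New external table with the intended datatypes, at a NEW location.
CREATE EXTERNAL TABLE test_database.test_table_fixed (
  id BIGINT,
  place STRING,
  total BIGINT
)
PARTITIONED BY (`date` DATE)
STORED AS PARQUET
LOCATION 's3://bucket/new-location/';

-- Rewrite the data, casting each string column to its correct type.
-- The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE test_database.test_table_fixed PARTITION (`date`)
SELECT CAST(id AS BIGINT),
       place,
       CAST(total AS BIGINT),
       CAST(`date` AS DATE)
FROM test_database.test_table;
```

Once the rewritten files exist, the original table can be dropped and recreated (or renamed) against the new location; because the files now genuinely contain `bigint` values, the typed schema reads them without error.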

1 Answer


Specify the type as BIGINT, which is equivalent to a long; Hive does not have a LONG datatype.

hive> ALTER TABLE table CHANGE col col BIGINT;

Duplicate content, from Hortonworks forum
