
There is already a question about Hive in general (Is there a way to alter column type in hive table?). Its answer states that the schema can be changed with the ALTER TABLE ... CHANGE command.

However, is this also possible if the file is stored as ORC?

Stefan Papp

2 Answers


You can load the ORC file into PySpark:

  1. Load data into a dataframe:

    df = spark.read.format("orc").load("<path-of-file-in-hdfs>")
    
  2. Create a view over the dataframe:

    df.createOrReplaceTempView('Table')
    
  3. Create a new data frame with manipulated columns:

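    # List columns explicitly ("select *" would duplicate third_column); the first two names are placeholders.
    df3 = spark.sql("select first_column, second_column, cast(third_column as float) as third_column from Table")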
    
  4. Save the dataframe to hdfs:

    df3.write.format("orc").save("<hdfs-path-where-file-needs-to-be-saved>")
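
The four steps can also be wrapped into one function. The following is only a sketch: `build_cast_select`, `rewrite_orc_with_casts`, and the view name `orc_source` are hypothetical names chosen for illustration, and `spark` is assumed to be an existing SparkSession. The point it shows is that the SELECT list is generated from `df.columns`, so each column appears exactly once and the cast column replaces the original instead of sitting next to it.

```python
def build_cast_select(columns, casts, view_name):
    """Build a SELECT that lists every column once, casting those named in `casts`.

    columns:   column names in their original order (e.g. df.columns)
    casts:     mapping of column name -> target type, e.g. {"third_column": "float"}
    view_name: name of the temporary view to select from
    """
    parts = []
    for name in columns:
        if name in casts:
            # Cast and keep the original column name.
            parts.append(f"CAST(`{name}` AS {casts[name]}) AS `{name}`")
        else:
            parts.append(f"`{name}`")
    return f"SELECT {', '.join(parts)} FROM `{view_name}`"


def rewrite_orc_with_casts(spark, source_path, target_path, casts, view_name="orc_source"):
    """Read an ORC location, cast the selected columns, and write a new ORC location.

    `spark` is assumed to be an existing SparkSession; nothing is overwritten in place.
    """
    df = spark.read.format("orc").load(source_path)
    df.createOrReplaceTempView(view_name)  # registers the view and returns None
    query = build_cast_select(df.columns, casts, view_name)
    spark.sql(query).write.format("orc").save(target_path)
    return query
```

A call would look like `rewrite_orc_with_casts(spark, "<path-of-file-in-hdfs>", "<hdfs-path-where-file-needs-to-be-saved>", {"third_column": "float"})`, with the same placeholder paths as in the steps above.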
    
fcdt
kushagra deep

I ran tests on an ORC table. It is possible to convert a string column to a float column.

    ALTER TABLE test_orc CHANGE third_column third_column float;

This statement converts the column third_column, which was declared as a string column, to a float column. It is also possible to change the name of a column.
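
In Hive's `CHANGE` clause the old column name comes first, then the new name, then the type, so a rename uses the same statement with two different names. As a small sketch, the helper below builds such statements as strings; `change_column_ddl` and the new column name `price` are hypothetical and only serve as an illustration.

```python
def change_column_ddl(table, old_name, new_name, new_type):
    """Return a Hive ALTER TABLE ... CHANGE statement: old name, new name, then type."""
    return f"ALTER TABLE {table} CHANGE {old_name} {new_name} {new_type};"


# Type change only: the old and new names are identical.
print(change_column_ddl("test_orc", "third_column", "third_column", "float"))

# Rename and type change in one statement ("price" is a made-up name).
print(change_column_ddl("test_orc", "third_column", "price", "float"))
```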

Sidenote: I was curious if other alterations on ORC might create problems. I ran into an exception when I tried to reorder columns.

    ALTER TABLE test_orc CHANGE third_column third_column float AFTER first_column;

The exception is: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Reordering columns is not supported for table default.test_orc. SerDe may be incompatible.

Stefan Papp
  • I'm sorry... How did you try to reorder columns? And what is the use of reordering columns? Are you keeping the exception a secret? – leftjoin Nov 30 '16 at 13:20
  • Ok, thank you for your comment. Yes, it makes sense to add the exception information. I added the missing information. – Stefan Papp Dec 12 '16 at 17:04
  • Could you please tell me why you need to reorder columns? – leftjoin Dec 12 '16 at 20:03
  • It was just a tryout. There was no particular reason. I wanted to know whether there are limitations other than those on changing the schema. – Stefan Papp Dec 13 '16 at 06:10