4

This is an extension of a previous question I asked: How to compare two columns with different data type groups

We are exploring the idea of changing the metadata on the table as opposed to performing a CAST operation on the data in SELECT statements. Changing the metadata in the MySQL metastore is easy enough. But, is it possible to have that metadata change applied to partitions (they are daily)? Otherwise, we might be stuck with current and future data being of type BIGINT while the historical is STRING.

Question: Is it possible to change partition meta data in HIVE? If yes, how?

J Weezy
  • 3,507
  • 3
  • 32
  • 88

2 Answers2

5

You can change partition column type using this statement:

alter table {table_name} partition column ({column_name} {column_type});

Also you can re-create table definition and change all columns types using these steps:

  1. Make your table external, so it can be dropped without dropping the data

    ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='TRUE');

  2. Drop table (only metadata will be removed).

  3. Create EXTERNAL table using updated DDL with types changed and with the same LOCATION.

  4. recover partitions:

    MSCK [REPAIR] TABLE tablename;

The equivalent command on Amazon Elastic MapReduce (EMR)'s version of Hive is:

ALTER TABLE tablename RECOVER PARTITIONS;

This will add Hive partitions metadata. See manual here: RECOVER PARTITIONS

  1. And finally you can make you table MANAGED again if necessary:

ALTER TABLE tablename SET TBLPROPERTIES('EXTERNAL'='FALSE');

Note: All commands above should be ran in HUE, not MySQL.

Community
  • 1
  • 1
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Can you confirm the syntax on the ALTER TABLE command at the top of your post? I am getting a `syntax error near 'COLUMN'. Expected BY.` – J Weezy Oct 09 '19 at 23:00
  • Also, we are only trying to change the metadata of a single column on the table. Will this work on a column that is not the partitioning column? Are there any specific rules around that? – J Weezy Oct 09 '19 at 23:12
  • 2
    @JWeezy tested it: 1. `create table test_partition_col(c1 int) partitioned by (ts timestamp);` 2. `alter table test_partition_col partition column (ts string)` See here: https://demo.gethue.com/hue/editor?editor=52825&type=hive – leftjoin Oct 10 '19 at 06:47
  • 1
    @JWeezy Also have a look at this Jira: https://issues.apache.org/jira/browse/HIVE-3672 – leftjoin Oct 10 '19 at 06:48
  • 1
    @JWeezy This is how to change column type syntax: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterColumn – leftjoin Oct 10 '19 at 06:50
  • If I read your `create` and `alter` commands correctly, you are modifying the metadata on the table's partition column. What I am attempting to do is modify the metadata of a column that is on a partitioned table, but `is not` the partition column. – J Weezy Oct 10 '19 at 19:47
  • I have posted an add-on question here for changing non-partitioning column metadata. https://stackoverflow.com/questions/58333567/is-it-possible-to-change-the-metadata-of-a-column-that-is-on-a-partitioned-table – J Weezy Oct 11 '19 at 03:52
  • I also have this: https://stackoverflow.com/questions/58334298/hive-error-when-changing-column-metadata-on-partitioned-table-avrotypeexception – J Weezy Oct 11 '19 at 07:13
2

You can not change the partition column in hive infact Hive does not support alterting of partitioning columns

Refer : altering partition column type in Hive

You can think of it this way - Hive stores the data by creating a folder in hdfs with partition column values - Since if you trying to alter the hive partition it means you are trying to change the whole directory structure and data of hive table which is not possible exp if you have partitioned on year this is how directory structure looks like

tab1/clientdata/2009/file2
tab1/clientdata/2010/file3

If you want to change the partition column you can perform below steps

  1. Create another hive table with required changes in partition column

    Create table new_table ( A int, B String.....)

  2. Load data from previous table

    Insert into new_table partition ( B ) select A,B from table Prev_table

Strick
  • 1,512
  • 9
  • 15
  • Are you sure the information that you provided is current and accurate? Per the link in your post, it was created in 2012. – J Weezy Oct 09 '19 at 14:25
  • Found this on Internet https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterPartition But still it is not very clear if we can change the type of column also.. Few time back also i tried alter partition statement but it got failed – Strick Oct 09 '19 at 15:14
  • look at this: demo.gethue.com/hue/editor?editor=52825&type=hive – leftjoin Oct 10 '19 at 14:26
  • 1
    And also this: https://issues.apache.org/jira/browse/HIVE-3672 – leftjoin Oct 10 '19 at 14:27
  • @leftjoin By any chance, have you had a chance to write data to multiple partitions, change the metadata, run the steps you listed above, and then successfully ran a query that uses a `reduce` (e.g., SELECT DISTINCT, GROUP BY, WHERE, etc.)? I have tried this without success. It appears that I need to write the data to a new table using a CAST operation - it is the only thing that is working right now. Manual inspection of the partition file's header ,where the AVRO schema is also stored, confirm this. – J Weezy Oct 17 '19 at 13:21