Suppose the next table:
sch_test.test_cascade
a b c
-----------
6 10 1
6 10 1
6 10 2
6 10 2
a, b and c are integers and the table is partitioned by c. When I add a column in Hive with CASCADE:
ALTER TABLE sch_test.test_cascade ADD COLUMNS (d int, e int) CASCADE;
it returns the next table:
sch_test.test_cascade
a b d e c
-------------------------
6 10 1 NULL 1
6 10 1 NULL 1
6 10 2 NULL 2
6 10 2 NULL 2
That is, the first inserted column (d, in this case) takes the values of the partition column (Note that I've added two columns to show that the second one is NULL). But, if I add a column without CASCADE:
ALTER TABLE sch_test.test_cascade ADD COLUMNS (d int, e int);
it returns the next table:
sch_test.test_cascade
a b d e c
----------------------------
6 10 NULL NULL 1
6 10 NULL NULL 1
6 10 NULL NULL 2
6 10 NULL NULL 2
That is, both inserted columns are NULL.
I don't understand very well the difference between add a column with CASCADE or RESTRICT, in Hive documentation I can see:
The CASCADE|RESTRICT clause is available in Hive 1.1.0. ALTER TABLE ADD|REPLACE COLUMNS with CASCADE command changes the columns of a table's metadata, and cascades the same change to all the partition metadata. RESTRICT is the default, limiting column changes only to table metadata.
But it isn't clear to me the difference between "cascades the same change to all the partition metadata" and "limiting column changes only to table metadata". (Does Restrict apply only to one partition and Cascade to all them? The previous example says the opposite). And if that is the reason why the first column inserted (in CASCADE mode) takes the value of the partition column.