Why Delta Lake Column Mapping needs both physical name and id

Asked Jul 10 '23 at 18:53

In the Delta Transaction Protocol documentation (https://github.com/delta-io/delta/blob/master/PROTOCOL.md#column-mapping), the column mapping section states:

"There are two modes of column mapping, by name and by id. In both modes, every column - nested or leaf - is assigned a unique physical name, and a unique 32-bit integer as an id."
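The quoted rule can be pictured with a small Python model. This is a hypothetical sketch, not a Delta reader. `LogicalField`, `FileColumn`, and `resolve_column` are invented for illustration. The metadata keys `delta.columnMapping.id` and `delta.columnMapping.physicalName`, and the mode values `none`, `name`, and `id`, follow the naming in the protocol's column mapping section. The model shows what the quote describes: every field carries both values, and the mode only decides which one a reader matches against a data file's columns. In `id` mode that is the integer id (stored as the Parquet `field_id`); in `name` mode it is the stored column name.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

# Schema-field metadata keys named in the protocol's column mapping section.
ID_KEY = "delta.columnMapping.id"
PHYSICAL_NAME_KEY = "delta.columnMapping.physicalName"


@dataclass
class LogicalField:
    """Hypothetical, simplified table-schema field."""
    name: str  # logical name that users query by
    metadata: Dict[str, Union[int, str]]  # holds both the id and the physical name


@dataclass
class FileColumn:
    """Hypothetical view of one column inside a data file."""
    name: str  # column name as written in the file
    field_id: Optional[int] = None  # Parquet field_id, if the writer recorded one


def resolve_column(
    field: LogicalField, file_columns: List[FileColumn], mode: str
) -> Optional[FileColumn]:
    """Return the file column that backs `field` under the given mapping mode."""
    if mode == "id":
        # Match on the integer id; the file's column name is ignored.
        key = field.metadata[ID_KEY]
        return next((c for c in file_columns if c.field_id == key), None)
    if mode == "name":
        # Match on the physical name; the file's field_id is ignored.
        key = field.metadata[PHYSICAL_NAME_KEY]
        return next((c for c in file_columns if c.name == key), None)
    if mode == "none":
        # No mapping: the file column must carry the logical name itself.
        return next((c for c in file_columns if c.name == field.name), None)
    raise ValueError(f"unknown column mapping mode: {mode!r}")


# Usage: both values are assigned up front (made-up example values).
field = LogicalField("customer", {ID_KEY: 1, PHYSICAL_NAME_KEY: "col-example-1"})
file_columns = [FileColumn("col-example-1", 1), FileColumn("col-example-2", 2)]

field.name = "client"  # a rename touches only the logical schema
print(resolve_column(field, file_columns, "name"))  # FileColumn(name='col-example-1', field_id=1)
print(resolve_column(field, file_columns, "id"))    # FileColumn(name='col-example-1', field_id=1)
print(resolve_column(field, file_columns, "none"))  # None
```

In this model, renaming `customer` to `client` changes nothing a reader matches on in `name` or `id` mode, while the unmapped lookup no longer finds the column. Nested fields, partition values, and statistics are deliberately left out.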

Why is it necessary to assign both a physical name and an integer id? If I'm using "name" mode, shouldn't unique physical names be enough to identify fields? Is it because of some other feature or optimization?

edited Jul 10 '23 at 21:49

Augusto Bernardi

Yes, it should be, and that's what the text says, too. It says there are _two_ modes, one mode that maps by name, and one mode that maps by id. Nowhere does it mention both at the same time. – Mike 'Pomax' Kamermans Jul 10 '23 at 21:56