We are using hive 3.1.3 and we're running Queries on Tez engine (Hive on Tez) on Cloudera platform.
We are still in the beginning phase and we have a situation where we want to join several tables together using string columns.
for example
table 1
table 2
and we want join table 1 with table 2 using table1.id = table2.id (note that the ids are hexadecimal values)
Now one of the discussion we're making is that joining using string values will affect the performance of the queries and it is better to change these ID columns from string to integer. but we really do not have the luxury to do that.
My question is, if we changed the ID columns to integers, will we notice that much of a difference in the query performance (Specially Joins), is it worth the change from String to integer ? (our largest table is almost 600 million records)
Thank you in advance.