I need to insert data to a given external table which should be partitioned by the inserted date. My question is how is Hive handling the timestamp generation? When I select a timestamp for all inserted records like this:
WITH delta_insert AS (
SELECT trg.*, from_unixtime(unix_timestamp()) AS generic_timestamp
FROM target_table trg
)
SELECT *
FROM delta_insert;
Will the timestamp always be identical for all records, even if the query takes a lot of time to un?
Or should I alternatively only select an actual timestamp and join it with everything that is selected afterwards?
WITH insert_timestamp AS (
SELECT from_unixtime(unix_timestamp()) AS insert_timestamp
),
delta_insert AS (
SELECT trg.*, insert_timestamp.insert_timestamp
FROM target_table trg, insert_timestamp
)
SELECT *
FROM delta_insert;
Since it's not recommended to do cross joins in Hive, I wonder what would be the best approach, since I don't have enough test data to simulate long running queries to be sure they get inserted within the same partition of a delta load.