This is how you can do it:
CREATE TABLE thetable (event_id string, batt_id string, serial_number string, batt_key number);
INSERT INTO thetable VALUES('1','bat1','1',0);
INSERT INTO thetable VALUES('2','bat1','1',0);
So that gives a table in the basic form you want, with some dummy rows standing in for pre-existing data whose existing key (keyed on batt_id) we want to reuse. I shrank the column names for typing reasons, and used numeric strings instead of UUIDs, but the point still holds.
CREATE OR REPLACE SEQUENCE batt_key_seq;
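For reference, every reference to nextval hands out a fresh value that is never reused, so any value drawn but not kept is simply lost. A quick sanity check on a throwaway sequence (demo_seq is purely illustrative):

CREATE OR REPLACE SEQUENCE demo_seq;
SELECT demo_seq.nextval; -- 1
SELECT demo_seq.nextval; -- 2

By default a sequence starts at 1 and increments by 1. With the sequence in place, the whole enrichment can be done in one query: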
WITH new_data(event_id, batt_id, serial_number) AS (
SELECT * FROM VALUES
('11','bat1', '10'),
('12','bat1', '10'),
('13','bat2', '11'),
('14','bat1', '10'),
('15','bat1', '10'),
('16','bat2', '11'),
('17','bat3', '12')
), new_data_with_keys AS (
SELECT d.batt_id
,COALESCE(t.batt_key, batt_key_seq.nextval) AS batt_key
FROM (
SELECT DISTINCT batt_id
FROM new_data
) AS d
LEFT JOIN thetable AS t
ON d.batt_id = t.batt_id
QUALIFY row_number() OVER (PARTITION BY d.batt_id ORDER BY event_id) = 1
)
SELECT n.*, k.batt_key
FROM new_data AS n
JOIN new_data_with_keys AS k
ON n.batt_id = k.batt_id
;
So the ROW_NUMBER() part is there to capture only one of the existing rows per batt_id (thus the duplicate insert in the existing data at the start). I feel like this might be cleaner some other way, but it gets the job done.
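One arguably cleaner shape is to de-duplicate the existing table in its own CTE, so the join can never fan out and the QUALIFY disappears. A sketch of that variant with a trimmed-down VALUES list (existing_keys is just an illustrative name; MIN only picks one key deterministically if duplicates exist, and they should all be equal anyway):

WITH new_data(event_id, batt_id, serial_number) AS (
    SELECT * FROM VALUES
    ('11','bat1', '10'),
    ('13','bat2', '11'),
    ('17','bat3', '12')
), existing_keys AS (
    -- one row per batt_id that already has a key
    SELECT batt_id
        ,MIN(batt_key) AS batt_key
    FROM thetable
    GROUP BY batt_id
), new_data_with_keys AS (
    SELECT d.batt_id
        ,COALESCE(e.batt_key, batt_key_seq.nextval) AS batt_key
    FROM (
        SELECT DISTINCT batt_id FROM new_data
    ) AS d
    LEFT JOIN existing_keys AS e
        ON d.batt_id = e.batt_id
)
SELECT n.*, k.batt_key
FROM new_data AS n
JOIN new_data_with_keys AS k
    ON n.batt_id = k.batt_id;

Note this only removes the fan-out; nextval still appears to be evaluated for the matched rows, which matters in a moment.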
But in effect you find the existing keys, hand new keys to the new data, and can then insert the data in this enriched form.
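For example, that last step is just the same query wrapped in an INSERT; a sketch with a trimmed-down VALUES list (the explicit column list is only there to make the column order unambiguous):

INSERT INTO thetable (event_id, batt_id, serial_number, batt_key)
WITH new_data(event_id, batt_id, serial_number) AS (
    SELECT * FROM VALUES
    ('11','bat1', '10'),
    ('13','bat2', '11')
), new_data_with_keys AS (
    SELECT d.batt_id
        ,COALESCE(t.batt_key, batt_key_seq.nextval) AS batt_key
    FROM (
        SELECT DISTINCT batt_id FROM new_data
    ) AS d
    LEFT JOIN thetable AS t
        ON d.batt_id = t.batt_id
    QUALIFY ROW_NUMBER() OVER (PARTITION BY d.batt_id ORDER BY event_id) = 1
)
SELECT n.event_id, n.batt_id, n.serial_number, k.batt_key
FROM new_data AS n
JOIN new_data_with_keys AS k
    ON n.batt_id = k.batt_id;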
What I am noticing in testing is that my key jumps by 4 each run: nextval is drawn for every row the join produces (here four: two matches for bat1, plus one unmatched row each for bat2 and bat3), not just for the rows where COALESCE actually needs a new value. Swapping to
,IFF(t.batt_key IS NOT NULL, t.batt_key, batt_key_seq.nextval) AS batt_key
did not fix that.
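You can watch the gap without burning another value: SHOW SEQUENCES reports the next value that will be handed out, so checking it before and after a run shows the jump:

SHOW SEQUENCES LIKE 'batt_key_seq';
-- next_value advances by 4 per run of the query above: one draw per
-- joined row (bat1 twice, bat2, bat3), even though only the one
-- genuinely new key is kept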
So, separating the key-finding parts like this:
WITH new_data(event_id, batt_id, serial_number) AS (
SELECT * FROM VALUES
('11','bat1', '10'),
('12','bat1', '10'),
('13','bat2', '11'),
('14','bat1', '10'),
('15','bat1', '10'),
('16','bat2', '11'),
('17','bat3', '12')
), missing_keys AS (
SELECT d.batt_id
,batt_key_seq.nextval AS batt_key
FROM (
SELECT DISTINCT batt_id
FROM new_data
) AS d
LEFT JOIN thetable AS t
ON d.batt_id = t.batt_id
WHERE t.batt_id IS NULL
), old_keys AS (
SELECT DISTINCT d.batt_id
,t.batt_key
FROM (
SELECT DISTINCT batt_id
FROM new_data
) AS d
LEFT JOIN thetable AS t
ON d.batt_id = t.batt_id
WHERE t.batt_id IS NOT NULL
), new_data_with_keys AS (
SELECT * FROM missing_keys
UNION ALL
SELECT * FROM old_keys
)
SELECT n.*, k.batt_key
FROM new_data AS n
JOIN new_data_with_keys AS k
ON n.batt_id = k.batt_id
;
Now the new keys are not being skipped/wasted.
The performance cost of the double join versus the key-space wastage is something you might need to test, and/or weigh against how much new versus existing data you expect and thus how much wastage would happen. If the performance of the second form is vastly worse, you can always take the wasteful route and re-key the data later, albeit if these keys are used outside the table that can be a lot of data to update later.
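If you do end up re-keying, the mechanics are simple enough while the keys live only in this table; a sketch with illustrative names (key_map is hypothetical):

CREATE OR REPLACE TABLE key_map AS
    SELECT batt_key AS old_key
        ,ROW_NUMBER() OVER (ORDER BY batt_key) AS new_key
    FROM (
        SELECT DISTINCT batt_key FROM thetable
    );

UPDATE thetable
    SET batt_key = key_map.new_key
    FROM key_map
    WHERE thetable.batt_key = key_map.old_key;

You would then recreate the sequence with a START value above the new maximum so future draws do not collide.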