Update Null columns to Zero dynamically in Redshift

Question

Here is the code in SAS, It finds the numeric columns with blank and replace with 0's

DATA dummy_table;
SET dummy_table;
ARRAY DUMMY _NUMERIC_;
DO OVER DUMMY;
  IF DUMMY=. THEN DUMMY=0;
END;
RUN;

I am trying to replicate this in Redshift, here is what I tried

create or replace procedure sp_replace_null_to_zero(IN tbl_nm varchar) as $$
Begin

Execute 'declare ' ||
            'tot_cnt int := (select count(*)  from information_schema.columns where table_name = ' || tbl_nm || ');' ||
            'init_loop int := 0; ' ||
            'cn_nm varchar; ' 
  
Begin
  While init_loop <= tot_cnt    
  Loop
  Raise info 'init_loop = %', Init_loop; 
  Raise info 'tot_cnt = %', tot_cnt; 
  
  Execute 'Select column_name into cn_nm from information_schema.columns ' ||
  'where table_name ='|| tbl_nm || ' and ordinal_position = init_loop ' ||
  'and data_type not in (''character varying'',''date'',''text''); '
  
  Raise info 'cn_nm = %', cn_nm;  
  
    if cn_nm is not null then
      Execute 'Update ' || tbl_nm ||
              'Set ' || cn_nm = 0 ||
              'Where ' || cn_nm is null or cn_nm =' ';
    end if;
 init_loop = init_loop + 1;
 end loop;             
End;  
End;
$$ language plpgsql;

Issues I am facing

When I pass the Input parameter here, I am getting 0 count

tot_cnt int := (select count(*) from information_schema.columns where table_name = ' || tbl_nm || ');'

For testing purpose I tried hardcode the table name inside proc, I am getting the error amazon invalid operation: value for domain information_schema.cardinal_number violates check constraint "cardinal_number_domain_check"

Is this even possible in redshift, How can I do this logic or any other workaround.

Need Expertise advise here!!

There is too much wrong in your code to give an answer without completely rewriting your function. To answer the question in your title: either use a `BEFORE INSERT` trigger on the table, or store the NULLs as they are and convert them to 0 when you `SELECT`. — Laurenz Albe, Feb 25 '21 at 03:35

score 1 · Accepted Answer · answered Feb 25 '21 at 14:16

1

You can simply run an UPDATE over the table(s) using the NVL(cn_nm,0) function

UPDATE tbl_raw 
SET col2 = NVL(col2,0);

However UPDATE is a fairly expensive operation. Consider just using a view over your table that wraps the columns in NVL(cn_nm,0)

CREATE VIEW tbl_clean
AS 
SELECT col1
     , NVL(col2,0) col2
FROM tbl_raw;

answered Feb 25 '21 at 14:16

Joe Harris

13,671
4
47
54

Thats what I did UPDATE, the reason I was trying in this way because I have more than 40 columns created for modelling purpose. – Tpk43 Feb 26 '21 at 09:32
1

Faster to update all 40 columns at once that one at a time. An update in Redshift rewrites the entire updated row. – Joe Harris Feb 26 '21 at 13:14

Update Null columns to Zero dynamically in Redshift

1 Answers1