1

Here is the code in SAS, It finds the numeric columns with blank and replace with 0's

DATA dummy_table;
SET dummy_table;
ARRAY DUMMY _NUMERIC_;
DO OVER DUMMY;
  IF DUMMY=. THEN DUMMY=0;
END;
RUN;

I am trying to replicate this in Redshift, here is what I tried

create or replace procedure sp_replace_null_to_zero(IN tbl_nm varchar) as $$
Begin

Execute 'declare ' ||
            'tot_cnt int := (select count(*)  from information_schema.columns where table_name = ' || tbl_nm || ');' ||
            'init_loop int := 0; ' ||
            'cn_nm varchar; ' 
  
Begin
  While init_loop <= tot_cnt    
  Loop
  Raise info 'init_loop = %', Init_loop; 
  Raise info 'tot_cnt = %', tot_cnt; 
  
  Execute 'Select column_name into cn_nm from information_schema.columns ' ||
  'where table_name ='|| tbl_nm || ' and ordinal_position = init_loop ' ||
  'and data_type not in (''character varying'',''date'',''text''); '
  
  Raise info 'cn_nm = %', cn_nm;  
  
    if cn_nm is not null then
      Execute 'Update ' || tbl_nm ||
              'Set ' || cn_nm = 0 ||
              'Where ' || cn_nm is null or cn_nm =' ';
    end if;
 init_loop = init_loop + 1;
 end loop;             
End;  
End;
$$ language plpgsql;

Issues I am facing

When I pass the Input parameter here, I am getting 0 count

tot_cnt int := (select count(*) from information_schema.columns where table_name = ' || tbl_nm || ');'

For testing purpose I tried hardcode the table name inside proc, I am getting the error amazon invalid operation: value for domain information_schema.cardinal_number violates check constraint "cardinal_number_domain_check"

Is this even possible in redshift, How can I do this logic or any other workaround.

Need Expertise advise here!!

Tpk43
  • 363
  • 1
  • 5
  • 23
  • There is too much wrong in your code to give an answer without completely rewriting your function. To answer the question in your title: either use a `BEFORE INSERT` trigger on the table, or store the NULLs as they are and convert them to 0 when you `SELECT`. – Laurenz Albe Feb 25 '21 at 03:35

1 Answers1

1

You can simply run an UPDATE over the table(s) using the NVL(cn_nm,0) function

UPDATE tbl_raw 
SET col2 = NVL(col2,0);

However UPDATE is a fairly expensive operation. Consider just using a view over your table that wraps the columns in NVL(cn_nm,0)

CREATE VIEW tbl_clean
AS 
SELECT col1
     , NVL(col2,0) col2
FROM tbl_raw;
Joe Harris
  • 13,671
  • 4
  • 47
  • 54
  • Thats what I did UPDATE, the reason I was trying in this way because I have more than 40 columns created for modelling purpose. – Tpk43 Feb 26 '21 at 09:32
  • 1
    Faster to update all 40 columns at once that one at a time. An update in Redshift rewrites the entire updated row. – Joe Harris Feb 26 '21 at 13:14