Create function in postgresql to update column values from a table with preferred values and aliases

Question

I want to create a function that will update a column of type varchar to a preferred string that is referenced in the column of another table to help me clean this column more iteratively.

CREATE TABLE big_table (
    mn_uid NUMERIC PRIMARY KEY,
    user_name VARCHAR
    );

INSERT INTO big_table VALUES
        (1, 'DAVE'),
        (2, 'Dave'),
        (3, 'david'),
        (4, 'Jak'),
        (5, 'jack'),
        (6, 'Jack'),
        (7, 'Grant'); 

CREATE TABLE nameKey_table (
    nk_uid NUMERIC PRIMARY KEY,
    correct VARCHAR,
    wrong VARCHAR
    );

INSERT INTO nameKey_table VALUES
        (1, 'David', 'Dave_DAVE_dave_DAVID_david'),
        (2, 'Jack', 'JACK_jack_Jak_jak');

I want to perform the following procedure:

UPDATE big_table
SET user_name = (SELECT correct
                 FROM nameKey_table 
                 WHERE wrong 
                 LIKE '%DAVE%')
WHERE user_name = 'DAVE';

but looped over each user_name in big_table so that I have a function that can do something like this:

UPDATE big_table SET user_name = corrected_name_fn();

Here is my attempt to do something like this but I can't seem to get it to work:

CREATE FUNCTION corrected_name_fn() RETURNS VARCHAR AS $$
DECLARE entry RECORD;
DECLARE correct_name VARCHAR;
BEGIN 
FOR entry IN SELECT DISTINCT user_name FROM big_table LOOP
     EXECUTE 'SELECT correct 
              FROM nameKey_table
              WHERE wrong 
              LIKE ''%$1%''' 
              INTO correct_name
              USING entry;
            RETURN correct_name;
            END LOOP;
END;    
$$ LANGUAGE plpgsql;

I want the final output in big_table to be:

| mn_uid |  user_name |
|   1    | 'David'    |
|   2    | 'David'    |
|   3    | 'David'    |
|   4    | 'Jack'     |
|   5    | 'Jack'     |
|   6    | 'Jack'     |
|   7    | 'Grant'    |

I realize rows 6 and 7 provide two unique cases that I want to build into the function with IF ELSE statements.

If user_name is in nameKey_table.correct, go to next
If user_name is not in nameKey_table.correct or does not match a string in nameKey_table.wrong, leave as is.

Thanks for any help on this!!

What about your current solution is not working or giving you your expected result? — Nick, Sep 19 '18 at 17:34
The current function results in an empty `user_name` column. — nmsindt, Sep 19 '18 at 17:46
Why do you want a function? The same can be done with a plain UPDATE statement. — wildplasser, Sep 19 '18 at 17:52

wildplasser · Answer 1 · 2018-09-19T19:06:24.760

You don't need a function; you can just update one table from the contents of another table:

UPDATE big_table dst
SET user_name = src.correct
FROM nameKey_table src
WHERE src.wrong LIKE '%' || dst.user_name || '%'
AND dst.user_name <> src.correct -- avoid idempotent updates
        ;

And if you need performance, dont rely on the LIKE operator, it cannot use indexes for leading %. Instead, use a lookup-table with one entry per row:

CREATE TABLE bad_spell (
    correct VARCHAR,
    wrong VARCHAR PRIMARY KEY -- This will cause an unique index to be created.
    );

INSERT INTO bad_spell VALUES
        ('David', 'Dave')
        ,('David', 'DAVE')
        ,('David', 'dave')
        ,('David', 'DAVID')
        ,('David', 'david')
        ,('Jack', 'JACK')
        ,('Jack', 'jack')
        ,('Jack', 'Jak')
        ,('Jack', 'jak')
        ;
-- This indexes could be temporary
CREATE INDEX ON big_table(user_name);

-- EXPLAIN
UPDATE big_table dst
SET user_name = src.correct
FROM bad_spell src
WHERE dst.user_name = src.wrong
AND dst.user_name <> src.correct -- avoid idempotent updates
        ;

SELECT* FROM big_table
        ;

score 0 · Answer 2 · answered Sep 19 '18 at 18:37

It sounds like you want a trigger on the table. Here is my suggestion:

CREATE OR REPLACE FUNCTION tf_fix_name() RETURNS TRIGGER AS
$$
DECLARE
    corrected_name TEXT;
BEGIN

    SELECT correct INTO corrected_name FROM nameKey_table WHERE expression ~* NEW.user_name;
    IF FOUND THEN
        NEW.user_name := corrected_name;
    END IF;

    RETURN NEW;
END;
$$
    LANGUAGE plpgsql;


CREATE TEMP TABLE big_table (
    mn_uid INT PRIMARY KEY,
    user_name TEXT NOT NULL
);

CREATE TRIGGER trigger_fix_name
    BEFORE INSERT
    ON big_table
    FOR EACH ROW
    EXECUTE PROCEDURE tf_fix_name();

CREATE TEMP TABLE nameKey_table (
    nk_uid INT PRIMARY KEY,
    correct TEXT NOT NULL,
    expression TEXT NOT NULL
    );

INSERT INTO nameKey_table VALUES
        (1, 'David', '(dave|david)'),
        (2, 'Jack', '(jack|jak)');

INSERT INTO big_table VALUES
        (1, 'DAVE'),
        (2, 'Dave'),
        (3, 'david'),
        (4, 'Jak'),
        (5, 'jack'),
        (6, 'Jack'),
        (7, 'Grant');

SELECT * FROM big_table;

+--------+-----------+
| mn_uid | user_name |
+--------+-----------+
|      1 | David     |
|      2 | David     |
|      3 | David     |
|      4 | Jack      |
|      5 | Jack      |
|      6 | Jack      |
|      7 | Grant     |
+--------+-----------+
(7 rows)

Note: I think you can do what you want a lot easier with a case insensitive regular expression. And I also changed your primary keys to INTs. Not sure why they are numerics, but it doesn't really change the solutions. My solution was developed and tested on PostgreSQL 9.6.

Create function in postgresql to update column values from a table with preferred values and aliases

2 Answers2