3

Suppose we have a DB table with fields

"id", "category", "subcategory", "brand", "name", "description", etc. 

What's a good way of creating separate tables for category, subcategory and brand and the corresponding columns and rows in the original table becoming foreign key references?

To outline the operations involved:

  • get all unique values in each column of the original table which should become foreign keys;
  • create tables for those
  • create foreign key reference columns in the original table (or a copy)

In this case, the PostgreSQL DB is accessed via Sequel in a Ruby app, so available interfaces are the command line, Sequel, PGAdmin, etc...

The question: how would you do this?

JustGage
  • 1,534
  • 17
  • 20
bcsantos
  • 2,635
  • 5
  • 21
  • 22
  • `get all unique values in each column of the original table which should become foreign keys; - create tables for those -` yes, but add a surrogate (maybe a serial) to the new table, and let the original tables's FK be a (new) integer field referring to the surrogate instead of the actual value(s). – wildplasser Apr 25 '15 at 16:29

2 Answers2

11
        -- Some test data
CREATE TABLE animals
        ( id SERIAL NOT NULL PRIMARY KEY
        , name varchar
        , category varchar
        , subcategory varchar
        );
INSERT INTO animals(name, category, subcategory) VALUES
 ( 'Chimpanzee' , 'mammals', 'apes' )
,( 'Urang Utang' , 'mammals', 'apes' )
,( 'Homo Sapiens' , 'mammals', 'apes' )
,( 'Mouse' , 'mammals', 'rodents' )
,( 'Rat' , 'mammals', 'rodents' )
        ;

        -- [empty] table to contain the "squeezed out" domain
CREATE TABLE categories
        ( id SERIAL NOT NULL PRIMARY KEY
        , category varchar
        , subcategory varchar
        , UNIQUE (category,subcategory)
        );

        -- The original table needs a "link" to the new table
ALTER TABLE animals
        ADD column category_id INTEGER -- NOT NULL
        REFERENCES categories(id)
        ;
        -- FK constraints are helped a lot by a supportive index.
CREATE INDEX animals_categories_fk ON animals (category_id);

        -- Chained query to:
        -- * populate the domain table
        -- * initialize the FK column in the original table
WITH ins AS (
        INSERT INTO categories(category, subcategory)
        SELECT DISTINCT a.category, a.subcategory
        FROM animals a
        RETURNING *
        )
UPDATE animals ani
SET category_id = ins.id
FROM ins
WHERE ins.category = ani.category
AND ins.subcategory = ani.subcategory
        ;

        -- Now that we have the FK pointing to the new table,
        -- we can drop the redundant columns.
ALTER TABLE animals DROP COLUMN category, DROP COLUMN subcategory;

        -- show it to the world
SELECT a.*
        , c.category, c.subcategory
FROM animals a
JOIN categories c ON c.id = a.category_id
        ;

Note: the fragment:

WHERE ins.category = ani.category AND ins.subcategory = ani.subcategory

will lead to problems if these columns contain NULLs. It would be better to compare them using

(ins.category,ins.subcategory) IS NOT DISTINCT FROM (ani.category,ani.subcategory)

wildplasser
  • 43,142
  • 8
  • 66
  • 109
3

I'm not sure I completely understand your question, if this doesn't seem to answer it, then please leave a comment and possibly improve your question to clarify, but it sounds like you want to do a CREATE TABLE xxx AS. For example:

CREATE TABLE category AS (SELECT DISTINCT(category) AS id FROM parent_table);

Then alter the parent_table to add a foreign key constraint.

ALTER TABLE parent_table ADD CONSTRAINT category_fk FOREIGN KEY (category) REFERENCES category (id);

Repeat this for each table you want to create.

Here is the related documentation:

CREATE TABLE

ALTER TABLE

Note: code and references are for Postgresql 9.4

David S
  • 12,967
  • 12
  • 55
  • 93
  • Thanks for the reply, another answer is closer to what I was looking for because indexes are needed. Cheers – bcsantos Apr 30 '15 at 09:19