1

I need to transform data from transactional database into a star schema. I already created all dimension tables and one fact table (cinjenica) as shown on ERD picture here: left(star-schema) right(transactional database table)

I also created Unique index ON dimension tables so to limit maximum theoretical row count for example in dimenzija_platforma i have UNIQUE index on (widows, linux, mac) atributes, and those atributes can have values 1 or non-one (binary value) which yields 2^3=8 max theoretical rows. Same is done for other dimension tables that will be derived from transactional database table with 29k rows.

Table cinjenica(fact table) contains compound PK/FK and all foreign keys have ON UPDATE CASCADE.

Now to input the data inside my star schema i created following stored procedure:

CREATE DEFINER=`root`@`localhost` PROCEDURE `new_row_to_star_scheme`()
BEGIN
    DECLARE i INTEGER;
    DECLARE total_row_count INTEGER;

    select appid from sppi.steam;
    SELECT FOUND_ROWS() into total_row_count;

    SET i = 0;
ponavljanje: LOOP
    IF i > total_row_count THEN
    LEAVE ponavljanje;
    END IF;

    INSERT INTO dimenzija_datum(ID,datum) 
    SELECT last_insert_id(),s.release_date 
    FROM steam as s LIMIT i,1
    ON DUPLICATE KEY UPDATE ID=last_insert_id(ID);

    SET @row_datum = last_insert_id();

    INSERT INTO dimenzija_vlasnik(ID,developer,publisher) 
    SELECT last_insert_id(),s.developer, s.publisher
    FROM steam as s LIMIT i,1
    ON DUPLICATE KEY UPDATE ID=last_insert_id(ID);

    SET @row_vlasnik = last_insert_id();

    INSERT INTO dimenzija_platforme(ID,tekstualno) 
    SELECT last_insert_id(),s.platforms
    FROM steam as s LIMIT i,1
    ON DUPLICATE KEY UPDATE ID=last_insert_id(ID);

    SET @row_platforme = last_insert_id();

    INSERT INTO dimenzija_kategorija(ID,zanr_tagovi,tip_tagovi) 
    SELECT last_insert_id(),s.genres, s.categories
    FROM steam as s LIMIT i,1
    ON DUPLICATE KEY UPDATE ID=last_insert_id(ID);

    SET @row_kategorija = last_insert_id();

    INSERT INTO cinjenica(fk_datum,fk_vlasnik,fk_platforma,fk_kategorija,naziv,positive_rating,negative_rating,avg_playtime,median_playtime,cijena,owners) 
    SELECT @row_datum, @row_vlasnik, @row_platforme, @row_kategorija, s.name, s.positive_ratings, s.negative_ratings, s.average_playtime, s.median_playtime, s.price, s.owners
    FROM steam as s LIMIT i,1
    ON DUPLICATE KEY UPDATE ID=last_insert_id(ID);

    SET i = i+1;
    END LOOP;
END

All other data that isn't inserted into dimensions from shown procedure is created with BEFORE INSERT triggers on each dimension table.

With this stored procedure I can populate my star schema until any of dimension tables encounters duplicate row after that I get Error code 1054 Unknown column 'ID' in 'field list'. After each INSERT INTO if there is duplicate data - ON DUPLICATE KEY UPDATE should update primary key of dimension table(foreign key for fact table) and trigger ON UPDATE CASCADE so in theory it should work. How should I correct my procedure to populate my star schema ?

Tomislav
  • 11
  • 2

0 Answers0