The problem:
I want to move the category links from the companies_1 table into the company_categories table. The company_id in the company_categories table needs to equal the id in the companies_2 table. The records in companies_1 and companies_2 are linked by the "name" column.
- The current code below ran overnight and still had not finished. I want to learn how to make this more efficient and speed the process up. I suspect there is a lot to optimize, because there are A LOT of company records.
- Another issue is that I found no way to see how far the query had got while looping, so I could not check its progress. Because it took so long, I killed the query, and I'm now looking for a better way to solve this.
The information:
There is a table with companies like:
----------------------------------------
| companies_1 |
----------------------------------------
| id | category_id | name |
----------------------------------------
| 1 | 1 | example-1 |
| 2 | 2 | example-1 |
| 3 | 1 | example-2 |
| 4 | 2 | example-2 |
| 5 | 3 | example-2 |
| 6 | 1 | example-3 |
----------------------------------------
A table with the DISTINCT company names:
-------------------------
| companies_2 |
-------------------------
| id | name |
-------------------------
| 1 | example-1 |
| 2 | example-2 |
| 3 | example-3 |
-------------------------
A categories table, like:
-------------------------
| categories |
-------------------------
| id | name |
-------------------------
And a junction table, like:
---------------------------------
| company_categories |
---------------------------------
| company_id | category_id |
---------------------------------
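Expressed as a single set-based statement, the desired result would look roughly like the sketch below. It assumes the table and column names from the description above and that name is unique in companies_2. The index and its name are illustrative assumptions, not something that exists in the schema shown.

```sql
-- Assumption: an index on the join column helps the lookup by name.
-- The index name is made up for this sketch.
CREATE INDEX idx_companies_2_name ON companies_2 (name);

-- One row per companies_1 record, with the company id taken from
-- the companies_2 row that has the same name.
INSERT INTO company_categories (company_id, category_id)
SELECT c2.id, c1.category_id
FROM companies_1 AS c1
JOIN companies_2 AS c2 ON c2.name = c1.name;
```

With the sample rows above, this would insert the six pairs (1,1), (1,2), (2,1), (2,2), (2,3) and (3,1) into company_categories.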
The current code:
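This code works, but is far from efficient. In the code, company_old corresponds to companies_1, companies corresponds to companies_2, and the href/site_href columns play the role of the "name" column.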
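DELIMITER $$
DROP PROCEDURE IF EXISTS fill_junc_table$$
CREATE PROCEDURE fill_junc_table()
BEGIN
    DECLARE r INT;                -- number of rows inserted so far
    DECLARE i INT;                -- offset for the outer loop
    DECLARE i2 INT;               -- offset for the inner loop
    DECLARE loop_length INT;      -- number of rows in companies
    DECLARE company_old_len INT;  -- number of company_old rows for the current href
    DECLARE _href VARCHAR(255);
    DECLARE cat_id INT;
    DECLARE comp_id INT;

    SET r = 0;
    SET i = 0;
    SET company_old_len = 0;
    SELECT COUNT(*) INTO loop_length FROM companies;

    WHILE i < loop_length DO
        -- Take the href of the row at offset i in company_old
        -- (row order is not guaranteed without ORDER BY).
        SELECT href INTO _href FROM company_old LIMIT i,1;
        -- Look up the id of the company with that href.
        SELECT id INTO comp_id FROM companies WHERE site_href=_href;
        -- Count how many company_old rows share that href.
        SELECT COUNT(*) INTO company_old_len FROM company_old WHERE href=_href;

        SET i2 = 0;
        WHILE i2 < company_old_len DO
            -- Insert one junction row per company_old row with this href.
            SELECT category_id INTO cat_id FROM company_old WHERE href=_href LIMIT i2,1;
            INSERT INTO company_categories (company_id, category_id) VALUES (comp_id, cat_id);
            SET r = r + 1;
            SET i2 = i2 + 1;
        END WHILE;

        SET i = i + 1;
    END WHILE;

    -- Report the total number of inserted rows.
    SELECT r;
END$$
DELIMITER ;
CALL fill_junc_table();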
Edit (new idea):
I am going to test another approach: fully copying the companies_1 table into a table with the following columns (company_id left empty on copy):
---------------------------------------------
| company_id | category_id | name |
---------------------------------------------
Then I will loop through the companies_2 table to fill in the company_id that matches the name column.
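A minimal sketch of this idea, assuming the copy lives in a table called companies_1_copy (a hypothetical name) and using the column names from the description above. Note that the fill step is written here as a single joined UPDATE rather than a loop, which is a swapped-in technique and not the loop described above.

```sql
-- Hypothetical copy table; column types are assumptions.
CREATE TABLE companies_1_copy (
    company_id  INT NULL,
    category_id INT NOT NULL,
    name        VARCHAR(255) NOT NULL
);

-- Copy every companies_1 row, leaving company_id empty for now.
INSERT INTO companies_1_copy (company_id, category_id, name)
SELECT NULL, category_id, name
FROM companies_1;

-- Fill company_id for all rows at once by matching on name.
UPDATE companies_1_copy AS c
JOIN companies_2 AS c2 ON c2.name = c.name
SET c.company_id = c2.id;
```

Rows whose name has no match in companies_2 would keep a NULL company_id, which makes unmatched records easy to spot afterwards.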
I hope you can share your thoughts on this. When I finish my test I will post the result here for others.