sql optimization with anti join

Question

i have a recursive table of category and a company table with fields like:

category(id, name, parent) // parent is foreign key to category id :)
company(id, category_1, category_2, category_3) // category_* is foreign key to category id

category tree is at max depth=3 ;

category cx -> category cy -> category cz

with knowledge of company categories are always linked to last category (c3), i want all categories that a company is linked to (c1z, c2z, c3z, c1y, c2y, c3y, c1x, c2x, c3x) for my search engine. //c1y is the parent of category_1 and c1x is parent of parent of category 1...

the best query i came up with is :

SELECT
  ID,
  NAME
FROM category c3
WHERE ID IN (
    select category_1 from company where id=:companyId
  union
    select category_2 from company where id=:companyId
  union
    select category_3 from company where id=:companyId
  union
        select parent from category where id in (
          select category_1 from company where id=:companyId
          union
          select category_2 from company where id=:companyId
          union
          select category_3 from company where id=:companyId
        )
  union 
        select parent from category where id in (
          select parent from category where id in (
          select category_1 from company where id=:companyId
          union
          select category_2 from company where id=:companyId
          union
          select category_3 from company where id=:companyId
          )
        )
  )

it has so much duplicate in it. one for category_* in company. and one for repeating it multiple times.

any way to remove all this duplicates ?

--update--

suppose we solve the category-* field with using two tables what about recursive problem with 3 levels of category?

for example if there is only one category it would look like

SELECT
  ID,
  NAME
FROM category
WHERE ID IN (
  select category_1 from company where id=:companyId
  union
  select parent from category where id in (
    select category_1 from company where id=:companyId
  )
  union
  select parent from category where id in (
    select parent from category where id in (
      select category_1 from company where id=:companyId
    )
  )
);

i'm using oracle but i don't want it to be database related, cause its very possible to change it in the feature. — alizelzele, Jan 12 '15 at 10:14

Maciej Los · Accepted Answer · 2015-01-12T10:29:25.220

If you want to join data, use something like this (SQL server example):

DECLARE @category TABLE (id INT IDENTITY(1,1), name VARCHAR(30), parent INT) -- parent is foreign key to category id :)
DECLARE @company TABLE (id INT IDENTITY(1,1), category_1 INT, category_2 INT, category_3 INT) --category_* is foreign key to category->id


INSERT INTO @category (name, parent )
VALUES('Top category', null), ('Cars', 1)

INSERT INTO @company (category_1, category_2 , category_3 )
VALUES(2, null, null), (2, 2, null), (2, 2, 2)


SELECT t1.*, t2.*
FROM @category AS t1 INNER JOIN @company AS t2 ON t1.id = t2.category_1 or t1.id = t2.category_2  or t1.id = t2.category_3

Above code produces:

id  name    parent  id  category_1  category_2  category_3
2   Cars    1   1   2   NULL    NULL
2   Cars    1   2   2   2   NULL
2   Cars    1   3   2   2   2

But, such kind of database structure is wrong!

Instead of one table

company(id, category_1, category_2, category_3)

create two tables

company(id, name)
comp_cat(id, comp_id, cat_id)

Why? I don't want to answer directly, so i ask you: 1) what happens when company is related to more than 3 categories? 2) why to save nulls in case when second and third category is not set?

In case of SQL Server, you can use Common Table Expressions:

;WITH CTE AS
(
    SELECT id, category_1 AS cat_id
    FROM @company 
    WHERE NOT category_1 IS NULL
    UNION ALL
    SELECT id, category_2 AS cat_id
    FROM @company 
    WHERE NOT category_2 IS NULL
    UNION ALL
    SELECT id, category_3 AS cat_id
    FROM @company 
    WHERE NOT category_3 IS NULL
)
SELECT DISTINCT t1.*, t2.*
FROM CTE AS t1 INNER JOIN @category AS t2 ON t1.cat_id = t2.id

Cheers, Maciej

i know lots of those categories will be null. and i thought about changing to 2 tables. there should never be more than 3 categories. so i thought maybe this way it will have better performance instead of using 2 tables — alizelzele, Jan 12 '15 at 10:20
@alizelzele, As i mentioned, using one table is bad programming practice when you work with relational database. — Maciej Los, Jan 12 '15 at 10:27
i saw Common Table Expressions but i don't want to use any dbms Specefic grammer. — alizelzele, Jan 12 '15 at 10:36
@alizelzele, Common Table Expressions are available in MSSQL, Oracle (known as subquery factoring), and Postgres. It is part of the SQL-99 standard. Only MySQL lacks the feature. — Nicholai, Jan 12 '15 at 13:36

score 0 · Answer 2 · answered Jan 12 '15 at 14:41

i used common table expressions for my query. and this is the last query i come up with

 with cte as (
    select category_1 as id  from company where id=:companyId and category_1 is not null
    union
    select category_2 as id from company where id=:companyId and category_2 is not null
    union
    select category_3 as id from company where id=:companyId and category_3 is not null
  ) select id, name FROM category WHERE id IN (
      select id from cte 
    union
      select parent from category where id in (select id from cte)
    union
      select parent from category where id in (
       select parent from category where id in (select id from cte)
     )
   );

that's the best way i could think of. thanks @Maciej for showing the way, and thanks @Nicholai for info on DBMS support.

Only if there was a way to transpose a row to column like matlab do... :P

It's possible to transform/transpose rows to columns and vice versa. See Pivot and Unpivot statements. — Maciej Los, Jan 12 '15 at 15:18

sql optimization with anti join

2 Answers2