How can I reduce the number of joined tables in this query?

Question

CREATE TABLE Attributes (
    id      VARCHAR(40),
    type    VARCHAR(16),
    data    VARCHAR(2048),
    PRIMARY KEY(id,type)
);

This is the general format of query I am trying to run. The general idea is that 'objects' have unique ids and then have a key/value pair like javascript objects.

SELECT a1.id, a1.data, a2.data, a3.data, a4.data, a6.data
    FROM Attributes a1, Attributes a2, Attributes a3, Attributes a4, Attributes a5
    LEFT JOIN Attributes a6 ON (a6.id=a5.id AND a6.type = 'Foreign Id')
        WHERE a1.id=a2.id
            AND a1.id=a3.id
            AND a1.id=a4.id
            AND a1.id=a5.id
            AND a1.type = 'First Name' 
            AND a2.type = 'Middle Name' 
            AND a3.type = 'Last Name'
            AND a4.type = 'Timestamp'
            AND a5.type = 'Count'
            AND a5.data = 'MY_ID'

In this query 'Foreign Id' is an optional attribute. The problem is that I am getting

The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay.

I realize that I can do as said, but the warning worries me that this query is horribly inefficient. Is there a better way to formulate the query?

GarethD · Accepted Answer · 2013-07-18T18:54:54.640

Since the primary key key is ID, Type you can use aggregate functions and ensure the query is still deterministic, reducing the query to 0 joins:

SELECT  a.ID,
        MAX(CASE WHEN a.type = 'First Name' THEN a.Data END) AS FirstName,
        MAX(CASE WHEN a.type = 'Last Name' THEN a.Data END) AS LastName,
        MAX(CASE WHEN a.type = 'Timestamp' THEN a.Data END) AS `Timestamp`,
        MAX(CASE WHEN a.type = 'Count' THEN a.Data END) AS `Count`,
        MAX(CASE WHEN a.type = 'MY_ID' THEN a.Data END) AS MY_ID,
        MAX(CASE WHEN a.Type = 'Foreign Id' THEN a.Data END) AS ForeignId
FROM    Attributes a
GROUP BY a.ID;

It is worth noting though that the Entity-Attribute-Value Model is an SQL Antipattern, and you could be much better off normalising your data to store the attributes as columns, rather than having to use the above query to turn rows into columns.

EDIT

To add a filter based on an attribute use the HAVING clause:

SELECT  a.ID,
        MAX(CASE WHEN a.type = 'First Name' THEN a.Data END) AS FirstName,
        MAX(CASE WHEN a.type = 'Last Name' THEN a.Data END) AS LastName,
        MAX(CASE WHEN a.type = 'Timestamp' THEN a.Data END) AS `Timestamp`,
        MAX(CASE WHEN a.type = 'Count' THEN a.Data END) AS `Count`,
        MAX(CASE WHEN a.type = 'MY_ID' THEN a.Data END) AS MY_ID,
        MAX(CASE WHEN a.Type = 'Foreign Id' THEN a.Data END) AS ForeignId
FROM    Attributes a
GROUP BY a.ID
HAVING MAX(CASE WHEN a.type = 'MY_ID' THEN a.Data END) = 1;

+1 except that storing attributes in columns would be *normalization*, not *denormalization*. The EAV antipattern is not even a relation; it cannot be in any normal form. — Bill Karwin, Jul 18 '13 at 18:57
in my case, i dont have a choice but for the sake of knowing better isnt EAV the general premise of NOSQL? — chacham15, Jul 23 '13 at 07:16
I haven't done a lot of work with NoSQL, but yes, I believe the main advantage is it's flexibility in terms of schemas, and this lends itself to the EAV model. I am not saying EAV doesn't have it's place in SQL, it is just not always the best solution, especially if you have a low number of fixed attributes — GarethD, Jul 24 '13 at 07:35

score 0 · Answer 2 · answered Jul 18 '13 at 18:36

Your attributes table is narrow, but with lots of rows. And you are going to either do a bunch of self joins or you are going to query with group by a.id and use aggregate functions. The latter approach eliminates the joins, but still hits lots of rows.

I think a better alternative is to denormalize your data model a little bit. This would involve creating a table that contains a column for 'First Name', a column for 'Middle Name', etc. Then your various attributes that are associated with the ID are all on the same row. You end up with a wider table, but far fewer rows and no joins.

How can I reduce the number of joined tables in this query?

2 Answers2