-1

I'm trying to pivot out a table of data stored in a vertical model into a more horizontal, SQL Server table-like model. Unfortunately due to the nature of the data, I cannot use the real data here so I worked up a generic example that follows the same model.

There are three columns to the table, an ID, column ID and value, where the ID and column ID form the Primary Key. Additionally none of the data is required (i.e. an ID can be missing column ID = 3 without breaking anything)

PetID | ColumnID | Value
---------------------------
1     | 1        | Gilda
1     | 2        | Cat
2     | 1        | Sonny
2     | 2        | Cat
2     | 3        | Black

Due to the fact that the Primary Key is a composite of two columns I cannot use the built in PIVOT functionality, so I tried doing a self LEFT JOIN:

SELECT T1.PetID
    ,T2.Value AS [Name]
    ,T3.Value AS [Type]
    ,T4.Value AS [Color]
FROM @Temp AS T1
LEFT JOIN @Temp AS T2 ON T1.PetID = T2.PetID
    AND T2.ColumnID = 1
LEFT JOIN @Temp AS T3 ON T1.PetID = T3.PetID
    AND T3.ColumnID = 2
LEFT JOIN @Temp AS T4 ON T1.PetID = T4.PetID
    AND T4.ColumnID = 3;

The idea being that I want to take the ID from T1 and then do a self LEFT JOIN to get each of the values by ColumnID. However I'm getting duplicates in the data:

PetID | Name  | Type | Color
------------------------------
1     | Gilda | Cat  | NULL
1     | Gilda | Cat  | NULL
2     | Sonny | Cat  | Black
2     | Sonny | Cat  | Black
2     | Sonny | Cat  | Black

I am able to get rid of these duplicates using a DISTINCT, but the dataset is rather large, so the required sort action is slowing down the query tremendously. Is there a better way to accomplish this or am I just stuck with a slow query?

Steven Ball
  • 461
  • 1
  • 5
  • 18
  • There's quite possibly a better way to do the joining part, but I do believe GROUP BY is faster than DISTINCT, and you can use that to remove duplicates. – GendoIkari Jan 11 '16 at 18:56
  • 1
    @GendoIkari, true but the GROUP BY will still use a sort in the background and slow down the query. I'm really looking for a more efficient way to do the joins, but thank you for the suggestion. – Steven Ball Jan 11 '16 at 19:01
  • Join is going to bring in more rows but you have some interesting answers to get around joins – paparazzo Jan 11 '16 at 19:24
  • Are you really using a table variable? If so try with #temp. You can get terrible execution plans with table variables. Table variable are often not good many rows. – paparazzo Jan 11 '16 at 19:30
  • @Frisbee, the real data uses a table in SQL Server. The example is just using a table variable for simplicity sake. – Steven Ball Jan 11 '16 at 19:33

4 Answers4

1

You can use a CASE statement and avoid the joins altogether.

SELECT
    PetID,
    MAX(CASE WHEN ColumnID = 1 THEN Value ELSE NULL END) AS Name,
    MAX(CASE WHEN ColumnID = 2 THEN Value ELSE NULL END) AS Type,
    MAX(CASE WHEN ColumnID = 3 THEN Value ELSE NULL END) AS Color
FROM @Temp
GROUP BY PetId

It is essential that PetID, ColumnID be your primary key for this to work correctly. Otherwise it will cause problems when the same ColumnID is used multiple times for the same PetID

Munir
  • 3,442
  • 3
  • 19
  • 29
1

You can use pivot if you'd like to..

SELECT  *
FROM    (SELECT PetID,
                (CASE ColumnID
                   WHEN 1 THEN 'Name'
                   WHEN 2 THEN 'Type'
                   WHEN 3 THEN 'Color'
                 END) ValueType,
                VALUE
         FROM   @Temp
        ) t 
PIVOT
(   MAX(Value) 
    FOR ValueType IN ([Name],[Type],[Color]) 
) p

Another way without the Sub query would be..

SELECT  PetID,
        [1] [Name],
        [2] [Type],
        [3] [Color]
FROM    @Temp
PIVOT
(   MAX(Value) 
    FOR ColumnID IN ([1],[2],[3]) 
) p
JamieD77
  • 13,796
  • 1
  • 17
  • 27
0

I don't understand your concern about sorting. You have a primary key so you also have an index. This is the correct way to do it:

select
    PetID,
    min(case when ColumnID = 1 then Value end) as Name,
    min(case when ColumnID = 2 then Value end) as Type,
    min(case when ColumnID = 3 then Value end) as Color
from @Temp
group by PetID

A fix for your duplication is simple though and will probably improve performance as well:

FROM (select distinct PetID from @Temp) AS T1
shawnt00
  • 16,443
  • 3
  • 17
  • 22
  • The problem with sorting is its a costly action for SQL Server and slows down query execution, particularly for large datasets. My example was just a small sample, as my full dataset is very large. It is generally considered best practice to refrain from sorting unless absolutely needed. – Steven Ball Jan 11 '16 at 19:14
  • Your data is already sorted because you have defined a primary key. How is a four-way join going to be faster than a single scan anyway? – shawnt00 Jan 11 '16 at 19:16
0
SELECT T1.PetID
    ,T1.Value AS [Name]
    ,T2.Value AS [Type]
    ,T3.Value AS [Color]
    --select *
FROM #Temp AS T1
LEFT JOIN #Temp AS T2 ON T1.PetID = T2.PetID
    AND T2.ColumnID = 2
LEFT JOIN #Temp AS T3 ON T1.PetID = T3.PetID
    AND T3.ColumnID = 3
where t1.ColumnID = 1

Your problem was that you were joining to the main table that had multiple rows.

HLGEM
  • 94,695
  • 15
  • 113
  • 186