3

So I have these tables:

-- tbl_obs
id  lat  lon   created
-------------------------
1   1.2  -2.1  2002-08-03
2   1.9  -5.5  2002-08-03
3   1.5  -4.1  2002-08-03

-- tbl_obsdata
id  name         value     obs_id
---------------------------------
1   gender       Male       1
2   type         Type I     1
3   description  Some desc  1
4   gender       Female     2
5   type         Type II    2
6   description  Some desc  2
7   gender       Female     3
8   type         Type II    3
9   description  Some desc  3

I want a query that will combine data from both table like this:

lat  lon   created     gender  type  description
------------------------------------------------
1.2  -2.1  2002-08-03  Male   Type I  Some desc
1.9  -5.5  2002-08-03  Female Type I  Some desc
1.5  -4.1  2002-08-03  Male   Type II Some desc

I know I can do this with a pivot like:

with cte as (
 select obsdata.name, obsdata.value, obs.lat, obs.lon, obs.created
 from obsdata
 left join obs on obs.id = obsdata.obs_id
)
select lat, lon, created, gender, type, description
from cte
pivot(
 max(value)
 for [name] in (gender, type, description)
) as pvt

So far this returns the result (I think), but I have about a million rows and this runs really slow. Any alternative way to achieve this that would be much faster? I'm using SQL Server 2012.

John Cappelletti
  • 79,615
  • 7
  • 44
  • 66
Alex
  • 193
  • 1
  • 2
  • 12

2 Answers2

6

Another option is

Select A.lat
      ,A.lon
      ,A.created
      ,gender      = max(IIF(B.name='gender',B.value,null))
      ,type        = max(IIF(B.name='type',B.value,null))
      ,description = max(IIF(B.name='description',B.value,null))
 From  tbl_obs A
 Join  tbl_obsdata B on (A.id=B.obs_id)
 Group By A.lat
      ,A.lon
      ,A.created

Returns

lat lon     created     gender  type    description
1.2 -2.1    2002-08-03  Male    Type I  Some desc
1.5 -4.1    2002-08-03  Female  Type II Some desc
1.9 -5.5    2002-08-03  Female  Type II Some desc
John Cappelletti
  • 79,615
  • 7
  • 44
  • 66
5

Optimize the pivot first, and then the join. I think SQL Server does a reasonable job for pivots, so start with:

select obs_id,  gender, type, description
from tbl_obsdata
pivot (max(value) for [name] in (gender, type, description)
      ) as pvt;

Then, create an index on tbl_obsdata(obs_id, name, value). This should be reasonably fast.

If so, then join in the rest:

with cte as (
      select obs_id,  gender, type, description
      from tbl_obsdata
      pivot (max(value) for [name] in (gender, type, description)
            ) as pvt
    )
select obs.lat, obs.lon, obs.created,
       cte.gender, cte.type, cte.description
from cte join
     obs
     on obs.id = cte.obs_id;

EDIT:

I also wonder how this would fare:

select obs.lat, obs.lon, obs.created, od.gender, od.type, od.description
from obs cross apply
     (select max(case when name = 'gender' then value end) as gender,
             max(case when name = 'type' then value end) as type,
             max(case when name = 'description' then value end) as description

      from tbl_obsdata od
      where od.obs_id = obs.id
     ) od;

This wants an index on tbl_obsdata(obs_id, name, value) as well.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • It is looking beautiful now. By looking at the question, I am thinking that CTE may be the reason for slowness. Will you prefer temporary table (with index) instead of CTE ? – DatabaseCoder Sep 13 '16 at 22:18
  • @BhatiaAshish . . . I prefer a single query to temporary tables, unless the latter are absolutely necessary. – Gordon Linoff Sep 13 '16 at 22:22
  • 2
    Pivots have performance issues and are generally going to be slower than solutions such as John Cappalletti has given here. Unpivots are fine but since I work in a high volume large data environment I never use pivots. – btberry Sep 13 '16 at 22:22
  • 2
    @btberry . . . Interesting. I basically never use pivot, preferring conditional aggregation. I thought that was an old habit and the pivot syntax had pretty much the same performance. Do you have a reference for the performance of pivots? – Gordon Linoff Sep 13 '16 at 22:23
  • 1
    @GordonLinoff Same here. I've liked to call it my 'poor man's pivot', yet it seems to be better after all: https://sqlsunday.com/2016/01/29/pivot-unpivot-and-performance/ – Aaron Dietz Sep 13 '16 at 22:31
  • @GordonLinoff I've done large scale testing in the past to study it but I do not seem to have those scripts still. A few years ago I took a SolidQ advanced t-sql class with Itzik Ben-Gan and that was one of the things he covered. The PIVOT has the appearance of being a nice little tool but has some demons in the closet. For smaller shops it can be nice for the convenience but it does not scale well. The number of columns has a lot to do with the handicap ... if I remember correctly I believe the more columns you pivot the less the penalty but I may have that backwards. I just stay away. – btberry Sep 13 '16 at 22:32
  • @btberry Never cared much for PIVOT. I found them a little clumsy and somewhat limiting, but the real deal breaker for me was the performance on larger datasets. – John Cappelletti Sep 13 '16 at 22:43