2

Someone please change my title to better reflect what I am trying to ask.

I have a table like

Table (id, value, value_type, data)

ID is NOT unique. There is no unique key.

value_type has two possible values, let's say A and B.

Type B is better than A, but often not available.

For each id if any records with value_type B exists, I want all the records with that id and value_type B.

If no record for that id with value_Type B exists I want all records with that id and value_type A.

Notice that if B exists for that id I don't want records with type A.

I currently do this with a series of temp tables. Is there a single select statement (sub queries OK) that can do the job?

Thanks so much!

Additional details:

SQL Server 2005

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
kralco626
  • 8,456
  • 38
  • 112
  • 169
  • SQL 2005 so "partiton by" is available to me. I tried to use it, but had an issue with when there was more than one record with type B. – kralco626 Aug 20 '10 at 20:13

7 Answers7

6

RANK, rather than ROW_NUMBER, because you want ties (those with the same B value) to have the same rank value:

WITH summary AS (
  SELECT t.*,
         RANK() OVER (PARTITION BY t.id 
                          ORDER BY t.value_type DESC) AS rank
    FROM TABLE t
   WHERE t.value_type IN ('A', 'B'))
SELECT s.id,
       s.value,
       s.value_type,
       s.data
  FROM summary s
 WHERE s.rank = 1

Non CTE version:

SELECT s.id,
       s.value,
       s.value_type,
       s.data
  FROM (SELECT t.*,
               RANK() OVER (PARTITION BY t.id 
                                ORDER BY t.value_type DESC) AS rank
          FROM TABLE t
         WHERE t.value_type IN ('A', 'B')) s
 WHERE s.rank = 1

WITH test AS (
   SELECT 1 AS id, 'B' AS value_type
   UNION ALL
   SELECT 1, 'B'
   UNION ALL
   SELECT 1, 'A'
   UNION ALL
   SELECT 2, 'A'
   UNION ALL
   SELECT 2, 'A'),
     summary AS (
   SELECT t.*,
          RANK() OVER (PARTITION BY t.id 
                           ORDER BY t.value_type DESC) AS rank
     FROM test t)
SELECT *
  FROM summary
 WHERE rank = 1

I get:

id   value_type  rank
----------------------
1    B           1
1    B           1
2    A           1
2    A           1
OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
  • 1
    @Martin Smith: So you admit it - you *stole* my answer! =) – OMG Ponies Aug 20 '10 at 20:24
  • haha! ya i was doing almost exactly that with ROW_NUMBER. Just when I thought I was getting the hang of using these "window" functions. Thanks! i'll test and let you know. – kralco626 Aug 20 '10 at 20:28
  • Your solution actually doesn't work. And i'm not sure why. I get an additional 22 records. Although it was a few seconds faster than Beth's solution. – kralco626 Aug 20 '10 at 21:01
  • @kralco626: Are there `value_type`'s other than A and B? See update – OMG Ponies Aug 20 '10 at 21:06
  • I would like to use your solution if you can make it work, because it only uses TABLE once. This is excepcially benificial because that table is a temp table and I would like to replace it with the query that populates that temp table. In your case the query would run once, in Beth's case it would have to be called 3 times. – kralco626 Aug 20 '10 at 21:12
  • @kralco626: Here's a [query on the Data Explorer](http://odata.stackexchange.com/stackoverflow/q/9653/rank) using basic data, I get the correct results. – OMG Ponies Aug 20 '10 at 21:21
  • @OMG - i know it does seem odd. It should work. I just get some extra rows when I use it, not sure why. It is part of a larger query, I can post the whole thing if you would like me to. – kralco626 Aug 20 '10 at 21:25
  • @kralco626: I understand that it's part of a larger query, but I thought you were saying the 22 extra rows are from this query in particular. Does this & Beth's query return the same result set, without incorporating into the larger one? – OMG Ponies Aug 20 '10 at 21:36
  • No- I just ran it. Just running your query and just running Beths you get around 20 more records. – kralco626 Aug 20 '10 at 21:46
  • Your result set is actually disjoint from Beths. I ran Beth's query except your query and I got 287659 rows. – kralco626 Aug 20 '10 at 22:01
  • OH I THINK I GOT IT. I'm not using A and B for my codes! that was just for example purposes. In my specific case I dont wanna sort DESC! Sorry for the confusing and thanks for the answer! – kralco626 Aug 20 '10 at 22:06
  • @kralco626: What's the value_type data type - numeric? – OMG Ponies Aug 20 '10 at 22:07
  • This method, when done on a heap with about 300000 rows, takes up to 1000 times the number of reads as other solutions presented. Indexes or particular patterns of A/B appearances could change things, but not by enough, I think, to fix that. – ErikE Aug 21 '10 at 03:44
  • @Emtucifor - Which other solution is the most efficient? I tried Beths and this solution beat it for time. Also, "table" is a temp table and I want to replace it with a subquery, so a solution that only uses it once is ideal. – kralco626 Aug 23 '10 at 10:33
4
SELECT * 
  FROM table 
 WHERE value_type = B
UNION ALL
SELECT * 
  FROM table 
 WHERE ID not in (SELECT distinct id 
                    FROM table 
                   WHERE value_type = B)
OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
Beth
  • 9,531
  • 1
  • 24
  • 43
  • I think this query will miss the record id's that are only associated with type A values. – dvanaria Aug 20 '10 at 20:20
  • @Beth: +1: Shouldn't need the DISTINCT, for sake of the `IN` – OMG Ponies Aug 20 '10 at 20:23
  • wouldn't the distinct improve performance? or does it make it worse? – Beth Aug 20 '10 at 20:29
  • @Beth - I would think it would depend on how often id is duplicated. Not in is very expencive so less records in the sub query is better. but if distinct doesn't remove that many records than maybe it's slower? thats what i would guess. – kralco626 Aug 20 '10 at 20:30
  • 1
    @Beth - I tested with and without the "distinct". It was 13 seconds faster using distinct. – kralco626 Aug 20 '10 at 20:45
  • woot! I'm sure Pony-boy is right, though, don't strictly need it and could hurt performance – Beth Aug 20 '10 at 20:48
  • It is part of a larger procedure. So it took 2 min 18 seconds rather than 2 min 31 seconds. – kralco626 Aug 20 '10 at 21:02
  • This query has the most reads of any presented here, because it has to do 2 joins with 3 references to the table. For what it's worth, though NOT IN works, it really should be written as the anti semi join the optimizer was smart enough to switch it to. Also, the DISTINCT is meaningless. The execution plan is the same either way, using an anti semi join which doesn't care about uniqueness). – ErikE Aug 21 '10 at 03:21
  • @kralco626 One test is not enough, you need to run it multiple times in isolation from other factors like being in the middle of an SP – ErikE Aug 21 '10 at 03:26
2

The shortest query to do the job I can think of:

SELECT TOP 1 WITH TIES *
FROM #test
ORDER BY Rank() OVER (PARTITION BY id ORDER BY value_type DESC)

This is about 50% worse on CPU as OMG Ponies' and Christoperous 5000's solutions, but the same number of reads. It's the extra sort that is making it take more CPU.

The best-performing original query I've come up with so far is:

SELECT * 
FROM #test 
WHERE value_type = 'B'
UNION ALL
SELECT * 
FROM #test T1
WHERE NOT EXISTS (
   SELECT *
   FROM #test T2
   WHERE
      T1.id = T2.id
      AND T2.value_type = 'B'
)

This consistently beats all the others presented on CPU by about 1/3rd (the others are about 50% more) but has 3x the number of reads. The duration on this query is often 2/3rds the time of all the others. I consider it a good contender.

Indexes and data types could change everything.

ErikE
  • 48,881
  • 23
  • 151
  • 196
  • Thanks for doing the analysis, nice job. However, I need a solution that uses the table only once. – kralco626 Aug 23 '10 at 10:38
  • also i think your solutions would only return one record. What if there are more than one record with the highest priority type? – kralco626 Aug 23 '10 at 15:36
1
declare @test as table(
 id int , value [nvarchar](255),value_type [nvarchar](255),data  int)

 INSERT INTO @test
 SELECT 1, 'X', 'A',1 UNION
 SELECT 1, 'X', 'A',2 UNION
 SELECT 1, 'X', 'A',3 UNION
 SELECT 1, 'X', 'A',4 UNION
 SELECT 2, 'X', 'A',5 UNION
 SELECT 2, 'X', 'B',6 UNION
 SELECT 2, 'X', 'B',7 UNION
 SELECT 2, 'X', 'A',8 UNION
 SELECT 2, 'X', 'A',9 


 SELECT * FROM @test x
 INNER JOIN 
 (SELECT id, MAX(value_type) as value_type FROM 
 @test GROUP BY id) as y
 ON x.id = y.id AND x.value_type = y.value_type
Christoph
  • 4,251
  • 3
  • 24
  • 38
  • Your query performs near the best of all the queries submitted (often best) so deserves recognition. – ErikE Aug 21 '10 at 03:30
  • @Emtucifor - I belive, however, that if you replaced @test for subquery, or non-indexed view, it would take longer because it would have to call it twice. That would be my fault however, not Chris's because i didn't say that in my question – kralco626 Aug 23 '10 at 10:36
  • @kralco626 - The number of times a table is mentioned in a query does not correlate to performance. Focusing on reducing the number of times the table name is mentioned, without looking at actual performance of the query (CPU and reads) will definitely lead you down the wrong path. – ErikE Aug 23 '10 at 15:55
  • I know that the number of time a table is used does not nessesariy corrisipond to preformance. But what if "table" was replaced with a query. (as I plan on doing, I know i proly should have mentioned this) Now if you have to call the sub-query 3 times rather than 1... I tried it It takes way longer. – kralco626 Aug 26 '10 at 11:02
0

This uses a union, combining all records of value B with all records that have only A values:

SELECT *
FROM mainTable
WHERE value_type = B
GROUP BY value_type UNION SELECT *
                          FROM mainTable
                          WHERE value_type = A
                               AND id NOT IN(SELECT *
                                             FROM mainTable
                                             WHERE value_type = B);
dvanaria
  • 6,593
  • 22
  • 62
  • 82
0

Try this (MSSQL).

Select id, value_typeB, null
from myTable
where value_typeB is not null
Union All
Select id, null, value_typeA
from myTable
where value_typeB is null and value_typeA is not null
AllenG
  • 8,112
  • 29
  • 40
0

Perhaps something like this:

select * from mytable
where id in (select distinct id where value_type = "B")
union
select * from mytable
where id in (select distinct id where value_type = "A" 
and id not in (select distinct id where value_type = "B"))
Alex Polkhovsky
  • 3,340
  • 5
  • 29
  • 37