14

Added: Working with SQL Server 2000 and 2005, so has to work on both. Also, value_rk is not a number/integer (Error: Operand data type uniqueidentifier is invalid for min operator)

Is there a way to do a single column "DISTINCT" match when I don't care about the other columns returned? Example:

**Table**
Value A, Value L, Value P
Value A, Value Q, Value Z

I need to return only one of these rows based on what is in the first one (Value A). I still need results from the second and third columns (the second should actually match all across the board anyway, but the third is a unique key, which I need at least one of).

Here's what I've got so far, although it doesn't work obviously:

SELECT value, attribute_definition_id, value_rk
FROM attribute_values
WHERE value IN (
    SELECT value, max(value_rk)
    FROM attribute_values
)
ORDER BY attribute_definition_id

I'm working in ColdFusion so if there's a simple workaround in that I'm open to that as well. I'm trying to limit or "group by" the first column "value". value_rk is my big problem since every value is unique but I only need one.

NOTE: value_rk is not a number, hence this DOES NOT WORK

UPDATE: I've got a working version, it's probably quite a bit slower than a pure SQL version, but honestly anything working at this point is better than nothing. It takes the results from the first query, does a second query except limiting it's results to one, and grabs a matching value_rk for the value that matches. Like so:

<cfquery name="queryBaseValues" datasource="XXX" timeout="999">
    SELECT DISTINCT value, attribute_definition_id
    FROM attribute_values
    ORDER BY attribute_definition_id
</cfquery>

<cfoutput query="queryBaseValues">
    <cfquery name="queryRKValue" datasource="XXX">
        SELECT TOP 1 value_rk
        FROM attribute_values
        WHERE value = '#queryBaseValues.value#'
    </cfquery>
    <cfset resourceKey = queryRKValue.value_rk>
    ...

So there you have it, selecting a single column distinctly in ColdFusion. Any pure SQL Server 2000/2005 suggestions are still very welcome :)

James A Mohler
  • 11,060
  • 15
  • 46
  • 72
Organiccat
  • 5,633
  • 17
  • 57
  • 104
  • Can you clarify what you need? Any row, but only one row per value? The row with the value that has maximum value_rk? I'm not sure I understand what you're going for here. – tvanfosson Oct 30 '08 at 18:49
  • > NOTE: value_rk is not a number, hence this DOES NOT WORK. That has nothing to do with why your query fails. You haven't mentioned the RDBMS you're using but in Oracle you can use MAX on character columns. –  Oct 30 '08 at 18:55
  • MS SQL can also use MAX on non-numeric columns. – BradC Oct 30 '08 at 18:56
  • If you don't care about the other columns, what use do they serve? – Tom H Oct 30 '08 at 19:00
  • The value_rk column has a unique identifying key that links to another table. I need one of those values off that table that match. I don't care which one since most of them are the same, but I do need one of them. Occasionally one is different or blank, but this is rare enough to not worry about – Organiccat Oct 30 '08 at 19:01
  • Using TOP is not reliable unless you also use ORDER BY. SQL gives no guarantee that the rows will be returned in any order, unless you specify the order. – Bill Karwin Oct 30 '08 at 19:17
  • Are value and attribute_definition_id numeric? – John Fiala Oct 30 '08 at 23:37
  • I still think that you should have defined rules about which values you want. I'm always suspicious when someone tells me that they "don't care" about which row they get. If you don't want blanks, you should program for that. If you want the oldest or latest, that's what you should get. – Tom H Oct 31 '08 at 13:38
  • Unfortunately there was no real way to tell which was the oldest or newest, the database was not set up that way (I didn't have a choice in the matter). The project piece was due yesterday and I got it finished which, while not ideal, was better than not delivering at all :/ – Organiccat Oct 31 '08 at 14:21

11 Answers11

11

this might work:

SELECT DISTINCT a.value, a.attribute_definition_id, 
  (SELECT TOP 1 value_rk FROM attribute_values WHERE value = a.value) as value_rk
FROM attribute_values as a
ORDER BY attribute_definition_id

.. not tested.

walming
  • 392
  • 1
  • 3
  • 14
9
SELECT a1.value, a1.attribute_definition_id, a1.value_rk
FROM attribute_values AS a1
  LEFT OUTER JOIN attribute_values AS a2
    ON (a1.value = a2.value AND a1.value_rk < a2.value_rk)
WHERE a2.value IS NULL
ORDER BY a1.attribute_definition_id;

In other words, find the row a1 for which no row a2 exists with the same value and a greater value_rk.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • This is a great solution because it will work with most (all?) RDBMS's. I've been spoiled by PostgreSQL's `DISTINCT ON` which is simple to type, but nonstandard and not implemented in SQL Server, among others. – Sam Jan 30 '14 at 17:38
8

This should work for PostgreSQL, i don't know which dbms you use.

SELECT DISTINCT ON (value)
  value, 
  attribute_definition_id, 
  value_rk
FROM 
  attribute_values
ORDER BY
  value, 
  attribute_definition_id

PostgreSQL Docs

Patryk Kordylewski
  • 1,263
  • 1
  • 7
  • 11
2

Is this what you're looking for?

SELECT value, attribute_definition_id, value_rk
FROM attribute_values av1
WHERE value_rk IN (
        SELECT max(value_rk)
        FROM attribute_values av2
        WHERE av2.value = av1.value
)
ORDER BY attribute_definition_id

If value_rk is unique, this should work.

gfrizzle
  • 12,419
  • 19
  • 78
  • 104
2

Okay, here's my assumptions:

Standard SQL Server

value_rk is not a numeric value, but value and attribute_definition_id are numeric.

SELECT value_rk, MIN(value) as value, MIN(attribute_definition_id) as attribute_definition_id
FROM attribute_values
GROUP BY value_rk
ORDER BY MIN(attribute_definition_id)

If one of those fields isn't numeric, then it'll require more thought - please let us know.

John Fiala
  • 4,561
  • 3
  • 30
  • 26
2

If you are open to using table variables, you could keep it all within a single database call like this:

DECLARE @attribute_values TABLE (value int, attribute_definition_id int, value_rk uniqueidentifier)

INSERT INTO @attribute_values (value)
SELECT DISTINCT value FROM attribute_values

UPDATE @attribute_values
SET attribute_definition_id = av2.attribute_definition_id,
    value_rk = av2.value_rk
FROM @attribute_values av1
INNER JOIN attribute_values av2 ON av1.value = av2.value

SELECT value, attribute_definition_id, value_rk FROM @attribute_values

Essentially you are creating a limited recordset with the table filled with unique values of 'value', and letting SQL Server fill in the gaps using just one of the matches from the main table.

Edited to add: This syntax works within cfquery just fine.

Dane
  • 9,737
  • 5
  • 28
  • 23
1
SELECT value, attribute_definition_id, value_rk
FROM attribute_values
WHERE value, value_rk IN (
        SELECT value, max(value_rk)
        FROM attribute_values
        GROUP BY value
)
ORDER BY attribute_definition_id

NOT TESTED!

Chris Cudmore
  • 29,793
  • 12
  • 57
  • 94
  • My fault, I posted without testing myself, I updated that value_rk is not a number (max/min won't work on it) – Organiccat Oct 30 '08 at 18:49
  • max/min will work on varchar values in SQL2005. What db are you using? – BradC Oct 30 '08 at 18:52
  • Actually testing on 2005, but the production server is 2000. I get a data type mismatch error when trying to run min/max on it (added to top of OP post). – Organiccat Oct 30 '08 at 18:56
  • OP post is like PIN number or SSN number –  Oct 30 '08 at 18:58
  • It fails to work in your scenario because IN expects a single column list of values. WHERE foo in (SELECT bar, max(baz) ... is wrong no matter if baz is a number or not. –  Oct 30 '08 at 19:02
  • I tried moving the max out the top select with the same result. For some reason, having the max in the WHERE portion didn't actually throw an error, but didn't affect the results either way. – Organiccat Oct 30 '08 at 19:14
1

I'm not sure if I entirely understand your set-up, but would something like this work:

SELECT value, attribute_definition_id, value_rk
FROM attribute_values
GROUP BY value
ORDER BY attribute_definition_id;

Again, I'm not real sure which column it is you're trying to limit, or how you're wanting to limit it.

Adam
  • 7,067
  • 2
  • 27
  • 24
0

As noted by John Fiala, the canonical answer in SQL server is to use a group by clause when you want to perform a "distinct" operation over a subset of columns. Why is this the correct canonical answer? Well, you want to pull in columns that are not part of your "distinct" group. Exactly what rows do you want to pull in for these subsidiary columns? Using a group by clause and defining aggregate functions for these subsidiary columns makes your query well-behaved in the sense that you now know how these subsidiary columns are obtained. This article gives more details:

http://weblogs.sqlteam.com/jeffs/archive/2007/10/12/sql-distinct-group-by.aspx

SELECT value_rk, MIN(value) as value, 
MIN(attribute_definition_id) as attribute_definition_id
FROM attribute_values
GROUP BY value_rk

Also, it's worth noting that MIN and MAX work on text and several other data types that are not numeric values.

Corwin Joy
  • 645
  • 7
  • 14
0

Less elegant than I would like---- it's essentially what you're doing, just in pure SQL--- but it works and can all be done in SQL.

DECLARE @mytable TABLE(mykey NVARCHAR(512), myVal NVARCHAR(512))

DECLARE @keyVal NVARCHAR(512)
DECLARE @depVal NVARCHAR(512)
DECLARE myCursor CURSOR for
   SELECT DISTINCT(value) FROM attribute_values
OPEN myCursor
FETCH NEXT FROM myCursor INTO @keyVal
WHILE @@FETCH_STATUS=0
  BEGIN
     SET @depVal = (SELECT TOP 1 attribute_definition_id FROM attribute_values WHERE VALUE=@keyVal ORDER BY attribute_definition_id)
     INSERT INTO @mytable (mykey, myVal) VALUES (@keyVal, @depVal)
     FETCH NEXT FROM myCursor INTO @keyVal
  END
DEALLOCATE myCursor

SELECT * FROM @mytable

You can add a depVal2 and others using this method.

matt.mercieca
  • 853
  • 1
  • 6
  • 13
0

i think

SELECT DISTINCT a.value, a.attribute_definition_id, 
(SELECT TOP 1 value_rk FROM attribute_values WHERE value = a.value) as value_rk
FROM attribute_values as a
ORDER BY attribute_definition_id

worked

David M
  • 71,481
  • 13
  • 158
  • 186
user1133937
  • 33
  • 1
  • 1
  • 3