SQL: Distinct and OrderBy issue

Question

I'm currently working on a query that should return all the rows from CONCENTRATOR table. However, it has to be sortable by all concentrator's columns, as well as department name and type name.

Here are the concentrator's columns :

CONCENTRATOR_ID
NAME
INTERNALADDRESS
TYPE_ID
DEPARTMENT_ID

TYPE_ID and DEPARTMENT_ID are linked to respectively DEPARTMENT table and TYPE table, with both a NAME column.

Here are the constraints :

concentrators are sortable by id, name, address, type's name and department's name
distinct department names (if a same concentrator has 2 department, return only one row)

To resume, I would need something like SELECT * on concentrator columns, and DISTINCT department.name as well but seems complicated... I tried lots of requests but couldn't find any one working. Could anybody help me please ?

The request I'm looking for should be something like this:

SELECT DISTINCT d.NAME as "department.name", t.NAME as "type.name", *
FROM "CONCENTRATOR" c
LEFT OUTER JOIN "CONCENTRATOR_GROUP" USING(CONCENTRATOR_ID)
LEFT OUTER JOIN "GROUP" g USING(GROUP_ID)
LEFT OUTER JOIN "TYPE" t USING(TYPE_ID)
LEFT OUTER JOIN "DEPARTMENT" d USING(DEPARTMENT_ID)
ORDER BY TRIM(UPPER(c.name)) ASC

And can you confirm MS SQL-Server is the RDBMS? Your join's don't look like SQL-Server syntax. — Dave C, Jul 04 '14 at 14:21
Sorry my mistake, this is Oracle not SQL Server, I edited the tag. About sample data are you talking about the result of this query ? — Flash_Back, Jul 04 '14 at 14:42
The `OUTER` keyword is optional and can be omitted with `LEFT JOIN`. — Kermit, Jul 04 '14 at 14:49
What is wrong with the query you have - well, it will error because you have `*` instead of `c.*`, and `using()` rather than `on` might give you issues. Maybe you can set up an SQL Fiddle with those tables and some data in them, your current best attempt at a query; and add to the question what you want the output to be for that data. It's hard to tell what you really need at the moment. If a concentrator has two departments, how will you choose which to show? — Alex Poole, Jul 04 '14 at 17:57
Just an _important_ note, `WHEN` ordering a `DISTINCT`-ed resultset, the ordering column/expression should be part of `SELECT` as well.. Please see my explanation [here](http://stackoverflow.com/questions/21327286/how-to-order-by-case-insensitive-asc-or-desc-with-distinct-and-union/21328754#21328754) — Maheswaran Ravisankar, Jul 04 '14 at 18:12

Paul Maxwell · Accepted Answer · 2014-07-05T08:24:24.917

There are a few things to note here. I'm really not fond of "natural joins" as they simply disguise useful detail in my view, so I have not used them. I had to assume that the table "GROUP" joins via CONCENTRATOR_GROUP for an example of that missing detail.

The table name "GROUP" isn't a great idea as it is a very commonly used reserved word. I'd not recommend using such a word as a table name. Due to this "GROUP" is quoted (it isn't normal to quote object names in Oracle my experience).

You talk about "distinct" as if it has some magical quality that I should intuitively understand. It doesn't, and I don't. Let's say there are just 2 departments both are also "distinct"

DeptX DeptY

So now let's assume there are 2 concentrators, both of these are "distinct" too:

ConcenA ConcenB

Both concentrators are used in both departments, so we produce this query:

select distinct 
c.name as c_name, d.name as d_name
from concentrators c 
inner join departments d on c.dept_id=d.dept_id 

The result is:

ConcenA DeptX
ConcenB DeptX
ConcenA DeptY
ConcenB DeptY

All 4 rows are "distinct"

The point is that "select distinct" is a "row operator", i.e. it considers the entire row to determine if any part of the row is different to all other rows. There are no subtleties or options to "select distinct", it always works the same way (over the entire row). So, with this in mind, we now know that "select distinct" simply is not going to be the right technique (and due to the technical definition of distinct you might also sense it isn't a good way to describe your problem either).

So, as "select distinct" isn't the right technique typically one can turn to these as techniques: "group by" or "row_number()" because these do give us subtleties and options.

Now you haven't explained why or how you would choose just one department (in fact, to me, it sounds weird you would choose just one) but below I offer you A way to do this using row_number() and the "subtlety" being used is the ORDER BY which gives the number 1 to the first Department Name in alphabetic order, all other departments get more than 1; and this occurs for each CONCENTRATOR_ID because row_number() is "partitioned by" that field.

    SELECT
      department_name
    , type_name
    , NAME
    , CONCENTRATOR_ID
    , INTERNALADDRESS
    , TYPE_ID
    , DEPARTMENT_ID
FROM (

            SELECT
                  d.NAME                           AS department_name
                , t.NAME                           AS type_name
                , c.CONCENTRATOR_ID
                , c.NAME
                , c.INTERNALADDRESS
                , c.TYPE_ID
                , c.DEPARTMENT_ID
                , ROW_NUMBER() OVER (PARTITION BY c.CONCENTRATOR_ID
                                     ORDER BY d.NAME, t.NAME, c.NAME) AS RN
            FROM CONCENTRATOR c
                  LEFT OUTER JOIN CONCENTRATOR_GROUP cg
                        ON c.CONCENTRATOR_ID = cg.CONCENTRATOR_ID
                  LEFT OUTER JOIN "GROUP" g
                        ON cg.GROUP_ID = g.GROUP_ID
                  LEFT OUTER JOIN TYPE t
                        ON c.TYPE_ID = t.TYPE_ID
                  LEFT OUTER JOIN DEPARTMENT d
                        ON c.DEPARTMENT_ID = c.DEPARTMENT_ID
      ) sq
WHERE RN = 1 /* HERE is where we restrict output to one department per concentrator */
ORDER BY
      NAME ASC
    , CONCENTRATOR_ID
;

I have no reason to change the type of joins as you can see they remain as left outer joins - but I suspect there may be no valid reason for all or some of these. Do use the more efficient INNER JOIN if you can.

Thank you for the complete explanation and for the code, seeing it I just could not have done this by myself. You're absolutely right regarding the reserved works, however since I'm currently working on an existing app, I don't have the chance to change table names. The same way, quoted table names are actually really inconvenient but I have to do with it. I also changed joins to INNER JOIN as you suggested. Anyway thank you so much, I accepted your answer since this is working perfectly fine ! — Flash_Back, Jul 07 '14 at 08:19

SQL: Distinct and OrderBy issue

1 Answers1