1

What I need is something like select * from table group by * having count(*) > 1 which doesn't work on mssql2008r2 and postgresql. The query runs a few times a week on different tables. The question has basically been posed before.

Because there has been confusion as to why someone would ever need this when there is DISTINCT: I need the query in order to identify identical rows in a table without primary key. I know this is not a regular case but this is exactly what I have to document and report.

Grouping by a subquery that returns all table columns and is formatted as a string doesn't work:

select * from table group by (select column_name + ', ' as data() from information_schema.columns where table_name = 'table' for xml path(''))

Return error msg. 144.

EDIT: When writing the query I don't know the column names. So I need a generic query without specifying the column names.

EDIT2: The proposed dynamic sql code works fine, but is not suited in my case, as it would be easier to create the GROUP BY String in my external code. So I'm still looking for a way to solve this with one query.

Community
  • 1
  • 1
Juergen
  • 312
  • 3
  • 18
  • If you replace * with all table columns, it will work. I didn't get your problem? Describing all columns? – gustavodidomenico Oct 23 '14 at 11:54
  • Exactly. I don't know all columns in advance. I edited the question. And as I said querying all columns in a subquery and concatenating as a String does not work. – Juergen Oct 23 '14 at 11:57
  • How many tables do you have to worry about and how often does this have to run? Sounds like a one-off exercise so why not just bite the bullet and write a bunch of SELECT... statements? – DavidG Oct 23 '14 at 12:01
  • It is for a data quality analysis tool and runs a few times a week. Every time on a new data set, so it has to be generic. I could split it into two queries, instead of the subquery mentioned above, but I don't have this overhead in any of my other ~140 queries. And these are in my opinion much more complex. – Juergen Oct 23 '14 at 12:06
  • 2
    I don't see a way to achieve this without dynamic SQL or external code. Do you really need to know the count or are you just trying to remove duplicates? – DavidG Oct 23 '14 at 12:07
  • I need the count as well as the actual rows that are duplicates. The tables remain untouched once imported, so I don't need to delete anything. You also don't see a way to create the GROUPY BY String as I tried above, do you? – Juergen Oct 23 '14 at 12:12

1 Answers1

2

You can do this using dynamic SQL. Here is an example wrapped in a stored procedure:

CREATE PROCEDURE GetDistinctRowsWithCount
(
    @table VARCHAR(255)
)

AS

DECLARE @columns VARCHAR(8000) = ''

SELECT @columns = @columns + '[' + name + ']' + ','
FROM sys.columns
WHERE object_id = OBJECT_ID(@table)

DECLARE @sql NVARCHAR(MAX) = 'SELECT ' + @columns + 'COUNT(*) FROM ' + @table + ' GROUP BY ' + LEFT(@columns, LEN(@columns)-1)

EXEC sp_executesql @sql

Or if you prefer to only see duplicated rows:

CREATE PROCEDURE GetDuplicateRowsWithCount
(
    @table VARCHAR(255)
)

AS

DECLARE @columns VARCHAR(8000) = ''

SELECT @columns = @columns + '[' + name + ']' + ','
FROM sys.columns
WHERE object_id = OBJECT_ID(@table)

DECLARE @sql NVARCHAR(MAX) = 'SELECT ' + @columns + 'COUNT(*) FROM ' + @table + ' GROUP BY ' + LEFT(@columns, LEN(@columns)-1) + ' HAVING COUNT(*) > 1'

EXEC sp_executesql @sql

And to call the procedure, just pass in the table name:

EXEC dbo.GetDistinctRowsWithCount 'table'
DavidG
  • 113,891
  • 12
  • 217
  • 223
  • That is the answer to my question in the comment, so +1 for that (as soon as I got 15 rep). But that's not what I want in this case, as it would be easier to do in my external code. So I leave the question marked as unanswered and hope for answers to solve the problem with one query. – Juergen Oct 23 '14 at 12:36
  • In what way does it not answer the question? – DavidG Oct 23 '14 at 12:44
  • It actually does answer it the way I posed it. But doing anything using dynamic sql is more complicated than doing in my external code. So I have to be more specific in my question. – Juergen Oct 23 '14 at 12:50
  • 1
    I'm pretty (99%) sure that you can only do what you want with dynamic SQL unless you explicitly write all ~140 queries. – DavidG Oct 23 '14 at 12:52
  • You mean the 140 queries in my tool? The queries are stored with further information (description etc.) in a table, manipulated by the tool and user using parameterized prepared statements and then executed. No dynamic sql, although it might be easier, I don't know. It's just the way it's always been. – Juergen Oct 23 '14 at 12:58