1

I am trying to condense data in a database table which contains several instances of unique records with various column data.

I want to select the highest occurring value for each column for each particular unique record

But my SQL transaction is not working.

[dataBase1].[dbo].[table1] has several hundred thousand records with several columns (Name, Place, etc.).

[dataBase1].[dbo].[table2] has the list of unique Names from [table1] and the headers for the rest of the columns (Place, etc.) which are empty.

I tried the following code.

DECLARE @name varchar(max);
DECLARE @place varchar(max);

DECLARE db_cursor SCROLL CURSOR FOR 
     SELECT [Name] 
     FROM [dataBase1].[dbo].[table2];

OPEN HostName_cursor

FETCH NEXT FROM db_cursor INTO @name;

WHILE @@FETCH_STATUS = 0
BEGIN
     SELECT DISTINCT TOP(1) @place = [Place] 
     FROM [dataBase1].[dbo].[table1] 
     WHERE [Name] = @name 
       AND [Place] IS NOT NULL AND [Place] <> '' 
       AND (EXISTS  (SELECT [Place], COUNT (*) AS TOTAL 
                     FROM [dataBase1].[dbo].[table1] 
                     GROUP BY [Place])) 
     GROUP BY [Place];

     UPDATE [dataBase1].[dbo].[table2] 
     SET [Place] = @place 
     WHERE [Name] = @name;      

     SET @place = '';

     FETCH NEXT FROM db_cursor INTO @name
END

[Place] column for a particular unique [Name] has 53 values and the highest repeating Value count is 3. In essence, I want automate the following SQL transaction for each unique [Name].

SELECT DISTINCT TOP 1 
    [Place], COUNT (*) TOTAL 
FROM 
    [dataBase1].[dbo].[table1] 
WHERE 
    [Name] = 'xxxxxx' 
    AND [Place] IS NOT NULL AND [Place] <> '' 
GROUP BY [Place] 
ORDER BY TOTAL DESC;
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
JJCSZK
  • 11
  • 1
  • Note: DISTINCT + GROUP BY doesn't make sense. Remove DISTINCT. Also, your query doesn't care about ties. What if more than one place has equal, max count? It will randomly pick one. – Kamil Gosciminski Jan 18 '19 at 01:00
  • Going by your code I guess this is SQL Server. I'll change the tag. Can you please identify the exact version of SQL Server, edit the tags and add it. – Nick.Mc Jan 18 '19 at 01:02
  • If there is a tie, select the first in alphabetical order. – JJCSZK Jan 18 '19 at 01:16
  • I am using SQL Server 2017. – JJCSZK Jan 18 '19 at 01:18

1 Answers1

0

This can be done as a number of steps, each building on the next. You want to work on all names and places at once.

First, you want to get a count of how many of each name, place combo there are, so group by name and place, and count the places. Your query will look like this

SELECT name, place, COUNT(place) as placecount
FROM table1
GROUP BY name, place

Now, you need to find the one with the highest count, and the first alphabetically in case of ties. You can do this by doing a ROW_NUMBER on the results of the above, restarting the count (partitioning) on name, and ordering by placecount, then by place to resolve ties. Using a CTE (you could also do this as a subquery), this would look like

WITH places as (
  SELECT name, place, COUNT(place) as placecount
  FROM table1
  GROUP BY name, place
)
SELECT name, place, ROW_NUMBER() OVER (PARTITION BY name ORDER BY placecount, place) as RN
FROM places

If you look at that data, the place you want for any given name should be on the row with the RN of 1. So you can get the final data you're looking for with a query like this one

WITH places as (
  SELECT name, place, COUNT(place) as placecount
  FROM table1
  GROUP BY name, place
), orderplaces as (
  SELECT name, place, ROW_NUMBER() OVER (PARTITION BY name ORDER BY placecount, place) as RN
  FROM places
)
Select name, place
FROM orderplaces
WHERE RN = 1

As you want to update table2 with this place data instead of viewing it, you would join to table2 in the final query and update, something like this

WITH places as (
  SELECT name, place, COUNT(place) as placecount
  FROM table1
  GROUP BY name, place
), orderplaces as (
  SELECT name, place, ROW_NUMBER() OVER (PARTITION BY name ORDER BY placecount, place) as RN
  FROM places
)
UPDATE T2 set place = OP.place
FROM orderplaces OP
   INNER JOIN table2 T2 on T2.name = OP.name
WHERE RN = 1;
DancingFool
  • 1,247
  • 1
  • 8
  • 10