Remove duplicates SQL while ignoring key and selecting max of specified column

Question

I have the following sample data:

| key_id | name  | name_id | data_id |
+--------+-------+---------+---------+
|   1    | jim   |   23    |   098   |
|   2    | joe   |   24    |   098   |
|   3    | john  |   25    |   098   |
|   4    | jack  |   26    |   098   |
|   5    | jim   |   23    |   091   |
|   6    | jim   |   23    |   090   |

I have tried this query:

INSERT INTO temp_table
SELECT
DISTINCT @key_id,
name,
name_id,
@data_id FROM table1,

I am trying to dedupe a table by all fields in a row.

My desired output:

| key_id | name  | name_id | data_id |
+--------+-------+---------+---------+
|   1    | jim   |   23    |   098   |
|   2    | joe   |   24    |   098   |
|   3    | john  |   25    |   098   |
|   4    | jack  |   26    |   098   |

What I'm actually getting:

| key_id | name  | name_id | data_id  |
+--------+-------+---------+----------+
|   1    | jim   |   23    |   NULL   |
|   2    | joe   |   24    |   NULL   |
|   3    | john  |   25    |   NULL   |
|   4    | jack  |   26    |   NULL   |

I am able to dedupe the table, but I am setting the 'data_Id' value to NULL by attempting to override the field with '@'

Is there anyway to select distinct on all fields and while keeping the value for 'data_id'? I will take the highest or MAX data_id # if possible.

Do you care which key_id value you get? Let's say all rows had 098. Would you want id 1, 5, or 6 for jim? — AdamMc331, Dec 08 '14 at 23:43
@McAdam331 I don't mind at all if the key value gets changed. I just don't want to null out the field 'data_id'. — JDE876, Dec 09 '14 at 14:29
@McAdam331 I you code answers what I needed, however, I worded the question wrong. I still want unique 'data_ID' values per name or name_ids as this occurs in my situation. The difference in data_id, as with the key, was preventing me from omitting dupe names/name_ids. I also need to keep the 'ID' Column because my goal is to replace the original table with my results. Your idea helped me figure this out. — JDE876, Dec 09 '14 at 16:42
@McAdam331 its good, I decided to go with my answer which you helped me with! — JDE876, Dec 11 '14 at 19:52

score 0 · Answer 1 · answered Dec 09 '14 at 15:00

If you only want one row returned for a specific value (in this case, name), one option you have is to group by that value. This seems like a good approach because you also said you wanted the largest data_id for each name, so I would suggest grouping and using the MAX() aggregate function like this:

SELECT name, name_id, MAX(data_id) AS data_id
FROM myTable
GROUP BY name, name_id;

The only thing you should be aware of is the possibility that a name occurs multiple times under different name_ids. If that is possible in your table, you could group by the name_id too, which is what I did.

Since you stated you're not interested in the key_id but only the name, I just excluded it from the query altogether to get this:

| name  | name_id | data_id |
+-------+---------+---------+
| jim   |   23    |   098   |
| joe   |   24    |   098   |
| john  |   25    |   098   |
| jack  |   26    |   098   |

Here is the SQL Fiddle example.

score 0 · Accepted Answer · answered Dec 09 '14 at 16:34

0

RENAME TABLE myTable to Old_mytable,
myTable2 to myTable
INSERT INTO myTable
SELECT *
FROM Old_myTable
GROUP BY name, name_id;

This groups my tables by the values I want to dedupe while still keeping structure and ignoring the 'Data_id' column

answered Dec 09 '14 at 16:34

JDE876

407
1
5
16

Remove duplicates SQL while ignoring key and selecting max of specified column

2 Answers2