0

I've checked stackoverflow as well as google but could not find any solution. What would be the proper way to remove duplicate entries from an nvarchar field that contains json string in SQL Server? For my case, let say I have nvarchar 'People' field on my table which contains the following data.

[
 {
  "name":"Jon",
  "age": 30
 },
 {
  "name":"Bob",
  "age": 30
 },
 {
  "name":"Nick",
  "age": 40
 },
 {
  "name":"Bob",
  "age": 40
 }
]

I need to remove the entries which has duplicate names which would be the 'Bob' in that case. So after executing the query I am expecting this result

[
 {
  "name":"Jon",
  "age": 30
 },
 {
  "name":"Bob",
  "age": 30
 },
 {
  "name":"Nick",
  "age": 40
 }
]

What would be the proper sql query to do that? Actually I am trying to achieve no duplicate names rather than no duplicate entries. That's why 2 Bobs have different ages in the above example. More specifically I need to keep only first items among duplicates for this example the first Bob with age 30.Using ROW_NUMBER() and Partition By would be solution but it breaks the existing order.I need to achieve this without breaking the existing order. So I have the table with Id and PeopleJson fields. The following query would achieve what I want to achieve but it breaks the order in PeopleJson

SELECT Id, (
    SELECT [Name],[Age] FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY (select NULL)) as row_num
        FROM OPENJSON(PeopleJson) WITH ([Name] NVARCHAR(1000), [Age]  int)
    ) t WHERE t.row_num = 1
    FOR JSON PATH, INCLUDE_NULL_VALUES
) as [People]
 From [TestTable] 
Huso
  • 67
  • 7

2 Answers2

0
WITH cte AS (
    SELECT DISTINCT value
    FROM OPENJSON(@json)
)
SELECT * FROM cte
DECLARE @json NVARCHAR(MAX) = N'[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, {"id": 1, "name": "Alice"}]'

SELECT (
    SELECT *
    FROM (
        SELECT DISTINCT value
        FROM OPENJSON(@json)
    ) cte
    FOR JSON AUTO
) AS result
[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]
D M
  • 5,769
  • 4
  • 12
  • 27
  • I need to eliminate duplicates based on a column value not the whole row value. If the 3rd 'Alice' will have different id then it won't eliminate the duplicates – Huso Dec 19 '22 at 18:10
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Dec 21 '22 at 03:03
-1

Can you provide some more information, please? I understand what you're trying to do, but I have some questions.

Each Bob has a different age, so those aren't duplicate entries, only duplicate names. Either way, it would be hard to decide which entry to remove if each one is different.

You can achieve no duplicate "Bob" entries, but the issue comes in when deciding which Bob record you want to keep.

DayByDay
  • 60
  • 6
  • Actually I am trying to achieve just no duplicate names rather than no duplicate entries – Huso Dec 19 '22 at 18:13
  • I am trying to keep first Bob – Huso Dec 19 '22 at 18:14
  • Can you query the whole table and deal with the duplicate Bobs in your controller? That might be easier that doing it in a SQL query. For example, your return all People and then loop through those people and when you hit the duplicates, don't use that entry. – DayByDay Dec 19 '22 at 18:18
  • Unfortunately I have to do this on T-SQL – Huso Dec 19 '22 at 18:19
  • In that case, I think you might find your answer here: https://stackoverflow.com/questions/4662464/how-to-select-only-the-first-rows-for-each-unique-value-of-a-column – DayByDay Dec 19 '22 at 18:26
  • More of a comment than an answer – Mark Schultheiss Dec 19 '22 at 18:54