5

While working with some legacy data, I want to group the data on a column ignoring spelling mistakes. I think SOUNDEX() could do the job to achieve the desired result. Here is what I tried:

SELECT soundex(AREA)
FROM MASTER
GROUP BY soundex(AREA)
ORDER BY soundex(AREA)

But (obviously) the SOUNDEX returned 4-character code in result rows like this, loosing actual strings:

A131
A200
A236

How could I include at least one occurrence from the group into the query result instead of 4-character code.

Khadim Ali
  • 2,548
  • 3
  • 33
  • 61

2 Answers2

5
SELECT soundex(AREA) as snd_AREA, min(AREA) as AREA_EXAMPLE_1, max(AREA) as AREA_EXAMPLE_2
from MASTER
group by soundex(AREA)
order by AREA_EXAMPLE_1
;

In MySQL you could select group_concat(distinct AREA) as list_area to get all the versions, and I don't know about that in SQL-Server, but min and max give two examples of the areas, and you wanted to discard the diffs anyway.

flaschenpost
  • 2,205
  • 1
  • 14
  • 29
  • It worked but with astonishing results. For e.g., the soundex() is grouping the NORTH NAZ and NORTH KAR, P.I.B. and P.E.C.H, SHAH FAISAL and S.I.T.E as same. ??? !!!! – Khadim Ali May 08 '13 at 14:54
  • Maybe you could control the soundex- function in SQL-Server. I only have access to MySQL and there I get soundex("NORTH NAZ") = N6352, soundex("NORTH KAR" = N6326. Or maybe there are even different soundex-functions availiable. – flaschenpost May 08 '13 at 15:00
  • I tried removing spaces from the string (NORTHNAZ and NORTHKAR), but actually the SOUNDEX() is matching too loosely. For example, following strings produce same SOUNDEX codes; <<< select SOUNDEX(col1), col1 from ( select 'GULBAHAR' as col1 union ALL select 'GULBERG' ) t >>> ** <<< select SOUNDEX(col1), col1 from ( select 'GULSHANEIQBAL' as col1 union ALL select 'GULSHANEMAYMAR' ) t >>> ** Any idea, or I have to create new thread for this another issue? – Khadim Ali May 09 '13 at 08:31
5

You could also use row_number() to get one row for each soundex(area) value:

select AREA, snd
from
(
  select AREA, soundex(AREA) snd,
    row_number() over(partition by soundex(AREA)
                      order by soundex(AREA)) rn
  from master
) x
where rn = 1

See SQL Fiddle with Demo

Taryn
  • 242,637
  • 56
  • 362
  • 405