0

I'm trying to join 2 tables that have the common column 'NAME', but data is sorted like this:


TABLE A

NAME
B C Corporations
Tefal Inc.
West, Tom
Anne Zagabi
(C) NamyangSoy

TABLE B

NAME
BC Corporations
Tefal Inc
Tom West
AnneZagabi
( C ) NamyangSoy

The above are the cases that I came across. It's really ugly, BUT the one thing that made me think it may be possible with sql is that spelling of at least one word is the same in both tables.

However, I've tried soundex but the code is actually not in English so it didn't work (the above is just an example I made up in English). I've tried the difference function but it didn't work either (everything has value 4, I guess because it's not in English? I'm not sure).

I tried joining letter by letter but it didn't work either. I was hoping if there may be any other ways this could be done.. I'm using sqlcanvas, and the database is sybase. both tables have near 30 columns and ~12,000 rows each

user2810030
  • 63
  • 2
  • 8
  • you forgot to tag proper _RDBMS_ i guess its `sql-server` coz of `soundex` ?? – vhadalgi Dec 11 '13 at 12:52
  • 1
    If you look closer, it becomes more ugly - such tables will contain spelling mistakes, duplicates, almost duplicates and similar. I have once done similar work (merged different customer databases) - after you have removed spaces and made precise matches and then likely matches with soundex and removed some of these etc, you have to use something like excel and join remaining (about 20% of) records manually. I hope it is one-time task :) – Arvo Dec 11 '13 at 13:02
  • what output do you expect for these table @user2810030 ? – vhadalgi Dec 11 '13 at 13:07
  • @Arvo after 'manual' cleansing, I'm left with 8,607 which is like 60% but I haven't verified the result yet, and 8,607 is too much to do manually although this is a one-time task :( – user2810030 Dec 11 '13 at 13:28
  • @VijaykumarHadalgi my output will look like "name(char), loanID(num), loanAmt(num), loantype(char), ..." these 4 are the key columns I need, other columns are superfluous but needed for recordkeeping. Is this what you meant? – user2810030 Dec 11 '13 at 13:32
  • Aww man I know your pain, I had something like this at my work. Can you try and use the 'like' command and match say the first 5 characters and then run again matching the last 5 characters. It might get you closer. Good luck! – Bobby Dec 11 '13 at 15:06
  • how about a lateral join and use a function to check for similarity? – Neil McGuigan Dec 11 '13 at 20:39
  • @Bobby thanks, I tried doing that with 1, 2 characters, it works for some, doesn't for others. I think that is the best solution yet. how did you end up with your data, were you able to work it out? – user2810030 Dec 12 '13 at 02:39
  • @neil mcguigan but the part I'm stuck is 'use a function to check for similarity'.. :( – user2810030 Dec 12 '13 at 02:41
  • @user2810030 Our data was like this because we have 2 separate companies with some matching customers. Admin had typed in their names in differently between the two companies. I manually created a table to link the ids from both databases. I could do this as I only had around 30-40 customers to do. – Bobby Dec 13 '13 at 15:28

1 Answers1

1

Will something like this work for you?

select * from [Table A] a join [Table B] b On REPLACE(a.Name, ' ', '') = REPLACE(b.Name, ' ', '')

Use the Replace function to remove all spaces and compare the results.

For exampe, run this

 select
 CASE
 WHEN REPLACE('T  E  S  T', ' ', '') =  REPLACE('TE  ST', ' ', '')  THEN  'TRUE' else 'FALSE'
 END

Use code to replace all non-alpha characters:

CREATE FUNCTION [dbo].[fncRemoveNonAlphanumericChars](@Temp VarChar(1000))
RETURNS VarChar(1000)
AS
BEGIN
WHILE PatIndex('%[^A-Za-z0-9]%', @Temp) > 0
SET @Temp = Stuff(@Temp, PatIndex('%[^A-Za-z0-9]%', @Temp), 1, '')

RETURN @TEmp
END

Example:

SELECT dbo.fncRemoveNonAlphanumericChars('abc...DEF,,,GHI(((123)))456jklmn')

Result:

abcDEFGHI123456jklmn

(This was from here: http://jayhollingum.blogspot.com/2011/01/sql-server-remove-non-alphanumeric.html)

Joe
  • 26
  • 4
  • i think this together with like will do the trick, i'll refine the name column in both tables first and then join with a like command? i'm out at lunch, but i'll try it as soon as i get back thanks for this!! – user2810030 Dec 12 '13 at 04:34