0

Hi guys I know there are few questions related to joining tables in R, I tried most of them but they didn't work, in my case, I have two tables first one (A) has two columns (Id and company_name), 70,000 rows, and the second one (B) has company_name, registration number .....etc with 10,000,000 rows.

I want to join them to get the registration number for all the companies in A.

hint, I tried to merge, inner join using sqldf,fuzzyjoin and vlookup in excel but it didn't return all the record in A.

zx8754
  • 52,746
  • 12
  • 114
  • 209

2 Answers2

2

It's hard to give an answer without any example code or data but let me try.

I can see three possible problems here.

1: Do you want to have all entries from A in your result set? even if there were no match from B? then you should use a left outer join. for example with merge you should have merge(A,B,by="company_name",all.x = TRUE)

2: As I can see you want to join on company_name, is the class character in both A and B? sometimes it is converted for some reason to factor when you read in the file. convert them to character with A$company_name = as.character(A$company_name) and do the same for B

3: maybe there are extra characters in the names (like whitespaces) or special characters. If it's only that you can remove them and do the merging after that. use this: gsub('[^[:alnum:]]','',tolower(iconv(A$company_name, to='ASCII//TRANSLIT'))). This will make sure you get rid off all unwanted stuff.

p130ter
  • 86
  • 7
  • thank you so much for your answer, I think I should have made more clear my fault, yes I want entries from A because I am 100% sure that all the companies in A are in B. I have converted both company_name columns into Characters and tried merge function, join using SQLDF and fuzzyjoin package –  Dec 06 '17 at 11:48
  • Can you present an example where it should work but it's not? that way it will be much easier to find a solution – p130ter Dec 06 '17 at 12:23
  • this is one of the companies in A One Stop Shop and its the same in B One Stop Shop but when I join both tables I got Null –  Dec 06 '17 at 13:56
  • str(B$CompanyName) chr [1:1048575] "LTD" "BIG IMPACT GRAPHICS LIMITED" "NNOV8 LIMITED" "NSPIRED LTD" "NVERTD DESIGNS LIMITED" "OBAC LIMITED" ... > str(A$name) chr [1:57308] "Cantalis Ltd" "One Stop Shop" "EJ FINANCIAL LIMITED" "Mr A J Jones & Mrs S Al-Shamma" ... –  Dec 06 '17 at 14:20
  • I don't see _One Stop Shop_ in B, but! for me it looks like everything in B is uppercase but not in A. use tolower() for `company_name` in both tables – p130ter Dec 06 '17 at 15:01
0

R

merge(table_a, table_b, by = "company_name", all.x = TRUE)

Excel

A: Id from table1 .
B: company_name from table1 .
C: company_name from table2 .
D: registration_number from table2 .

Use this formula in any other column to get registration_number

Vlookup(B2, $C$2:$D$10000001, 2, FALSE)
penguin
  • 1,267
  • 14
  • 27
  • I have tried both cases but before and the didn't work –  Dec 06 '17 at 11:49
  • Did you check for case mismatch and column datatypes mismatch in R? – penguin Dec 06 '17 at 12:35
  • yes I did both columns are characters type –  Dec 06 '17 at 13:54
  • str(B$CompanyName) chr [1:1048575] "LTD" "BIG IMPACT GRAPHICS LIMITED" "NNOV8 LIMITED" "NSPIRED LTD" "NVERTD DESIGNS LIMITED" "OBAC LIMITED" ... > str(A$name) chr [1:57308] "Cantalis Ltd" "One Stop Shop" "EJ FINANCIAL LIMITED" "Mr A J Jones & Mrs S Al-Shamma" ... –  Dec 06 '17 at 14:19