R dplyr left join - multiple returned values and new rows: how to ask for the first match only?

Question

Let's say I have one table of suburb names with their crime rates, and a separate table that maps each suburb to its council name.

Tables Picture

I know that `left_join(table1, table2, by = "Suburb")` returns table1 with extra rows wherever a suburb matches more than one council. The problem is that suburbs 3 and 4 each overlap two councils, so each of them appears twice in table2.

Is there a way to make the left join return only the first match, rather than creating new rows to accommodate the extra ones?

In addition, is there a function that keeps only the first row for each suburb in table 2 and removes the second, third, or fourth rows that exist because the suburb overlaps several councils?

Can you filter the table before joining? This way you explicitly select what you want. — FlorianGD, Feb 24 '17 at 12:40
If the rows are duplicates, you can try using `distinct()` to remove multiple instances. — Megatron, Feb 24 '17 at 15:03
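
A minimal sketch of what the two comments above suggest, using made-up stand-ins for the tables in the question (the `CrimeRate` and `Council` values are assumptions for illustration only). Note that a plain `distinct()` only drops rows that are identical in every column; here the repeated suburbs differ in `Council`, so the sketch uses `distinct(Suburb, .keep_all = TRUE)` to keep the first row per suburb.

```r
library(dplyr)

# Hypothetical stand-ins for the tables in the question (values are made up).
table1 <- data.frame(
  Suburb    = c("Suburb 1", "Suburb 2", "Suburb 3", "Suburb 4"),
  CrimeRate = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)
table2 <- data.frame(
  Suburb  = c("Suburb 1", "Suburb 2", "Suburb 3", "Suburb 3", "Suburb 4", "Suburb 4"),
  Council = c("Council A", "Council A", "Council B", "Council C", "Council C", "Council D"),
  stringsAsFactors = FALSE
)

# Keep only the first row for each Suburb, retaining the remaining columns.
table2_first <- distinct(table2, Suburb, .keep_all = TRUE)

# Each Suburb now matches at most one row, so the join adds no rows.
joined <- left_join(table1, table2_first, by = "Suburb")
```

"First" here means first in the current row order of table2, so if a particular council should win, sort table2 accordingly before calling `distinct()`. As far as I know, dplyr 1.1.0 and later also accept `multiple = "first"` in `left_join()`, which would avoid the separate de-duplication step, but check the documentation for your installed version.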

score 3 · Answer 1 · answered Nov 12 '17 at 01:24

You can do this with the `join()` function from the plyr package. The equivalent of `left_join(table1, table2, by = "Suburb")` that uses only the first matching row from table2 is `join(table1, table2, by = "Suburb", type = "left", match = "first")`. I'm not sure what the dplyr equivalent is, though I would love to know myself.
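
For reference, a small runnable sketch of the plyr call, assuming toy versions of the two tables (the column values are invented for illustration and are not the data from the question). plyr is called through `plyr::` rather than attached, so it does not mask dplyr functions when both packages are in use.

```r
# Hypothetical stand-ins for the tables in the question (values are made up).
table1 <- data.frame(
  Suburb    = c("Suburb 1", "Suburb 2", "Suburb 3", "Suburb 4"),
  CrimeRate = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)
table2 <- data.frame(
  Suburb  = c("Suburb 1", "Suburb 2", "Suburb 3", "Suburb 3", "Suburb 4", "Suburb 4"),
  Council = c("Council A", "Council A", "Council B", "Council C", "Council C", "Council D"),
  stringsAsFactors = FALSE
)

# Left join that takes only the first matching row of table2 for each Suburb.
joined_first <- plyr::join(table1, table2, by = "Suburb",
                           type = "left", match = "first")
```

The result should have exactly one row per row of table1, with the council taken from whichever matching row comes first in table2.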

answered Nov 12 '17 at 01:24

Rebecca412


should be: by="Suburb" – b_g Jul 11 '18 at 19:51
