I'm new to R and am developing an entity resolution algorithm using the RecordLinkage package. I've had pretty good success so far: using dedup, I end up with a data frame in which two columns hold the keys of matched records, as below:

df <- data.frame(
  key1 = c(1, 1, 2, 2, 3, 3, 3, 4, 5, 6),
  key2 = c(3, 4, 5, 6, 4, 8, 9, 7, 10, 11)
)
df
   key1 key2
1     1    3
2     1    4
3     2    5
4     2    6
5     3    4
6     3    8
7     3    9
8     4    7
9     5   10
10    6   11

I'm trying to figure out how to end up with one row per entity, each holding a JSON string that contains all the keys for that entity, such as:

               entity_keys
1 {"awkeys":"1,3,4,8,9,7"}
2 {"awkeys":"2,5,6,10,11"}

I'm using `toJSON` from rjson to generate the string; the tough part is compiling the list of keys. I'm assuming transitive matching here (e.g., if 1 matches 3 and 3 matches 8, then 1 matches 8).

I'm sure there's a snazzy R way to do this, but I don't know what that would be. Any help is appreciated.

  • You want to create a graph - see [this answer](http://stackoverflow.com/a/32378377/3760920) from earlier today - you are just one step in. Then output the `comp` object as JSON – jeremycg Sep 03 '15 at 17:38
  • Thanks for that. I had to change `comp <- split(nodes, components(g)$membership)` to `comp <- split(nodes, clusters(g)$membership)` to get it to work on my end, but it looks like that'll get it done. – Aaron McLendon Sep 03 '15 at 20:00
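
The graph approach suggested in the comments could be sketched roughly as follows. This is only a minimal illustration, assuming the igraph and rjson packages are installed: each key becomes a vertex, each matched pair an undirected edge, and each connected component is treated as one entity, which is what transitive matching amounts to. The names `g`, `membership`, `groups`, and `result` are chosen here for illustration. On older igraph releases, `clusters()` may be needed in place of `components()`, as noted in the comment above.

```r
library(igraph)  # assumed installed; provides the graph and component functions
library(rjson)   # assumed installed; provides toJSON

# Sample data from the question: each row is one matched pair of keys.
df <- data.frame(
  key1 = c(1, 1, 2, 2, 3, 3, 3, 4, 5, 6),
  key2 = c(3, 4, 5, 6, 4, 8, 9, 7, 10, 11)
)

# Build an undirected graph whose edges are the matched pairs.
g <- graph_from_data_frame(df, directed = FALSE)

# Connected components: one integer component id per vertex.
membership <- components(g)$membership

# Group the vertex names (the original keys, as character) by component.
groups <- split(V(g)$name, membership)

# One JSON string per entity, with the keys joined by commas.
entity_keys <- vapply(
  groups,
  function(keys) toJSON(list(awkeys = paste(keys, collapse = ","))),
  character(1)
)

result <- data.frame(entity_keys = entity_keys, stringsAsFactors = FALSE)
print(result)
```

For the sample data this should give two rows, one per connected component. The order of keys inside each string follows igraph's vertex order, so sort `keys` inside the function if a particular order matters.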
