I'm new to R and am developing an entity resolution algorithm using the RecordLinkage package. I've had pretty good success so far: after running dedup, I end up with a data frame in which two columns hold the keys of matched records, as below:
df <- data.frame(key1 = c(1,1,2,2,3,3,3,4,5,6),
                 key2 = c(3,4,5,6,4,8,9,7,10,11))
df
   key1 key2
1     1    3
2     1    4
3     2    5
4     2    6
5     3    4
6     3    8
7     3    9
8     4    7
9     5   10
10    6   11
I'm trying to figure out how to end up with one row for each entity, with a JSON string containing all the keys for that entity, such as:
entity_keys
1 {"awkeys":"1,3,4,8,9,7"}
2 {"awkeys":"2,5,6,10,11"}
I'm using toJSON from rjson to generate the string; the tough part is compiling the list of keys for each entity. I'm assuming transitive matching here (e.g. if 1 matches 3 and 3 matches 8, then 1 matches 8).
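To make the transitive grouping concrete, here is a rough base-R sketch of the idea. It is a brute-force label-merging loop, not anything provided by RecordLinkage, and `group_keys` is a name made up for illustration. The JSON string is built with sprintf so the snippet runs without extra packages; toJSON from rjson could be swapped in at that step.

```r
# Sample matched pairs from above.
df <- data.frame(key1 = c(1,1,2,2,3,3,3,4,5,6),
                 key2 = c(3,4,5,6,4,8,9,7,10,11))

# Hypothetical helper: label every key with the smallest key it is
# transitively linked to, then collapse each group into one JSON string.
group_keys <- function(df) {
  keys  <- sort(unique(c(df$key1, df$key2)))
  label <- setNames(keys, keys)        # each key starts in its own group

  # Sweep over the pairs until no label changes: both keys of a matched
  # pair take the smaller of their two current labels.
  repeat {
    changed <- FALSE
    for (i in seq_len(nrow(df))) {
      a <- as.character(df$key1[i])
      b <- as.character(df$key2[i])
      m <- min(label[[a]], label[[b]])
      if (label[[a]] != m || label[[b]] != m) {
        label[[a]] <- m
        label[[b]] <- m
        changed <- TRUE
      }
    }
    if (!changed) break
  }

  # One group per final label; keys inside a group come out sorted.
  groups <- split(keys, label)
  joined <- unname(vapply(groups, paste, character(1), collapse = ","))

  # Assumption: plain string formatting stands in for rjson::toJSON here.
  data.frame(entity_keys = sprintf('{"awkeys":"%s"}', joined),
             stringsAsFactors = FALSE)
}

result <- group_keys(df)
print(result)
```

On the sample data this gives two rows, one with keys 1,3,4,7,8,9 and one with keys 2,5,6,10,11. The keys within each string come out sorted, so the order differs from my hand-written example above, but the groups are the same. The repeated sweeps would be slow on a large pair list, which is part of why I'm asking.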
I'm sure there's a snazzy R way to do this, but I don't know what that would be. Any help is appreciated.