I have a data frame, puzzle, of customers and the type of item they own. A customer can occur multiple times in the list if they own several items.
name type
m1 A
m10 A
m2 A
m9 A
m9 B
m4 B
m5 B
m1 C
m2 C
m3 C
m4 C
m5 C
m6 C
m7 C
m8 C
m1 D
m5 D
I would like to calculate what proportion of people who own "A" also own "B", and so on.
Based on the above input, how can I get an output like this using R:
  A    B    C    D    TOTAL
A 1    0.25 0.5  0.25 4
B 0.33 1    0.67 0.33 3
C 0.25 0.25 1    0.25 8
D 0.5  0.5  1    1    2
Thanks a lot for your help!
Here is the long and manual way to do it, with no looping or advanced functions whatsoever (but of course that is wasted potential in R):
Example for item A (with B subset the same way for the comparison):
puzzleA <- subset(puzzle, type == 'A')
puzzleB <- subset(puzzle, type == 'B')
Calculating the share of customers who own A who also own B:
length(unique(merge(puzzleA, puzzleB, by = 'name')$name)) / length(unique(puzzleA$name))
Data
puzzle <- structure(list(name = c("m1", "m10", "m2", "m9", "m9", "m4",
"m5", "m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8", "m1", "m5"
), type = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C",
"C", "C", "C", "C", "C", "D", "D")), .Names = c("name", "type"
), class = "data.frame", row.names = c(NA, -17L))
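For reference, the manual subset-and-merge calculation can be generalized to all pairs of types at once. The following is only a minimal sketch of one possible approach in base R, assuming a customer-by-type incidence matrix built with table() and multiplied with itself via crossprod(). The names owns, co and result are introduced here for illustration and are not part of the original attempt.

```r
# Same data as above, rebuilt so this sketch runs on its own.
puzzle <- data.frame(
  name = c("m1", "m10", "m2", "m9", "m9", "m4", "m5", "m1", "m2",
           "m3", "m4", "m5", "m6", "m7", "m8", "m1", "m5"),
  type = c("A", "A", "A", "A", "B", "B", "B", "C", "C",
           "C", "C", "C", "C", "C", "C", "D", "D"),
  stringsAsFactors = FALSE
)

# Customer-by-type incidence matrix: 1 if the customer owns that type.
# unique() drops repeated (name, type) rows so every cell is 0 or 1.
owns <- table(unique(puzzle[, c("name", "type")]))

# co[i, j] = number of customers owning both type i and type j;
# the diagonal holds the number of distinct owners of each type.
co <- crossprod(owns)

# Dividing by diag(co) recycles down the columns, so each row i is
# divided by the number of owners of type i.
result <- cbind(round(co / diag(co), 2), TOTAL = diag(co))
result
```

On the example data this should give the same proportions and totals as the desired output shown earlier. The unique() call matters only if the same customer/type pair can appear more than once; without it the counts would be inflated.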