I don't quite understand how some of the groupings and summaries are built in R using the dplyr package.
With the reproducible example below, I first group by (PN, GOT, HID) to count the distinct values of PC1 in each group. I then regroup by (PN, GOT) to sum those distinct counts. This works for the sums, but for mean(TC) I get the mean of the entire data frame, whereas I expect the mean of TC within each (PN, GOT) group. What am I missing to get those per-(PN, GOT) means without losing the PC1 sums I've built? I would appreciate an explanation of where I'm going wrong.
library(dplyr)

PN <- c("Mazda","Mazda","Datsun","Hornet","Hornet","Valiant","Duster","Merc","Merc","Merc","Merc","Merc",
        "Merc","Merc","Fiat","Honda","Toyota","Toyota","Dodge","AMC","Fiat")
GOT <- c("A","A","B","C","C","A","D","B","B","B","B","B","B","B","A","D","B","B","C","E","A")
HID <- c("Mazda_H1","Mazda_H1","Datsus_H1","Hornet_H1","Hornet_H2","Valiant_H1","Duster_H1","Merc_H1","Merc_H1","Merc_H1",
         "Merc_H2","Merc_H2","Merc_H3","Merc_H4","Fiat_H1","Honda_H1","Toyota_H1","Toyota_H2","Dodge_H1","AMC_H1","Fiat_H1")
PIC <- c("BB","BB","BB","BB","AA","AA","AA","BA","BA","BA",
         "AA","BB","BB","BB","BB","AA","AA","AA","BA","BA","BA")
TC <- c(110,110,93,175,175,105,245,62,62,62,62,62,62,62,33,52,97,97,150,150,33)
Int <- c(16.46,17.02,18.61,19.44,17.02,20.22,15.84,20.00,22.90,18.30,18.90,
         17.40,17.60,18.00,19.47,18.52,19.90,20.01,16.87,17.30,18.90)
PC1 <- c("","","G1","C1","","G1","","G1","G1","C1","C1","","","","Z1","Z1","Z1","Z1","","","G1")
df <- data.frame(PN, GOT, HID, PIC, TC, Int, PC1)
df
df %>%
  filter(PC1 != "") %>%
  group_by(PN, GOT, HID) %>%
  summarize(new = n_distinct(PC1)) %>%
  group_by(PN, GOT) %>%
  mutate(TOT_new = sum(new),
         meanTC = mean(TC))
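One plausible reading of the overall mean, sketched on toy data rather than the data above: summarize() keeps only the grouping columns and the columns it creates, so TC is no longer a column by the time mutate() runs, and mean(TC) then seems to pick up the plain TC vector from the workspace. The names `toy`, `g`, `id`, `step1` and `wrong` are made up for illustration, and `.groups` assumes dplyr 1.0.0 or later (it can be dropped on older versions).

```r
library(dplyr)

# Toy data (hypothetical, not the question's data): two groups with different TC.
g  <- c("a", "a", "b", "b")
id <- c("a1", "a2", "b1", "b1")
TC <- c(10, 10, 50, 50)   # exists both as a workspace vector and as a column
toy <- data.frame(g, id, TC)

# First summary: only g, id and the new column survive.
step1 <- toy %>%
  group_by(g, id) %>%
  summarize(new = n(), .groups = "drop")

names(step1)   # TC is not among the columns

# With no TC column left, mean(TC) is evaluated on the workspace vector,
# so every group receives the same overall mean.
wrong <- step1 %>%
  group_by(g) %>%
  mutate(meanTC = mean(TC))

wrong
```

If the workspace had no TC vector at all, the same mutate() call would be expected to stop with an "object not found" error instead of returning a misleading number.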
I think the result I'm looking for looks something like this:
  PN      GOT    HID        TOT_new meanTC
  <fctr>  <fctr> <fctr>       <int>  <dbl>
1 Datsun  B      Datsus_H1        1     93
2 Fiat    A      Fiat_H1          2     33
3 Honda   D      Honda_H1         1     52
4 Hornet  C      Hornet_H1        1    175
5 Merc    B      Merc_H1          3     62
6 Toyota  B      Toyota_H1        2     97
7 Valiant A      Valiant_H1       1    105
or at least this:
  PN      GOT    HID          new TOT_new meanTC
  <fctr>  <fctr> <fctr>     <int>   <int>  <dbl>
1 Datsun  B      Datsus_H1      1       1     93
2 Fiat    A      Fiat_H1        2       2     33
3 Honda   D      Honda_H1       1       1     52
4 Hornet  C      Hornet_H1      1       1    175
5 Merc    B      Merc_H1        2       3     62
6 Merc    B      Merc_H2        1       3     62
7 Toyota  B      Toyota_H1      1       2     97
8 Toyota  B      Toyota_H2      1       2     97
9 Valiant A      Valiant_H1     1       1    105
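A minimal sketch of one way to reach both layouts, assuming the mean should be taken over the rows that remain after the PC1 filter. The idea is to carry TC through the first summarize() as a sum and a row count, so the (PN, GOT) mean can still be computed afterwards. The helper columns `sum_TC` and `n_rows` and the result names `long` and `short` are made up for illustration, and `.groups` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Only the columns the pipeline uses, copied from the example data.
PN  <- c("Mazda","Mazda","Datsun","Hornet","Hornet","Valiant","Duster","Merc","Merc","Merc","Merc","Merc",
         "Merc","Merc","Fiat","Honda","Toyota","Toyota","Dodge","AMC","Fiat")
GOT <- c("A","A","B","C","C","A","D","B","B","B","B","B","B","B","A","D","B","B","C","E","A")
HID <- c("Mazda_H1","Mazda_H1","Datsus_H1","Hornet_H1","Hornet_H2","Valiant_H1","Duster_H1","Merc_H1","Merc_H1","Merc_H1",
         "Merc_H2","Merc_H2","Merc_H3","Merc_H4","Fiat_H1","Honda_H1","Toyota_H1","Toyota_H2","Dodge_H1","AMC_H1","Fiat_H1")
TC  <- c(110,110,93,175,175,105,245,62,62,62,62,62,62,62,33,52,97,97,150,150,33)
PC1 <- c("","","G1","C1","","G1","","G1","G1","C1","C1","","","","Z1","Z1","Z1","Z1","","","G1")
df  <- data.frame(PN, GOT, HID, TC, PC1)

# Layout with one row per (PN, GOT, HID).
long <- df %>%
  filter(PC1 != "") %>%
  group_by(PN, GOT, HID) %>%
  summarize(new    = n_distinct(PC1),
            sum_TC = sum(TC),   # helper: total TC in this HID group
            n_rows = n(),       # helper: number of rows in this HID group
            .groups = "drop") %>%
  group_by(PN, GOT) %>%
  mutate(TOT_new = sum(new),
         meanTC  = sum(sum_TC) / sum(n_rows)) %>%  # row-weighted mean per (PN, GOT)
  ungroup() %>%
  select(PN, GOT, HID, new, TOT_new, meanTC)

# Layout with one row per (PN, GOT), keeping the first HID as a label.
short <- long %>%
  group_by(PN, GOT) %>%
  summarize(HID     = first(HID),
            TOT_new = first(TOT_new),
            meanTC  = first(meanTC),
            .groups = "drop")

long
short
```

Dividing the summed TC by the summed row count gives the same value as mean(TC) over the filtered rows of each (PN, GOT) group, whereas averaging per-HID means would weight every HID equally regardless of how many rows it has.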