I have the following dataset:
ID <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
color <- c("red", "red", "red", "blue", "green", "green", "blue",
           "yellow", "red", "red", "blue", "green")
df <- data.frame(ID, color)
I want the "color" column to contain only the distinct colors within each ID. For example, ID 1 has 7 observations with repeated colors but only 3 distinct colors, so it should end up with 3 rows; likewise, ID 2 should end up with 4 rows. I would also like an "n_color" column giving the number of distinct colors for each ID. The desired output is:
ID <- c(1, 1, 1, 2, 2, 2, 2)
n_color <- c(3, 3, 3, 4, 4, 4, 4)
color <- c("red", "blue", "green",
           "yellow", "red", "blue", "green")
df <- data.frame(ID, n_color, color)
I know I can use the following to count the distinct colors per ID, but I couldn't figure out how to get the output described above.
library(dplyr)

# Count the distinct colors within each ID (one row per ID)
df %>%
  group_by(ID) %>%
  summarize(n = n_distinct(color)) %>%
  ungroup()
Is there a way to do this? Any help would be appreciated. Thanks!
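For reference, here is a minimal sketch of one possible approach, assuming dplyr is available. It drops repeated ID/color pairs with distinct() and then counts the remaining rows per ID. The name "result" is only an illustration, and this is one way to get the output, not necessarily the best one.

```r
library(dplyr)

# Original data from the question
ID <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
color <- c("red", "red", "red", "blue", "green", "green", "blue",
           "yellow", "red", "red", "blue", "green")
df <- data.frame(ID, color)

result <- df %>%
  distinct(ID, color) %>%        # keep the first occurrence of each ID/color pair
  group_by(ID) %>%
  mutate(n_color = n()) %>%      # rows left per ID = number of distinct colors
  ungroup() %>%
  select(ID, n_color, color)     # column order as in the desired output

print(result)
```

Because distinct() keeps rows in order of first appearance, the colors should come out in the same order as in the desired output. The same count could also be computed before deduplicating with mutate(n_color = n_distinct(color)).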