Create new column with distinct character values

Question

I have the following dataset:

ID <- c(1,1,1,1,1,1,1,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
           "yellow","red","red","blue","green")
df <- data.frame(ID,color)

I want the "color" column to contain only the distinct colors for each ID. ID 1 has 7 observations with repeated colors but only 3 distinct colors, so it should end up with 3 rows; likewise, ID 2 should end up with 4. I would also like an n_color column holding the number of distinct colors per ID. The desired output:

ID <- c(1,1,1,2,2,2,2)
n_color <- c(3,3,3,4,4,4,4)
color <- c("red","blue","green",
           "yellow","red","blue","green")
df <- data.frame(ID,n_color,color)

I know I can use the following to count the distinct colors per ID, but I couldn't figure out how to get the output described above.

library(dplyr)

df %>%
  group_by(ID) %>%
  summarize(n = n_distinct(color)) %>%
  ungroup()

Is there a way to do this? I would appreciate all the help there is! Thanks!

score 3 · Accepted Answer · answered Apr 11 '23 at 19:41

Using distinct from dplyr: keep one row per color within each ID, then count the rows left in each group.

library(dplyr)

df %>% 
  group_by(ID) %>% 
  distinct(color, .keep_all = TRUE) %>% 
  mutate(n_color = n(), .after = ID) %>% 
  ungroup()
# A tibble: 7 × 3
     ID n_color color 
  <dbl>   <int> <chr> 
1     1       3 red   
2     1       3 blue  
3     1       3 green 
4     2       4 yellow
5     2       4 red   
6     2       4 blue  
7     2       4 green
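
A variation on the same idea, offered as a sketch rather than a replacement: distinct can take both columns directly, and add_count can attach the per-ID count without an explicit group_by. This assumes dplyr 1.0 or later (for relocate) and the df from the question; the name res is just for illustration.

```r
library(dplyr)

# Data from the question
ID <- c(1,1,1,1,1,1,1,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
           "yellow","red","red","blue","green")
df <- data.frame(ID, color)

res <- df %>%
  distinct(ID, color) %>%                # one row per ID/color pair
  add_count(ID, name = "n_color") %>%    # rows per ID = distinct colors per ID
  relocate(n_color, .after = ID)         # put the count next to ID

res
```

Because no grouping is created, there is nothing to ungroup afterwards.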

GKi · Answer 2 · 2023-04-12T06:59:09.120

You can use unique from base to get unique rows from df.

unique(df)
#   ID  color
#1   1    red
#4   1   blue
#5   1  green
#8   2 yellow
#9   2    red
#11  2   blue
#12  2  green

If only specific columns should determine uniqueness, duplicated can be applied to those columns; negating its result with ! and indexing with [ selects the unique rows. Here all columns are used:

df[!duplicated(df),]
#   ID  color
#1   1    red
#4   1   blue
#5   1  green
#8   2 yellow
#9   2    red
#11  2   blue
#12  2  green
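
To illustrate the "specific columns" case, here is a minimal sketch with a made-up data frame df2 that has an extra obs column (hypothetical, not part of the question). Since obs differs on every row, unique(df2) would keep all rows, while duplicated on the ID and color columns alone keeps the first row of each pair.

```r
# Hypothetical data: obs is an extra column that should not affect uniqueness
df2 <- data.frame(
  ID    = c(1, 1, 1, 2, 2),
  color = c("red", "red", "blue", "red", "red"),
  obs   = 1:5
)

# Check duplicates on ID and color only, but keep every column in the result
key <- df2[c("ID", "color")]
res <- df2[!duplicated(key), ]

res
```

The result keeps the original row names, so it is easy to see which observations survived.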

If the number of colors per ID is also needed, ave can be used.

transform(unique(df), n_color = ave(ID, ID, FUN=length))
#   ID  color n_color
#1   1    red       3
#4   1   blue       3
#5   1  green       3
#8   2 yellow       4
#9   2    red       4
#11  2   blue       4
#12  2  green       4
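
For readers unfamiliar with ave: it applies FUN to the values within each group and returns a vector as long as its input, so the group size is repeated on every row of the group. The sketch below spells that out step by step on the question's data; the names u and counts are only for illustration, and table is used as an independent cross-check.

```r
# Data from the question
ID <- c(1,1,1,1,1,1,1,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
           "yellow","red","red","blue","green")
df <- data.frame(ID, color)

# Step 1: drop repeated ID/color rows
u <- unique(df)

# Step 2: length() is computed once per ID group and recycled over that group
u$n_color <- ave(seq_along(u$ID), u$ID, FUN = length)

# Cross-check: rows per ID in the de-duplicated data
counts <- table(u$ID)

u
```

Using seq_along(u$ID) instead of ID as the first argument is a small design choice: it makes the counts integers rather than doubles.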
