
I have a simple dataframe that looks like the following:

Observation X1 X2 Group
1           2   4   1
2           6   3   2
3           8   4   2
4           1   3   3
5           2   8   4
6           7   5   5
7           2   4   5

How can I recode the Group variable so that every observation whose Group value appears only once is recoded as "Unaffiliated"?

The desired output would be the following:

Observation X1 X2 Group
1           2   4   Unaffiliated
2           6   3   2
3           8   4   2
4           1   3   Unaffiliated
5           2   8   Unaffiliated
6           7   5   5
7           2   4   5

flâneur



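We can use duplicated twice, once scanning from the front and once from the back (fromLast = TRUE), to flag every Group value that occurs more than once. Negating that logical vector selects the non-duplicated rows, and we assign "Unaffiliated" to their Group. Because a character value is assigned into an integer column, Group becomes a character column.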

df1$Group[with(df1, !(duplicated(Group)|duplicated(Group, 
     fromLast = TRUE)))] <- "Unaffiliated"

Output:

> df1
  Observation X1 X2        Group
1           1  2  4 Unaffiliated
2           2  6  3            2
3           3  8  4            2
4           4  1  3 Unaffiliated
5           5  2  8 Unaffiliated
6           6  7  5            5
7           7  2  4            5

Data:

df1 <- structure(list(Observation = 1:7, X1 = c(2L, 6L, 8L, 1L, 2L, 
7L, 2L), X2 = c(4L, 3L, 4L, 3L, 8L, 5L, 4L), Group = c(1L, 2L, 
2L, 3L, 4L, 5L, 5L)), class = "data.frame", row.names = c(NA, 
-7L))
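To see why the two duplicated() calls are combined, here is a minimal sketch on the Group column alone. The names g, fwd, bwd and singleton are illustrative and not part of the answer above. A single duplicated() call misses the first occurrence of a repeated value, so the backward pass is needed to flag it too.

```r
# Group column from the question (illustrative copy)
g <- c(1L, 2L, 2L, 3L, 4L, 5L, 5L)

fwd <- duplicated(g)                    # TRUE for 2nd and later occurrences
bwd <- duplicated(g, fromLast = TRUE)   # TRUE for all but the last occurrence
singleton <- !(fwd | bwd)               # TRUE only where the value occurs once

singleton  # TRUE FALSE FALSE TRUE TRUE FALSE FALSE
```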
akrun

unaffil takes a vector of Group numbers and returns "Unaffiliated" if it has only one element; otherwise it returns the input unchanged. We then apply it within each Group using ave. This does not overwrite the input, since transform returns a modified copy of dat. No packages are used, but if you use dplyr, transform can be replaced with mutate.

unaffil <- function(x) if (length(x) == 1) "Unaffiliated" else x
transform(dat, Group = ave(Group, Group, FUN = unaffil))

giving

  Observation X1 X2        Group
1           1  2  4 Unaffiliated
2           2  6  3            2
3           3  8  4            2
4           4  1  3 Unaffiliated
5           5  2  8 Unaffiliated
6           6  7  5            5
7           7  2  4            5

Note

dat <- structure(list(Observation = 1:7, X1 = c(2L, 6L, 8L, 1L, 2L, 
7L, 2L), X2 = c(4L, 3L, 4L, 3L, 8L, 5L, 4L), Group = c(1L, 2L, 
2L, 3L, 4L, 5L, 5L)), class = "data.frame", row.names = c(NA, 
-7L))
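ave() applies FUN to each group's slice of the vector and writes the result back into that group's positions, recycling a length-one result. A small base R sketch of that behaviour (sizes and recoded are illustrative names, not from the answer) uses FUN = length to get each row's group size and then recodes the singletons with ifelse:

```r
# Group column from the question (illustrative copy)
g <- c(1L, 2L, 2L, 3L, 4L, 5L, 5L)

# ave() computes length() per group and broadcasts it to every row of that group
sizes <- ave(g, g, FUN = length)
sizes  # 1 2 2 1 1 2 2

# Rows in a group of size one become "Unaffiliated"; the rest keep their Group
recoded <- ifelse(sizes == 1, "Unaffiliated", g)
recoded
```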
G. Grothendieck

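One way is to group by Group, check whether the largest row number within the group is 1 (that is, the group has a single row), and finish with an ifelse. Here the data frame is assumed to be called df: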

library(dplyr)

df %>% 
  group_by(Group) %>% 
  mutate(Group = ifelse(max(row_number()) == 1, "Unaffiliated", as.character(Group))) %>% 
  ungroup()
  Observation    X1    X2 Group       
        <int> <int> <int> <chr>       
1           1     2     4 Unaffiliated
2           2     6     3 2           
3           3     8     4 2           
4           4     1     3 Unaffiliated
5           5     2     8 Unaffiliated
6           6     7     5 5           
7           7     2     4 5    
TarJae