I want to aggregate information (strings and numerics) by keeping the value from a particular record of my dataset.

Here is an example:

library(dplyr)

data <- tibble(
  id1 = c(1, 1, 2, 1, 2, 3),
  id2 = c('a', 'a', 'a', 'c', 'a', 'a'),
  id3 = c(1, 2, 3, 4, 5, 6),
  indicator = c(0, 1, 1, 1, 0, 1),
  info = c('red', 'blue', 'yellow', 'black', 'green', 'pink')
)

# A tibble: 6 x 5
    id1 id2     id3 indicator info  
  <dbl> <chr> <dbl>     <dbl> <chr> 
1     1 a         1         0 red   
2     1 a         2         1 blue  
3     2 a         3         1 yellow
4     1 c         4         1 black 
5     2 a         5         0 green 
6     3 a         6         1 pink 

Here is what I want as output:

tibble( id1 = c(1, 1, 2, 3), id2 = c('a', 'c', 'a', 'a'), info = c('blue', 'black', 'yellow', 'pink') )

# A tibble: 4 x 3
    id1 id2   info  
  <dbl> <chr> <chr> 
1     1 a     blue  
2     1 c     black 
3     2 a     yellow
4     3 a     pink  

I am not sure how to do this with the dplyr package.

Thank you,

John

Maël
John E.

2 Answers

Something like this?

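library(dplyr)

data %>%
  filter(indicator == 1) %>%       # keep only the rows flagged by indicator
  arrange(id1) %>%                 # sort by id1 (ties keep their original order)
  select(-c(indicator, id3))       # drop the columns not wanted in the output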

# A tibble: 4 x 3
    id1 id2   info  
  <dbl> <chr> <chr> 
1     1 a     blue  
2     1 c     black 
3     2 a     yellow
4     3 a     pink  
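
If the goal is a true per-group aggregation, the same result can be sketched with an explicit grouping. This assumes (it is not stated in the question) that the groups are defined by id1 and id2 and that the record to keep is the first one with indicator == 1 in each group; a group with no such record would get NA.

```r
library(dplyr)

data <- tibble(
  id1 = c(1, 1, 2, 1, 2, 3),
  id2 = c('a', 'a', 'a', 'c', 'a', 'a'),
  id3 = c(1, 2, 3, 4, 5, 6),
  indicator = c(0, 1, 1, 1, 0, 1),
  info = c('red', 'blue', 'yellow', 'black', 'green', 'pink')
)

# Assumption: one output row per (id1, id2) group, taking info from the
# first row in the group where indicator == 1.
result <- data %>%
  group_by(id1, id2) %>%
  summarise(info = first(info[indicator == 1]), .groups = "drop")

print(result)
```

Unlike a plain filter, this always returns exactly one row per group, even when several records in a group are flagged.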
Maël

This seems like a rather haphazard way of producing the output. Is there some logic behind the output that I may be missing?

kashj